Tracing hardware offload in Open vSwitch

Source: https://developers.redhat.com/articles/2021/12/10/tracing-hardware-offload-open-vswitch

Open vSwitch (OVS) is an open source framework for software-defined networking (SDN) and is useful in virtualized environments. Just like conventional network stacks, OVS can offload tasks to the hardware running on the network interface card (NIC) to speed up the processing of network packets.

However, dozens of functions are invoked in a chain to achieve hardware offload. This article takes you through the chain of functions to help you debug networking problems with OVS.

This article assumes that you understand the basics of OVS and hardware offload. You should also be familiar with basic networking commands, particularly Linux's tc (traffic control) command, which you can use to dump traffic flows and see whether they are offloaded.

For the flow illustrated in this article, I used a Mellanox NIC.

The start of a network transmission

Let's start our long journey from an OVS flow add/modify operation down to the hardware driver with the first few functions called. Figure 1 shows each function at the beginning of the process, along with the OVS file that defines it.

Figure 1. OVS launches the chain of functions that eventually lead to hardware offload.

The dpif_netdev_operate function is registered to the function pointer dpif->dpif_class->operate. Calling the function leads to the call stack in Figure 2.

Figure 2. dpif_netdev_operate continues the sequence that adds an operation to a queue.
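
If you want to see where that pointer comes from, the following sketch shows the shape of the registration in OVS's lib/dpif-netdev.c. Treat it as illustrative rather than a verbatim excerpt: the real dpif_netdev_class initializer sets many more callbacks, and its exact layout varies between OVS versions.

/* Illustrative sketch; the real initializer sets many more callbacks. */
const struct dpif_class dpif_netdev_class = {
    .type = "netdev",
    …
    .operate = dpif_netdev_operate,   /* reached through dpif->dpif_class->operate */
    …
};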

OVS offload operations

OVS creates a dedicated offload thread that runs dp_netdev_flow_offload_main, as follows:

ovs_thread_create("dp_netdev_flow_offload", dp_netdev_flow_offload_main, NULL);

Offload add and modify operations call the dp_netdev_flow_offload_put function, whereas delete operations call the dp_netdev_flow_offload_del function. Figure 3 illustrates what happens after a call to dp_netdev_flow_offload_put.

Figure 3. A put operation invokes the function assigned to a function pointer for that purpose.
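
The dispatch inside that thread is straightforward. The following is a simplified sketch of the loop in dp_netdev_flow_offload_main rather than a verbatim excerpt from lib/dpif-netdev.c: locking, error handling, and cleanup are omitted, and dequeue_next_offload_item is a hypothetical placeholder for the real queue handling (a mutex-protected list plus a condition variable).

/* Simplified sketch, not a verbatim excerpt from lib/dpif-netdev.c. */
for (;;) {
    /* dequeue_next_offload_item() is a placeholder; it blocks while the queue is empty. */
    struct dp_flow_offload_item *offload = dequeue_next_offload_item();

    switch (offload->op) {
    case DP_NETDEV_FLOW_OFFLOAD_OP_ADD:
    case DP_NETDEV_FLOW_OFFLOAD_OP_MOD:
        dp_netdev_flow_offload_put(offload);   /* add/modify path shown in Figure 3 */
        break;
    case DP_NETDEV_FLOW_OFFLOAD_OP_DEL:
        dp_netdev_flow_offload_del(offload);
        break;
    }
    /* The item is then released and the loop picks up the next queued operation. */
}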

OVS's lib/netdev-offload.c file defines the netdev_register_flow_api_provider function. The chain of calls continues through a flow API provider registered as follows:

netdev_register_flow_api_provider(&netdev_offload_tc);

The netdev_tc_flow_put function is assigned to the .flow_put struct member as shown in the following excerpt:

const struct netdev_flow_api netdev_offload_tc = {
    .type = "linux_tc",
    …
    .flow_put = netdev_tc_flow_put,
    …
};

After the call reaches netdev_tc_flow_put, the chain of calls continues as shown in Figure 4.

Figure 4. Another sequence eventually calls sendmsg.

Sequence from a tc command

Let's set the OVS call chain aside for a moment and look at a more conventional sequence of calls. Without OVS in the picture, a request from the tc utility proceeds as shown in Figure 5.

Figure 5. You can run a tc command and trace the sequence from sendmsg.

Whether sendmsg is issued from tc, from OVS, or from another sender, the message goes to the kernel and then to the hardware driver.
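
To make that concrete, here is a minimal userspace sketch of the same plumbing. It opens a NETLINK_ROUTE socket and sends an RTM_GETTFILTER dump request for one interface, the read-only analogue of tc filter show; when adding a filter, tc and OVS instead send RTM_NEWTFILTER and pack TCA_KIND ("flower") plus nested TCA_OPTIONS attributes. The interface name eth0 is an assumption, and reply handling is omitted.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <net/if.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

int main(void)
{
    int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);
    if (fd < 0) {
        perror("socket");
        return 1;
    }

    /* A tc filter request is a netlink header followed by a struct tcmsg. */
    struct {
        struct nlmsghdr nlh;
        struct tcmsg tcm;
    } req;
    memset(&req, 0, sizeof(req));

    req.nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct tcmsg));
    req.nlh.nlmsg_type = RTM_GETTFILTER;              /* handled by tc_get_tfilter/tc_dump_tfilter */
    req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP; /* ask the kernel to dump filters */
    req.tcm.tcm_family = AF_UNSPEC;
    req.tcm.tcm_ifindex = if_nametoindex("eth0");     /* assumed interface name */

    struct sockaddr_nl kernel = { .nl_family = AF_NETLINK };
    struct iovec iov = { .iov_base = &req, .iov_len = req.nlh.nlmsg_len };
    struct msghdr msg = {
        .msg_name = &kernel,
        .msg_namelen = sizeof(kernel),
        .msg_iov = &iov,
        .msg_iovlen = 1,
    };

    /* This is the same system call that tc and OVS end up issuing. */
    if (sendmsg(fd, &msg, 0) < 0)
        perror("sendmsg");
    else
        printf("RTM_GETTFILTER dump request sent\n");

    close(fd);
    return 0;
}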

The call to sendmsg

Now let's pick up where we paused earlier, at sendmsg. The chain of functions continues as shown in Figure 6.

Figure 6. sendmsg invokes functions from the Routing Netlink (rtnl) subsystem.

The Linux kernel registers the following functions with the Routing Netlink (rtnl) subsystem:

  • tc_new_tfilter
  • tc_del_tfilter
  • tc_get_tfilter
  • tc_ctl_chain

These functions are registered by calling rtnl_register in the net/sched/cls_api.c file. The RTM_NEWCHAIN, RTM_GETCHAIN, and RTM_DELCHAIN operations are all handled by tc_ctl_chain. Internally, rtnl_register invokes rtnl_register_internal, defined in net/core/rtnetlink.c.
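
The registration itself looks roughly like the following, abridged from tc_filter_init in net/sched/cls_api.c; the exact flags passed to rtnl_register vary between kernel versions.

/* Abridged from tc_filter_init() in net/sched/cls_api.c; exact flags vary by kernel version. */
rtnl_register(PF_UNSPEC, RTM_NEWTFILTER, tc_new_tfilter, NULL, RTNL_FLAG_DOIT_UNLOCKED);
rtnl_register(PF_UNSPEC, RTM_DELTFILTER, tc_del_tfilter, NULL, RTNL_FLAG_DOIT_UNLOCKED);
rtnl_register(PF_UNSPEC, RTM_GETTFILTER, tc_get_tfilter, tc_dump_tfilter, RTNL_FLAG_DOIT_UNLOCKED);
rtnl_register(PF_UNSPEC, RTM_NEWCHAIN, tc_ctl_chain, NULL, 0);
rtnl_register(PF_UNSPEC, RTM_DELCHAIN, tc_ctl_chain, NULL, 0);
rtnl_register(PF_UNSPEC, RTM_GETCHAIN, tc_ctl_chain, tc_dump_chain, 0);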

The sequence continues based on functions registered to the rtnl subsystem. tc_new_tfilter, defined in net/sched/cls_api.c, invokes the function pointer registered to tp->ops->change, and ends up calling fl_change from the net/sched/cls_flower.c file.

fl_change checks whether the skip_hw or skip_sw policy is present. If the tc-policy is skip_hw, the flow is just added to tc and the function returns.

Figure 7 takes a deeper look at the fl_change function. The function has changed somewhat in recent kernel versions, but its control flow remains essentially the same as shown in the figure.

Figure 7. The fl_change function checks for skip_sw and skip_hw.
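
To make that check concrete, here is a small self-contained userspace illustration of the skip_sw/skip_hw decision. The flag constants come from the kernel's UAPI header, the two helpers mirror tc_skip_hw() and tc_skip_sw() from include/net/pkt_cls.h, and decide() is only a paraphrase of the control flow sketched in Figure 7, not kernel code.

#include <stdbool.h>
#include <stdio.h>
#include <linux/pkt_cls.h>   /* TCA_CLS_FLAGS_SKIP_HW, TCA_CLS_FLAGS_SKIP_SW */

/* These helpers mirror tc_skip_hw()/tc_skip_sw() from include/net/pkt_cls.h. */
static bool tc_skip_hw(unsigned int flags) { return flags & TCA_CLS_FLAGS_SKIP_HW; }
static bool tc_skip_sw(unsigned int flags) { return flags & TCA_CLS_FLAGS_SKIP_SW; }

/* A paraphrase of the decision fl_change() makes for a new flower rule. */
static void decide(const char *label, unsigned int flags)
{
    printf("%s:\n", label);
    if (tc_skip_hw(flags) && tc_skip_sw(flags)) {
        printf("  rejected: a rule cannot skip both software and hardware\n");
        return;
    }
    printf("  hardware offload: %s\n", tc_skip_hw(flags) ? "not attempted" : "attempted");
    printf("  software copy in flower: %s\n", tc_skip_sw(flags) ? "none" : "kept");
}

int main(void)
{
    decide("no flags (tc-policy unset)", 0);
    decide("skip_sw (hardware only)", TCA_CLS_FLAGS_SKIP_SW);
    decide("skip_hw (software only)", TCA_CLS_FLAGS_SKIP_HW);
    decide("skip_sw|skip_hw", TCA_CLS_FLAGS_SKIP_SW | TCA_CLS_FLAGS_SKIP_HW);
    return 0;
}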

If tc-policy is unset or set to skip_sw, the call sequence tries to add the flow to the hardware. Because we are interested in flows that get offloaded to hardware, we continue our journey down that path. The sequence of calls is as follows:

fl_hw_replace_filter (cls_flower.c) --> tc_setup_cb_add (cls_api.c) --> __tc_setup_cb_call (cls_api.c)

Finally, in the device driver

From here, the sequence goes to the hardware driver, which registered a traffic control setup callback for the device when it was initialized. For instance, the following assignment makes our Mellanox driver the recipient of the message:

.ndo_setup_tc            = mlx5e_setup_tc,
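
For context, that assignment lives in the driver's net_device_ops table; an abridged view follows (the real mlx5e_netdev_ops initializer in the mlx5 driver sets many more callbacks).

/* Abridged; the real initializer sets many more callbacks. */
static const struct net_device_ops mlx5e_netdev_ops = {
    …
    .ndo_setup_tc            = mlx5e_setup_tc,
    …
};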

The mlx5e_setup_tc function issues the following call to register the driver's block callback (CB):

flow_block_cb_setup_simple(type_data, &mlx5e_block_cb_list, mlx5e_setup_tc_block_cb, priv, priv, true);

In our case, the Mellanox driver callback mlx5e_setup_tc_block_cb gets called.

So now we have reached the Mellanox driver code. A few more calls and we can see how the flow rule is added to the flow table for hardware offload (Figure 8).

Figure 8. The Mellanox hardware driver checks flags to add the flow.

The drivers/net/ethernet/mellanox/mlx5/core/fs_cmd.c file registers the following function, and the sequence continues as shown in Figure 9:

.create_fte = mlx5_cmd_create_fte,

Figure 9. The Mellanox driver adds the flow to the hardware.

The final function in Figure 9 invokes a command that adds the flow rule to the hardware. With this result, we have reached our destination.

Conclusion

I hope this helps you understand what happens while adding a flow for hardware offload, and helps you troubleshoot problems you might encounter. To learn more about the basics of Open vSwitch hardware offload, I recommend reading Haresh Khandelwal's blog post on the subject.

