Setting up SMB Multi-Channel between FreeNAS (or any BSD/Linux) and Windows for 20Gbps transfers

SMB Multi-Channel is a useful performance feature that distributes SMB traffic over multiple network connections, allowing it to scale across multiple network adapters, as well as multiple CPU cores through the use of receive-side scaling (RSS). It is supported and enabled in Windows 10 by default, and Samba has support for it as of version 4.4. At the time of writing, FreeNAS 11 is running smbd version 4.10.2, which of course means it supports multi-channel.
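
If you want to double-check the Samba version running on your own NAS before going further, smbd will print it from a shell:

smbd -V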

Multi-channel works by making multiple TCP connections per transfer and splitting the traffic across them. In the simplest configuration, where you’ve got a server with two network interfaces on different subnets, multi-channel makes a connection (or multiple connections) to each of the server’s IP addresses, allowing that traffic to be split across the interfaces. If both the client NIC and server NIC advertise RSS support, multi-channel will also create multiple connections to each IP, which allows each TCP connection to be pinned to a hardware RSS queue, which themselves are handled by separate CPU cores, allowing for better load distribution. Multi-channel can utilise RSS even if the target server only has one NIC or IP address. It is also possible to use multi-channel over link aggregations, where multiple NICs are bonded into one group or “team” (often referred to as a link aggregation group, or LAG).
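
As a quick sanity check on the Windows side, you can confirm that the client has multi-channel enabled at all (it is by default on Windows 10, but it's worth ruling out):

PS C:\WINDOWS\system32> Get-SmbClientConfiguration | Select-Object EnableMultiChannel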

As I already had LAGs configured across the two 10Gbps ports of the Intel X520-DA2 NICs in my workstation and NAS, it seemed easiest to just enable multi-channel over that and have the traffic distributed automatically over the ports. Specifically, I’m using Intel’s PROSet utility on Windows to make a NIC team on the workstation, and FreeBSD’s native link aggregation support on the NAS, with an LACP-supporting L2/L3 switch in between the two systems.
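
For reference, FreeNAS builds its lagg from the UI, but on plain FreeBSD the equivalent rc.conf entries would look something like this (ix0 and ix1 are example names for the two X520 ports; substitute your own):

ifconfig_ix0="up"
ifconfig_ix1="up"
cloned_interfaces="lagg0"
ifconfig_lagg0="laggproto lacp laggport ix0 laggport ix1 10.1.0.1/8 up"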

[Diagram: workstation and NAS each connected to the LACP-capable switch via a 2×10Gbps LAG.]

The first thing I did was add the following directive to the auxiliary parameters section of the SMB config in FreeNAS:

server multi channel support = yes

However, upon inspecting the /usr/local/etc/smb4.conf file I discovered that FreeNAS sets some defaults that are not ideal for this configuration. It disables asynchronous IO, and sets the maximum IO threads to 2 – presumably one for reading, one for writing. To remedy this, I also added the following directives to the auxiliary parameters:

aio max threads = 100
aio read size = 1
aio write size = 1

(don’t change smb4.conf directly; anything you put there will be blown away on reboot)

This re-enables asynchronous IO on all SMB reads and writes, because the read and write size values mean “perform async IO on any packet bigger than this many bytes”, and restores the default max async IO thread count of 100.

This, however, doesn’t work on its own. If you take a look in /usr/local/etc/smb4_share.conf, which is used to specify the individual shares, and is included by smb4.conf, you’ll see that the aio size options are also explicitly set here. This means that for each SMB share defined in the Sharing section of the FreeNAS UI, you’ll need to go in, click Advanced Mode, then add the following auxiliary parameters:

aio read size = 1
aio write size = 1

Restarting the SMB service gets all of this up and ready to go.

In Windows, you can now open a share, then launch a Powershell window as admin and run Get-SmbMultichannelConnection to see it up and working:

PS C:\WINDOWS\system32> Get-SmbMultichannelConnection

Server Name              Selected Client IP     Server IP     Client Interface Index Server Interface Index Client RSS Capable Client RDMA Capable
-----------              -------- ---------     ---------     ---------------------- ---------------------- ------------------ -------------------
goliath.polynomial.local True     10.1.0.2      10.1.0.1      15                     6                      False              False

However, when doing a file transfer I discovered that I wasn't seeing any difference in transfer speeds, even when performing a large sequential read, from a file I knew was cached in ARC, to a RAM disk on my desktop. I could see that I was limited to the speed of only one port on the NIC, not the full LAG speed.

After some investigation I discovered a couple of reasons for this. The first was related to a particular bit of configuration on my switch, which I apparently missed when I recently had to do a factory reset on it after locking myself out (whoops).

Traffic sent over a LAG generally has to be carefully distributed. If you just randomly distribute packets across the interfaces it can lead to a lot of out-of-order receives, which means poor performance, high overhead on the receiving side, and a lot of unnecessary retransmits. To get around this, information from each packet is hashed into an index number that selects which interface in the LAG is to be used. Which parts of the packet are hashed differs between implementations, but the receiving side doesn’t need to know about the hashing scheme because it’s only used for distributing outgoing packets across the links, not re-assembling the result.
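
As a toy sketch of the idea (this is not any vendor's real algorithm, and the source port here is a made-up ephemeral port), hashing a flow's 5-tuple down to one of two LAG members might look like this in Powershell:

# Toy example only: hash the 5-tuple and pick one of two LAG members
$flow = "10.1.0.2,10.1.0.1,tcp,49832,445"  # src IP, dst IP, proto, src port, dst port
$hash = 0
foreach ($b in [System.Text.Encoding]::ASCII.GetBytes($flow)) {
    $hash = ([long]$hash * 31 + $b) -band 0x7FFFFFFF
}
"Flow leaves via LAG member $($hash % 2)"

Every packet in the same TCP connection produces the same index, so the flow stays pinned to one link; a new connection with a different source port will (with luck) hash onto the other one.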

The default scheme in FreeNAS is to use L2, L3, and L4 information: the source and destination MAC address, the source and destination IP address, the protocol, and the source and destination ports for TCP or UDP. This ensures that each discrete TCP connection (or stream of UDP packets) will always be kept to a single interface, preventing out-of-order headaches. Intel’s PROSet implementation on Windows probably does a similar thing, although I haven’t been able to find anything concrete. The inbuilt NIC teaming feature in Windows defaults to Address Hash mode, which uses source and destination IPs and source and destination TCP or UDP ports, but unfortunately Microsoft has not made NIC teaming available on desktop SKUs[1].
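
On the FreeBSD side you can check which layers the lagg hashes on, and on plain FreeBSD change it at runtime if necessary (FreeNAS may reapply its own settings on reboot); the lagghash option accepts any combination of l2, l3, and l4:

ifconfig lagg0 | grep lagg
ifconfig lagg0 lagghash l2,l3,l4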

You’d think, based on this, that each connection made by SMB multi-channel would have a pretty high likelihood of leaving via a different interface in the LAG, because source ports on TCP connections are randomised and the hash should therefore change per connection. This is true, but the problem occurred once the packets hit the switch. It’s easy to think of the network as having a single LAG from my workstation to the NAS via a switch, but more accurately there’s a LAG from my workstation to the switch, and another from the NAS to the switch. This makes sense – the switch needs to understand the LAGs so that it can forward packets to and from them properly. The problem is that my switch only supports hashing on L2 or L3 information (configurable as source and/or destination MAC, or source and/or destination IP), so the TCP/UDP ports aren’t taken into consideration. This means that even though each connection was distributed across interfaces as it left the workstation or NAS, the switch distributed packets based only on source and destination IP address, which in this case was always the same pair.

The second reason this wasn’t working properly is that the SMB implementation doesn’t actually know anything about my LAG; it just sees a single interface and makes a single connection to it, even with multi-channel enabled. This is because I was using Intel PROSet to set up the LAG, which is exposed to Windows as one regular network interface. If I were able to use the inbuilt NIC teaming features in Windows (not available on desktop SKUs, as I mentioned before) then Windows would know about the LAG, and multi-channel would automatically scale the number of connections to match the number of interfaces in the group[2]. Fixing this alone wouldn’t have helped in my situation, because of the aforementioned lack of L4 hashing support in the switch, but it’s worth noting that even on a switch without that limitation there would have been no scaling at all, because multi-channel would only see one interface and only attempt to open one transfer channel.

You might also notice that the “Client RSS Capable” column in the Powershell output above shows as False. At this point I hadn’t configured RSS on the FreeBSD side (it’s automatically detected on Windows) so it wasn’t in use yet. This is yet another reason that multiple connections weren’t being made, although again the switch issue still would’ve prevented speeds greater than 10G even with RSS enabled.
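
On the client, the relevant checks are Get-NetAdapterRss (is RSS actually enabled on the NIC?) and Get-SmbClientNetworkInterface (what capabilities does SMB think each interface has?):

PS C:\WINDOWS\system32> Get-NetAdapterRss | Select-Object Name, Enabled
PS C:\WINDOWS\system32> Get-SmbClientNetworkInterface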

The easy way to set this up is to remove the LAGs, keep one interface on whatever regular IP and subnet you’d normally use for the system, and give the other interface an IP in a different subnet, so that connections to each address are automatically routed through the corresponding interface. This is what I did initially: one interface was set to the usual IP in my primary 10.0.0.0/8 network, and the other was set to an IP in the 192.168.100.0/30 subnet.
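
On the NAS this is just two ordinary interface addresses; in plain-FreeBSD rc.conf terms, something like the following (ix0 and ix1 again being example interface names):

ifconfig_ix0="inet 10.1.0.1/8"
ifconfig_ix1="inet 192.168.100.1/30"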

Testing this again, I can see that Windows has detected both interfaces:

PS C:\WINDOWS\system32> Get-SmbMultichannelConnection

Server Name              Selected Client IP         Server IP         Client Interface Index Server Interface Index Client RSS Capable Client RDMA Capable
-----------              -------- ---------         ---------         ---------------------- ---------------------- ------------------ -------------------
goliath.polynomial.local True     10.1.0.2          10.1.0.1          15                     6                      False              False
goliath.polynomial.local True     192.168.100.2     192.168.100.1     24                     2                      False              False

This did indeed result in transfer speeds upward of ten gigabits!

The next thing to do was enable RSS. This feature is poorly documented, and the configuration information in the Samba manpage is (at least at the moment) inaccurate and rather unhelpful. Critically, each interface definition must be wrapped in double-quotes within the interfaces list in the config file if you want to use the speed and capability attributes. Here’s what I added to the Auxiliary Parameters section of the SMB service in FreeNAS:

interfaces = "10.1.0.1;speed=10000000000,capability=RSS" "192.168.100.1;speed=10000000000,capability=RSS"

In my case the interface value had to be an IP address. Attempting to specify an interface name (e.g. lagg0 or vlan101) did not work and caused the SMB daemon not to start because it couldn’t find any working interfaces. This is contrary to what the documentation says. The speed attribute here represents 10G, so tweak as necessary, and obviously the capability attribute is where we specify RSS support[3].
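
You can sanity-check that Samba parsed all of this as intended by dumping the effective configuration with testparm:

testparm -s | egrep 'interfaces|multi channel|aio'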

Running Update-SmbMultichannelConnection and Get-SmbMultichannelConnection in Powershell shows that RSS is now enabled for each of the interfaces:

PS C:\WINDOWS\system32> Get-SmbMultichannelConnection

Server Name              Selected Client IP     Server IP     Client Interface Index Server Interface Index Client RSS Capable Client RDMA Capable
-----------              -------- ---------     ---------     ---------------------- ---------------------- ------------------ -------------------
goliath.polynomial.local True     10.1.0.2      10.1.0.1      28                     6                      True               False
goliath.polynomial.local True     192.168.100.2 192.168.100.1 31                     14                     True               False

Personally, though, I don’t like having to remove the LAGs. It means losing the resilience of having two links, and removes the load balancing that can still occur when other devices on the network (with different IPs, so not affected by the switch’s hashing limitation) communicate with the NAS or my workstation. The solution I came up with was to re-enable the LAGs, configure them as before, then create a tagged VLAN on top of that and assign it the secondary IP range. Since this flows over the same LAG it gets hashed just the same and should be properly distributed. I also made sure to enable flow control on the ports to avoid congestion when transfers are happening – without it I found that I was running into stalled transfers now and again.
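
For reference, creating a tagged VLAN on top of the lagg takes two commands on plain FreeBSD (FreeNAS does this through its UI; the tag 101 matches the vlan101 interface mentioned earlier, so adjust to taste):

ifconfig vlan101 create vlan 101 vlandev lagg0
ifconfig vlan101 inet 192.168.100.1/30 up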

One issue I did run into along the way was name resolution causing “goliath” to resolve to the secondary IP range from my workstation, rather than the /8 IP. This made multi-channel detect the wrong IP for one of the channels, trying to have 10.1.0.2 talk to 192.168.100.1 instead of 10.1.0.1 as it should. This was fixed by disabling the Link-Layer Topology Discovery Responder, Link-Layer Topology Discovery Mapper I/O Driver, LLDP Protocol Driver, and IPv6 on the VLAN interface, then going into the TCP/IPv4 settings and disabling NetBIOS over TCP/IP. This prevented my workstation from incorrectly resolving the name “goliath” from the VLAN interface using NetBIOS or LLMNR before resolving it from the primary LAN interface. Unfortunately this precedence is not affected by adaptor metrics – I tried.
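
If you’d rather script those Windows-side changes than click through adapter properties, something like the following should do it – note that the adapter name "VLAN101" is an assumption, so substitute whatever your VLAN interface is actually called:

# Unbind the discovery protocols and IPv6 from the VLAN adapter
Disable-NetAdapterBinding -Name "VLAN101" -ComponentID ms_lltdio   # LLTD Mapper I/O Driver
Disable-NetAdapterBinding -Name "VLAN101" -ComponentID ms_rspndr   # LLTD Responder
Disable-NetAdapterBinding -Name "VLAN101" -ComponentID ms_lldp     # LLDP Protocol Driver
Disable-NetAdapterBinding -Name "VLAN101" -ComponentID ms_tcpip6   # IPv6
# Disable NetBIOS over TCP/IP (TcpipNetbiosOptions 2 = disable) on the same adapter
$guid = (Get-NetAdapter -Name "VLAN101").InterfaceGuid
Get-CimInstance Win32_NetworkAdapterConfiguration -Filter "IPEnabled = TRUE" |
  Where-Object { $_.SettingID -eq $guid } |
  Invoke-CimMethod -MethodName SetTcpipNetbios -Arguments @{ TcpipNetbiosOptions = 2 }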

I’ll wrap this up with the full set of Samba configuration directives I’m using in the SMB service’s Auxiliary Parameters section:

server multi channel support = yes
aio max threads = 100
allocation roundup size = 1048576
aio read size = 1
aio write size = 1
interfaces = "10.1.0.1;speed=10000000000,capability=RSS" "192.168.100.1;speed=10000000000,capability=RSS"

I plan on continuing this work by trying out some other features and expanding my hardware setup, with a goal of hopefully deploying a 100G spine network in my home lab, along with an SSD-based pool in my NAS. Costs are high, though, so it might be a while before I can get there.


Footnotes:

  1. For a while you could actually use NIC teaming on Windows 10 via Powershell, but Microsoft said this was an oversight and quickly disabled LBFO on non-server SKUs. I did take a look at the Powershell cmdlets (they use the MSFT_NetLbfoTeam WMI class, which is implemented in NdisIMPlatCim.dll) to see if there might be a cheeky way to turn it back on, but after some disassembly of the DLL I discovered that the exported functions are empty and just return an error state. As such, I suspect that the normal code paths for using LBFO are conditionally excluded at compile time on desktop SKUs, so it’s not as simple as defeating a runtime check. It may be possible to copy the DLL from a server SKU without breaking anything, and I don’t think WRP covers that particular DLL (a cursory search of the WinSxS Backup directory didn’t yield any results), but I haven’t tried. I did also see some pieces of code in the DLL that might indicate that the full Windows Server binary checks for feature flags in WMI, so that might put a spanner in the works, but I’m not sure. I plan on looking into this further. Update: I figured out how to do this; you can read about it here.
  2. I’m not actually sure whether Samba’s implementation of multi-channel on FreeBSD detects the use of LAG interfaces and automatically scales the number of connections like Windows does. The documentation recommends that multi-channel only be set up with interfaces on separate subnets, which suggests that Samba simply makes regular TCP connections and relies on the routing tables to send that traffic over separate interfaces. That doesn’t necessarily mean it can’t detect LAGs and scale the number of connections up, in the knowledge that the default FreeBSD LAG configuration uses L4 information to distribute packets, but it does seem somewhat unlikely.
  3. You can also specify RDMA as a capability if your network cards support it. My Intel X520-DA2 cards don’t, but I’ve got some Chelsio T520-CR cards coming which do support it, so I can try out iWARP and SMB Direct. I’m unsure how much configuration is necessary, or whether iWARP can be used at the same time as teaming/LAGs, but I plan to find out.
