Ubuntu 22.04 LTS as iSER client to NetApp E-Series

22 Sep 2023 - 15 minute read

Problem statement

WTF is iSER? iSER, aka iSCSI Extensions for RDMA, is a computer network protocol that extends the Internet Small Computer System Interface (iSCSI) protocol to use Remote Direct Memory Access (RDMA).

NetApp E-Series supports iSER. Here’s an existing image from another blog post that shows EF-Series EF300 and EF600, but E-Series 5700 also has the 100G IB Host Interface Card (HIC) option. See “iSER” in “I/O Interface Options”.

EF300 and EF600 Client Connectivity

Anyway, with 100G IB you can get low-latency 100 Gbps (per port) iSCSI to the box, which is very nice for many Big Data, analytics and HPC workloads.

The documentation for Linux with iSER naturally exists, but not for Ubuntu.

Your first choice would be to ask NetApp to qualify Ubuntu (or Debian, or what have you) as a one-off thing. This sometimes happens when you must have the entire stack validated end-to-end.

Your second choice is to Just Do It (TM), if you can.

Maybe this post can help you to “yes, can do” iSER!

E-Series side

There’s nothing special about configuring E-Series iSER for Ubuntu, which is also why Oracle KVM should just work with E-Series - same kernel, same drivers, etc.

So, follow the official docs and use this page for supplemental information.

Ubuntu side

This is where it can get tricky. Long story short, there’s one major choice to make, and that is “to use or not to use Mellanox OFED?” and the answer I offer is “use it”.

Officially supported Linux distributions seem to be tested with built-in drivers.

I tried that (Ubuntu’s built-in drivers with iSER to EF570) and it was crap. Since you’re using an unsupported OS, you may as well use IB drivers that work.

The rest is “common sense with Ubuntu characteristics”: follow the E-Series documentation (link at the top) and configure Linux stuff the Ubuntu way.

Hardware and software stack

  • x86_64 server with Mellanox ConnectX-5 (dual-ported 100Gb/s IB HCA; model: MCX556A-ECAT; FW 16.35.2000)
  • Ubuntu 22.04 LTS with all updates as of Sep 21, 2023
    • 4 IB IPs, two on the 192.168.100.0/24 and two on the 192.168.101.0/24 network
    • NIC names configured for iSER: ibs1f0, ibs1f1, ibs5f0, ibs5f1
    • Mellanox OFED LTS 5.8-3.0.7.0
  • E-Series SANtricity 11.80 (model: EF-Series EF570)
    • Controller A: 192.168.100.1 (Port 1), 192.168.101.1
    • Controller B: 192.168.100.2 (Port 1), 192.168.101.2
  • No IB switches (direct attach, aka DAS)

One note on DAS: usually we connect 1 or 2 servers this way, and 2 servers with 2 ports each consume 4 ports on E-Series IB HICs.

One example of that is BeeGFS node pairs connected to E-Series, although that solution by default uses NVMe/RoCE rather than iSER/IB (but it could use iSER/IB, as could IBM Spectrum Scale or other applications).

Setup workflow

It’s helpful to understand what needs to be done. Roughly speaking:

  • Install IB drivers
  • Figure out how everything is connected (you were supposed to know before you started, but let’s assume that’s an afterthought)
  • Get IB to work (Link Up, etc.)
  • Configure iSCSI on top of IB (iSER)
  • Discover and login to targets
  • Ensure multipathing works, format and mount

Tips and tricks

The primary objective of this post is to share information related to Ubuntu with E-Series without repeating what’s in the official documentation.

I’ll make these points in small sections. Some of it is “common sense stuff” that’s not documented in the E-Series documentation because it’s out of scope, but some of it is missing because Ubuntu is not in-scope.

SANtricity Host settings

In order to configure E-Series so that a server or cluster can connect to it via iSER, you need to pick iSER in Host settings.

That sounds obvious, but it’s easy to miss.

Remember to pick iSER and not use the default (iSCSI): you need the other iSCSI - iSER!

eseries-iser-06-host-iser-add.png

Another weird thing is the host type: you’d expect one of the several Linux settings to apply. Great intuition! Except that you’re wrong, as was I.

In my experience, you should select “Factory Default”. That’s nonsense and I don’t know why it has to be that way in 2023, but it is what it is.

eseries-iser-09-santricity-host-settings.png

(Notice that in this screenshot, iSCSI interface is selected. That won’t work for iSER!)

Update (October 04, 2023)

I realized that the undocumented “default” multipathing algorithm for E-Series is now scsi_dh_alua, which - again, in theory - should automatically load on a host connected to E-Series, which would then make it possible to set the Host Type to Linux DM-MP (kernel 3.10 or later).

I find it funny that I don’t even know what the default is, but when I tried to use it, it didn’t work anyway, so I’m still using host type Factory Default… With Linux DM-MP 3.10+ SANtricity gives me a warning about suboptimal paths.
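
If you want to check whether scsi_dh_alua is actually in play on your host before experimenting with host types, a minimal check could look like this (a sketch; the handler name is the one discussed above):

$ lsmod | grep scsi_dh_alua              # present if loaded as a module (it may also be built into the kernel)
$ sudo modprobe scsi_dh_alua             # harmless if it is already loaded or built in
$ sudo multipath -ll | grep hwhandler    # hwhandler='1 alua' means the ALUA handler is attached to the paths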

SANtricity iSER settings

This isn’t hard, but the official documentation doesn’t contain screenshots (too damn hard?) so: go to Settings > System and under General you’ll find two iSER-related entries.

eseries-iser-01-configure-eseries.png

This is where you pick a controller (A or B) and then can see if the ports are plugged in and Up/Down, as well as assign IPs to them.

eseries-iser-02-configure-eseries-iser-ip.png

With DAS there’s no switch, so the ports will stay down until the proper drivers and OpenSM have started on the host connected to them!

Just below that “Configure iSER over InfiniBand settings” you’ll see iSER (and RDMA) statistics. If multiple active volumes are balanced and multipathing is in place, controller ports should be approximately evenly utilized.

eseries-iser-08-santricity-iser-stats.png

This is also the place to look for excessive errors and such.

OS rescue mode

This is not E-Series-specific, but you may need it.

Before you start, try to boot your OS to Rescue Mode and make sure you are familiar with it (such as how to change OS configuration files in rescue mode).

The reason is that if you screw up and make the OS hang on boot, you may not be able to get in to unscrew the problem.

I got caught off-guard several times and almost had to re-install the OS…

eseries-iser-04-edit-fstab.png

What’s what and Mellanox OFED

As mentioned in the setup workflow, we need to know what’s what:

  • Which NICs are IB cards
  • How is everything connected - ports, device IDs, switch (if any) ports, E-Series controller, etc.

The E-Series documentation calls for identification of devices, ports, GIDs and what not.

You need to do that in any case (to understand how everything is connected), but because I installed Mellanox OFED, I did not have to configure, create and enable the OpenSM service myself.

But since we should run some diags for proper awareness, here’s how ibstat output looked for one port on one of the HCAs.

$ sudo ibstat
CA 'ibp134s0f0'
        CA type: MT4119
        Number of ports: 1
        Firmware version: 16.26.1040
        Hardware version: 0
        Node GUID: 0x1c34da03007ca2da
        System image GUID: 0x1c34da03007ca2da
        Port 1:
                State: Active
                Physical state: LinkUp
                Rate: 100
                Base lid: 1
                LMC: 0
                SM lid: 1
                Capability mask: 0x2651e84a
                Port GUID: 0x1c34da03007ca2da
                Link layer: InfiniBand
...

How I installed Mellanox OFED:

$ sudo ./mlnxofedinstall  --umad-dev-rw --all --enable-opensm

After I installed Mellanox OFED I enabled and started the NVIDIA-packaged OpenSM service (remember, I did not create my own despite the E-Series documentation instructing otherwise).
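
A minimal way to do that with systemd (assuming the OFED installer registered the unit as opensm, as in the status output below):

$ sudo systemctl enable --now opensm    # start now and enable at boot
$ sudo systemctl status opensm          # verify it runs one instance per local port GUID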

● opensm.service - LSB: Start opensm subnet manager.
     Loaded: loaded (/etc/init.d/opensm; generated)
     Active: active (running) since Thu 2023-09-21 17:38:05 UTC; 1h 30min ago
       Docs: man:systemd-sysv-generator(8)
      Tasks: 320 (limit: 115284)
     Memory: 30.1M
        CPU: 3.656s
     CGroup: /system.slice/opensm.service
             ├─2616 /usr/sbin/opensm -g 0x1c34da03007ca283 -f /var/log/opensm.0x1c34da03007ca283.log
             ├─2631 /usr/sbin/opensm -g 0x1c34da03007ca2db -f /var/log/opensm.0x1c34da03007ca2db.log
...

Notice the -g GID thing in the last two lines? That’s one of the nice things Mellanox OFED stack does for us - we didn’t have to assemble that configuration by ourselves as the E-Series iSER documentation suggests (and it suggests so because, as I mentioned, it’s based on built-in RDMA and IB drivers and OpenSM service must be configured manually).

After installing OFED you may notice a legacy service, srp_service, running. You may stop it, and later disable it, if you don’t need it.
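
Something along these lines should do it (a sketch; the exact unit name on your system may differ, so list SRP-related units first):

$ systemctl list-units --all | grep -i srp    # confirm the actual unit name
$ sudo systemctl stop srp_daemon              # unit name is an assumption - use whatever the previous command showed
$ sudo systemctl disable srp_daemon           # keep it from starting at boot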

You may want to reboot here and have those rescue mode instructions for Ubuntu 22.04 handy!

Netplan and MTU

One thing I noticed is that on unconfigured IB interfaces the MTU was 4092.

Once I configured them in /etc/netplan/*.yaml and they got an IP address assigned, the IB MTUs became 2044.

I first tried the easiest (and most naive) approach, because it was too easy not to try: I set the Netplan MTU to 4092. Yeah, no. Still 2044.

I think this works as expected, as I saw some details about it in the Mellanox documentation. Read it if you want to try 4092.

Netplan requires very minimal configuration for these two (ibs1, ibs5) dual-ported HCAs.

    ibs1f0:
      addresses:
        - 192.168.100.10/24
    ibs5f0:
      addresses:
        - 192.168.101.10/24
    ibs1f1:
      addresses:
        - 192.168.100.20/24
    ibs5f1:
      addresses:
        - 192.168.101.20/24
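
For context, the complete file could look roughly like this - a sketch that assumes the IPoIB interfaces go under the ethernets section and are rendered by networkd (the file name and renderer are assumptions, adjust to your environment):

# /etc/netplan/99-iser.yaml (example name)
network:
  version: 2
  renderer: networkd
  ethernets:
    ibs1f0:
      addresses:
        - 192.168.100.10/24
    ibs1f1:
      addresses:
        - 192.168.100.20/24
    ibs5f0:
      addresses:
        - 192.168.101.10/24
    ibs5f1:
      addresses:
        - 192.168.101.20/24

Apply it with “sudo netplan apply” and verify with “ip address”.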

If the server’s OFED drivers, OpenSM and links are up, then you may expect to see something like this (example for the first card):

7: ibs1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:0c:33:fe:80:00:00:00:00:00:00:1c:34:da:03:00:7c:a2:82 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp47s0f0
    inet 192.168.100.10/24 brd 192.168.100.255 scope global ibs1f0
       valid_lft forever preferred_lft forever
8: ibs1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:0d:01:fe:80:00:00:00:00:00:00:1c:34:da:03:00:7c:a2:83 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp47s0f1
    inet 192.168.100.20/24 brd 192.168.100.255 scope global ibs1f1
       valid_lft forever preferred_lft forever

Notice here ibs1f0 has an “altname”, ibp47s0f0, which indicates that doing some homework mapping out PCI slots, IB diagnostics output and network configuration may pay off if you get into trouble.
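
Two commands that help with that homework (ibdev2netdev ships with Mellanox OFED; the sysfs lookup works on any Linux):

$ ibdev2netdev -v                             # maps mlx5_N IB devices and ports to netdev names, with PCI info
$ readlink -f /sys/class/net/ibs1f0/device    # shows the PCI address behind a given interface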

ifaces

On Ubuntu, the iSCSI interface (iface) configuration files are stored in /etc/iscsi/ifaces/.

Create and edit them as the E-Series documentation suggests.

My server had 2 HCAs, each with 2 ports, so I created four files here. You can name these files any way you want.

$ dir -lat /etc/iscsi/ifaces
iface-ibs5f1
iface-ibs5f0
iface-ibs1f1
iface-ibs1f0
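
For reference, one of those files could look roughly like this (a sketch: the initiator name is a placeholder - use the one from /etc/iscsi/initiatorname.iscsi - and the iser transport line is what makes this “the other iSCSI”):

# /etc/iscsi/ifaces/iface-ibs1f0
iface.iscsi_ifacename = iface-ibs1f0
iface.net_ifacename = ibs1f0
iface.initiatorname = iqn.1993-08.org.debian:01:0123456789ab
iface.transport_name = iser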

Multipath

You’ll need the multipath-tools package, with the multipathd service enabled and started. I also updated the Linux initramfs, but I don’t think I needed to because the OS boots from local disk.

sudo apt-get install -y multipath-tools
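
To make sure the daemon runs now and comes back after reboots (a minimal sketch):

$ sudo systemctl enable --now multipathd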

The E-Series docs tell you to use an empty /etc/multipath.conf because the defaults are already correct for E/EF-Series, and that’s fine, but you can also hard-code the defaults if you want to see what they are or need to accommodate other storage such as ONTAP iSCSI.

defaults {
        verbosity 2
        polling_interval 5
        max_polling_interval 20
        reassign_maps "no"
        path_selector "service-time 0"
        path_grouping_policy "failover"
        uid_attribute "ID_SERIAL"
        prio "const"
        prio_args ""
        features "0"
        path_checker "tur"
        alias_prefix "mpath"
        failback "manual"
        rr_min_io 1000
        rr_min_io_rq 1
        max_fds "max"
        rr_weight "uniform"
        queue_without_daemon "no"
        allow_usb_devices "no"
        flush_on_last_del "no"
        user_friendly_names "no"
        fast_io_fail_tmo 5
        log_checker_err "always"
        all_tg_pt "no"
        retain_attached_hw_handler "yes"
        detect_prio "yes"
        detect_checker "yes"
        force_sync "yes"
        strict_timing "no"
        deferred_remove "no"
        delay_watch_checks "no"
        delay_wait_checks "no"
        san_path_err_threshold "no"
        san_path_err_forget_rate "no"
        san_path_err_recovery_time "no"
        marginal_path_err_sample_time "no"
        marginal_path_err_rate_threshold "no"
        marginal_path_err_recheck_gap_time "no"
        marginal_path_double_failed_time "no"
        find_multipaths "on"
        uxsock_timeout 4000
        retrigger_tries 0
        retrigger_delay 10
        missing_uev_wait_timeout 30
        skip_kpartx "no"
        disable_changed_wwids "ignored"
        remove_retries 0
        ghost_delay "no"
        find_multipaths_timeout -10
        enable_foreign "NONE"
        marginal_pathgroups "no"
        recheck_wwid "no"
}
blacklist {
        devnode "!^(sd[a-z]|dasd[a-z]|nvme[0-9])"
        device {
                vendor "(NETAPP|LSI|ENGENIO)"
                product "Universal Xport"
        }
}
blacklist_exceptions {
        property "(SCSI_IDENT_|ID_WWN)"
}
devices {
        device {
                vendor "(NETAPP|LSI|ENGENIO)"
                product "INF-01-00"
                product_blacklist "Universal Xport"
                path_grouping_policy "group_by_prio"
                path_checker "rdac"
                features "2 pg_init_retries 50"
                hardware_handler "1 rdac"
                prio "rdac"
                failback "immediate"
                no_path_retry 30
        }
}

overrides {
}

As mentioned in the update above, whether that device entry (NETAPP/LSI/ENGENIO) is in your file or not, it’s part of the defaults anyway, and - see the explanation near the top - scsi_dh_alua should load by default. But in my experience it does not.

Now if you log in to your iSER targets, you should be able to see multiple paths (in theory, we should see hwhandler="scsi_dh_alua" in there).

$ sudo multipath -ll
3600a098000e3c1b000002be3620b681e dm-3 NETAPP,INF-01-00
size=399G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 15:0:0:1 sdb 8:16 active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 16:0:0:1 sdq 65:0 active ready running
3600a098000e3c1b000002d14634f47c9 dm-0 NETAPP,INF-01-00
size=100G features='3 queue_if_no_path pg_init_retries 50' hwhandler='1 alua' wp=rw
|-+- policy='service-time 0' prio=50 status=active
| `- 15:0:0:5 sdf 8:80  active ready running
`-+- policy='service-time 0' prio=10 status=enabled
  `- 16:0:0:5 sdu 65:64 active ready running
...

If you don’t care about artsy-fartsy (also known as “user-friendly”) device names, you could format and mount these devices (or add mount points to /etc/fstab).

Example:

/dev/mapper/3600a098000f637140000284763a83f44 /mount/data xfs _netdev 0 0
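
Formatting and mounting one of those multipath devices by hand could look roughly like this (a sketch; the WWID is the one from the fstab example above - use your own, and note that mkfs destroys whatever is on the device):

$ sudo mkfs.xfs /dev/mapper/3600a098000f637140000284763a83f44
$ sudo mkdir -p /mount/data
$ sudo mount /dev/mapper/3600a098000f637140000284763a83f44 /mount/data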

Target discovery

Before we can mount and use any iSCSI (and also iSER) devices, we need to discover the targets and log in to the iSCSI portal. But remember, here we’re not discovering iSCSI, but iSER targets (“the other iSCSI”).

Two things to say about that:

1) One funny thing that kept happening (although it didn’t bother me much) is the stupid 192.168.130.x/24 IPs which were not E-Series controller IPs. I think the reason is the IB HCAs have BlueField functionality included, so it’s coming from that. (I haven’t used BlueField, but I want to learn enough to be able to disable it!)

$ sudo iscsiadm -m discovery -t st -p 192.168.100.1 -I iser
192.168.130.101:3260,1 iqn.1992-08.com.netapp:5700.600a098000f63714000000005e79c17c
192.168.131.101:3260,1 iqn.1992-08.com.netapp:5700.600a098000f63714000000005e79c17c
192.168.100.1:3260,1 iqn.1992-08.com.netapp:5700.600a098000f63714000000005e79c17c
192.168.101.1:3260,1 iqn.1992-08.com.netapp:5700.600a098000f63714000000005e79c17c
192.168.130.102:3260,2 iqn.1992-08.com.netapp:5700.600a098000f63714000000005e79c17c
192.168.131.102:3260,2 iqn.1992-08.com.netapp:5700.600a098000f63714000000005e79c17c
192.168.100.2:3260,2 iqn.1992-08.com.netapp:5700.600a098000f63714000000005e79c17c
192.168.101.2:3260,2 iqn.1992-08.com.netapp:5700.600a098000f63714000000005e79c17c
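
Logging in to the discovered targets could then look roughly like this (a sketch; the portal, IQN and iface name are taken from the examples in this post):

$ sudo iscsiadm -m node -T iqn.1992-08.com.netapp:5700.600a098000f63714000000005e79c17c -p 192.168.100.1:3260 -I iface-ibs1f0 --login
$ sudo iscsiadm -m session    # verify the sessions use the iser transport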

Anyway, the main thing here is you don’t want to see any unexpected errors during discovery and login to iSER.

2) The other noteworthy detail is that “-I iser” should supposedly be used to scan iSER targets. Using an iface from /etc/iscsi/ifaces/ worked for me as well (“-I iface-ibs5f1”, for example). I think that’s because iSER is configured in those interface files.

With -I iser (and the stupid BlueField IPs that I didn’t have time to get rid of):

eseries-iser-03-configure-client-iser-ip-and-login.png

With full iface names:

eseries-iser-05-scan-from-limited-ifaces.png

In the four iface files that I created, I only modified two values in each: iface.initiatorname (I used the OS-set iSCSI initiator name in all 4 files) and iface.net_ifacename (use the interface name from “ip address” output, which is different for each file: ibs1f0, ibs1f1, ibs5f0 and ibs5f1). The latter, I think, ensures that only the IB transport is used during discovery.

I don’t know if iSER would work better with more detailed settings (HWADDR, MTU, etc.) in iface files, but it seems it works fine without them.

Anyway, this is another good chance to reboot and see if everything (starting with the OS) can come up again. Have those OS rescue mode instructions ready!

Don’t panic if it takes a while for stuff to come up. This step takes long enough that you’ll start thinking about rescue mode.

eseries-iser-07-mellanox-ofed.png

Volume partitioning and sanlun command

You can see the sanlun command mentioned in several places, including iSER-related pages.

First, that command is from one of the optional “utility” packages that NetApp provides for SAN products (mostly E-Series, but there’s a similar or maybe identical package for ONTAP FC). I seem to recall it can’t even be installed on Ubuntu or Debian, so don’t even bother.

Second, you don’t really need it anyway. I think it was needed in 2004 when Linux was more crap than it is today. Just use Linux shell commands.

What about volume partitioning?

My view is that on flash disks I wouldn’t even partition LUNs unless I tested and saw that partitioned disks are faster or have some other advantage over non-partitioned ones (I can’t think of any). Maybe they do when they’re NL-SAS, but for flash I doubt it. So I wouldn’t bother with that step. For flash-based disks, I’d just run mkfs on the device and move on.

You can try and create partitioned devices and run a simple synthetic benchmark that resembles your workload to see if it matters.
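
For example, a quick synthetic read test with fio against the raw multipath device could look like this (a sketch; fio must be installed, the device path is the example WWID from earlier, and write tests would destroy data):

$ sudo fio --name=ef-test --filename=/dev/mapper/3600a098000f637140000284763a83f44 --direct=1 --rw=randread --bs=4k --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting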

Conclusion

Most E-Series users who want to use Ubuntu shouldn’t care that it’s not supported.

If you ask NetApp to qualify it for your environment, I suggest asking for Ubuntu LTS with Mellanox OFED.

I will use this system in the coming weeks and months so I may have new findings, but I don’t think there’s much that can go wrong here:

  • the mature and well-known components are iSCSI initiator and libraries, Multipath / Device Mapper, and kernel in general
  • the sensitive parts are IB and RDMA and with OFED that’s solved for you by NVIDIA who, by the way, are heavy users of Ubuntu, so it’s not like this is something very risky or new

Ubuntu and NVIDIA / Mellanox OFED solve your client-side challenges, and storage-side (E-Series IB uses Mellanox HCAs, as far as I remember) then becomes easy.

This approach lets you use Ubuntu and iSER with E-Series. While the DIY approach may involve mild shortcuts, this gives you LXD, ZFS and some other solutions and features which work best on Ubuntu. I think it’s a decent tradeoff if you need those features or standardize on Ubuntu or Debian-like distros that Mellanox OFED supports.

You could use Ubuntu more easily without iSER (with FC or 25Gb/s iSCSI), but iSER is fast and may be the right choice for AI, analytics and other environments.

