RHEL-52418

NVMe-TCP BFS: Path missing after enabling switch port

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • rhel-9.5
    • nvme-cli
    • rhel-sst-storage-io
    • ssg_filesystems_storage_and_HA
    • 4
    • False
    • x86_64

      What were you trying to do that didn't work?

      Configured a Dell R660 to boot from SAN over NVMe-TCP. The network ports are up and the paths are all established:

      nbft0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
              inet 172.18.240.1  netmask 255.255.255.0  broadcast 172.18.240.255
              ether 00:62:0b:cb:eb:70  txqueuelen 1000  (Ethernet)
              RX packets 171569  bytes 219222892 (209.0 MiB)
              RX errors 0  dropped 56  overruns 0  frame 0
              TX packets 375599  bytes 536844462 (511.9 MiB)
              TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
      
      nbft1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
              inet 172.18.230.2  netmask 255.255.255.0  broadcast 172.18.230.255
              ether 00:62:0b:cb:eb:71  txqueuelen 1000  (Ethernet)
              RX packets 137807  bytes 168769696 (160.9 MiB)
              RX errors 0  dropped 56  overruns 0  frame 0
              TX packets 367994  bytes 529819180 (505.2 MiB)
              TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0
      
      [root@dell-r660 ~]# nvme list-subsys /dev/nvme0n1
      nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.643ecb551e6b11eda647d039ea98949f:subsystem.dellr660
                     hostnqn=nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0044-4410-8030-b8c04f445833
                     iopolicy=round-robin
      \
       +- nvme0 tcp traddr=172.18.240.50,trsvcid=4420,host_traddr=172.18.240.1 connecting optimized
       +- nvme1 tcp traddr=172.18.240.51,trsvcid=4420,host_traddr=172.18.240.1 connecting non-optimized
       +- nvme2 tcp traddr=172.18.230.50,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live optimized
       +- nvme3 tcp traddr=172.18.230.51,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized

      At this point, I disable the switch port for port 2 (nbft1) and reboot. The system recovers from the reboot as expected. I then re-enable the port and see the following:

      Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: bnxt_en 0000:6a:00.1 nbft1: NIC Link is Up, 25000 Mbps (NRZ) full duplex, Flow control: none
      Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: bnxt_en 0000:6a:00.1 nbft1: FEC autoneg off encoding: None
      Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): nbft1: link becomes ready
      Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Starting Load Kernel Module nvme_fabrics...
      Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: modprobe@nvme_fabrics.service: Deactivated successfully.
      Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Finished Load Kernel Module nvme_fabrics.
      Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Starting Connect NBFT-defined NVMe-oF subsystems automatically...
      Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: failed to connect socket: -110
      Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: creating 15 I/O queues.
      Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: mapped 15/0/0 default/read/poll queues.
      Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: new ctrl: NQN "nqn.1992-08.com.netapp:sn.643ecb551e6b11eda647d039ea98949f:subsystem.dellr660", addr 172.18.230.51:4420, hostnqn: nqn.201>
      Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com nvme[2230]: device: nvme2
      Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: nvmf-connect-nbft.service: Deactivated successfully.
      Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Finished Connect NBFT-defined NVMe-oF subsystems automatically.
      Aug 01 09:45:36 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
      Aug 01 09:45:42 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: systemd-hostnamed.service: Deactivated successfully.
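The `-110` in the `failed to connect socket` message above is a negated Linux errno value. A minimal sketch for pulling such failures out of journal text and decoding them follows; the function name `decode_connect_failures` and the regular expression are illustrative and not part of nvme-cli or the kernel. On Linux, errno 110 is `ETIMEDOUT`, so the first connection attempt on `nvme2` timed out; on other platforms the numeric-to-name mapping may differ.

```python
import errno
import os
import re

# Illustrative pattern for the kernel message seen in the journal above.
CONNECT_FAIL_RE = re.compile(r"(nvme\d+): failed to connect socket: -(\d+)")


def decode_connect_failures(journal_text):
    """Return (controller, errno value, errno name, description) per failure."""
    results = []
    for match in CONNECT_FAIL_RE.finditer(journal_text):
        code = int(match.group(2))
        # errno.errorcode reflects the platform running this script;
        # the kernel message uses Linux numbering.
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((match.group(1), code, name, os.strerror(code)))
    return results


SAMPLE = "Aug 01 09:45:26 host kernel: nvme nvme2: failed to connect socket: -110"

for ctrl, code, name, text in decode_connect_failures(SAMPLE):
    print(f"{ctrl}: errno {code} ({name}): {text}")
```

In the log above, the timeout is followed by `nvme2` being created for `172.18.230.51`, and no later attempt for `172.18.230.50` appears in the captured lines.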

      I then see that the optimized path on nbft1 (traddr=172.18.230.50, host_traddr=172.18.230.2) was not re-established:

      # nvme list-subsys /dev/nvme0n1
      nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.643ecb551e6b11eda647d039ea98949f:subsystem.dellr660
                     hostnqn=nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0044-4410-8030-b8c04f445833
                     iopolicy=round-robin
      \
       +- nvme0 tcp traddr=172.18.240.50,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live optimized
       +- nvme1 tcp traddr=172.18.240.51,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live non-optimized
       +- nvme2 tcp traddr=172.18.230.51,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
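To make the comparison between the two `nvme list-subsys` outputs mechanical, a small sketch along these lines could diff the path lists. It parses the plain-text format exactly as pasted in this report, which is an assumption about that format staying stable; `parse_paths` and `missing_paths` are hypothetical helper names, and the sample data is copied from the outputs above.

```python
import re

# Matches one path line of the text output shown in this report, e.g.
#  +- nvme2 tcp traddr=...,trsvcid=4420,host_traddr=... live optimized
PATH_RE = re.compile(
    r"\+-\s+(?P<ctrl>nvme\d+)\s+(?P<transport>\S+)\s+(?P<addr>\S+)\s+"
    r"(?P<state>\S+)(?:\s+(?P<ana>\S+))?\s*$"
)


def parse_paths(text):
    """Map (traddr, host_traddr) to (controller state, ANA state)."""
    paths = {}
    for line in text.splitlines():
        match = PATH_RE.search(line)
        if not match:
            continue
        fields = dict(
            kv.split("=", 1) for kv in match.group("addr").split(",") if "=" in kv
        )
        key = (fields.get("traddr"), fields.get("host_traddr"))
        paths[key] = (match.group("state"), match.group("ana"))
    return paths


def missing_paths(before, after):
    """Return paths present in `before` but absent from `after`."""
    return sorted(set(parse_paths(before)) - set(parse_paths(after)))


BEFORE = """
 +- nvme0 tcp traddr=172.18.240.50,trsvcid=4420,host_traddr=172.18.240.1 connecting optimized
 +- nvme1 tcp traddr=172.18.240.51,trsvcid=4420,host_traddr=172.18.240.1 connecting non-optimized
 +- nvme2 tcp traddr=172.18.230.50,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live optimized
 +- nvme3 tcp traddr=172.18.230.51,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
"""

AFTER = """
 +- nvme0 tcp traddr=172.18.240.50,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live optimized
 +- nvme1 tcp traddr=172.18.240.51,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live non-optimized
 +- nvme2 tcp traddr=172.18.230.51,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
"""

print(missing_paths(BEFORE, AFTER))
# prints [('172.18.230.50', '172.18.230.2')]
```

Keying on `(traddr, host_traddr)` rather than the controller name matters here, because `nvme2` refers to a different target address before and after the port was re-enabled.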

      I expected all four paths to be restored.

      Please provide the package NVR for which bug is seen:

      kernel-5.14.0-487.el9

      nvme-cli-2.9.1-4.el9

      How reproducible: Often

      Steps to reproduce

      see above

              tbzatek Tomáš Bžatek
              mpatalan Marco Patalano
              Maurizio Lombardi Maurizio Lombardi
              Marco Patalano Marco Patalano