  1. RHEL
  2. RHEL-45912

NVMe-TCP BFS: paths do not re-establish after enabling switch port

    • Bug
    • Resolution: Unresolved
    • Normal
    • rhel-9.5
    • nvme-cli
    • rhel-sst-storage-io
    • ssg_filesystems_storage_and_HA
    • 3
    • False
    • x86_64

      What were you trying to do that didn't work? 

      We configured a system to boot from SAN over NVMe-TCP. 4 paths are available to the storage array - 2 optimized and 2 non-optimized:

      [root@dell-r660 ~]# nvme list-subsys /dev/nvme0n1
      nvme-subsys0 - NQN=nqn.1988-11.com.dell:powerstore:00:88b402df2d762AA7AF94
                     hostnqn=nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0044-4410-8030-b8c04f445833
                     iopolicy=numa
      \
       +- nvme0 tcp traddr=172.18.240.60,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live optimized
       +- nvme1 tcp traddr=172.18.240.61,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live non-optimized
       +- nvme2 tcp traddr=172.18.230.60,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live optimized
       +- nvme3 tcp traddr=172.18.230.61,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
      

      I brought down one of the switch ports associated with port 2 of the initiator. At that point, we were down to 1 optimized and 1 non-optimized path. I rebooted the server and verified that the system recovered. Shortly after, I brought the switch port back up and noticed that only 1 of the 2 paths on that port recovered:

      [root@dell-r660 ~]# nvme list-subsys /dev/nvme0n1
      nvme-subsys0 - NQN=nqn.1988-11.com.dell:powerstore:00:88b402df2d762AA7AF94
                     hostnqn=nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0044-4410-8030-b8c04f445833
                     iopolicy=numa
      \
       +- nvme0 tcp traddr=172.18.240.60,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live optimized
       +- nvme1 tcp traddr=172.18.240.61,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live non-optimized
       +- nvme2 tcp traddr=172.18.230.61,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
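
To make the difference between the two listings explicit, here is a minimal Python sketch that parses the text output of `nvme list-subsys` and reports which target addresses are present before but absent after. The names `parse_paths` and `missing_paths` are hypothetical helpers written for this illustration, and the regular expression assumes the line layout shown in this report; other nvme-cli versions may format the output differently.

```python
import re

# One controller line of `nvme list-subsys` text output, for example:
#  +- nvme0 tcp traddr=172.18.240.60,trsvcid=4420,... live optimized
# The layout is assumed from the listings in this report.
PATH_RE = re.compile(
    r"\+-\s+(?P<ctrl>nvme\d+)\s+(?P<transport>\S+)\s+"
    r"(?P<addr>\S+)\s+(?P<state>\S+)(?:\s+(?P<ana>\S+))?\s*$"
)


def parse_paths(listing):
    """Return {traddr: (state, ana_state)} for every controller line."""
    paths = {}
    for line in listing.splitlines():
        m = PATH_RE.search(line)
        if not m:
            continue
        # Split "traddr=...,trsvcid=...,..." into a key/value dict.
        fields = dict(kv.split("=", 1) for kv in m["addr"].split(",") if "=" in kv)
        paths[fields["traddr"]] = (m["state"], m["ana"])
    return paths


def missing_paths(before, after):
    """Return the paths listed in `before` whose traddr is absent from `after`."""
    b, a = parse_paths(before), parse_paths(after)
    return {addr: info for addr, info in b.items() if addr not in a}


# Controller lines copied from the two listings in this report.
BEFORE = """\
 +- nvme0 tcp traddr=172.18.240.60,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live optimized
 +- nvme1 tcp traddr=172.18.240.61,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live non-optimized
 +- nvme2 tcp traddr=172.18.230.60,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live optimized
 +- nvme3 tcp traddr=172.18.230.61,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
"""
AFTER = """\
 +- nvme0 tcp traddr=172.18.240.60,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live optimized
 +- nvme1 tcp traddr=172.18.240.61,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live non-optimized
 +- nvme2 tcp traddr=172.18.230.61,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
"""

if __name__ == "__main__":
    for addr, (state, ana) in missing_paths(BEFORE, AFTER).items():
        print(f"missing path: traddr={addr} (was {state} {ana})")
```

For the two listings above, the only address reported as missing is 172.18.230.60, the optimized path on the port that was toggled. Keying on `traddr` rather than on the controller name is deliberate, since `nvme2` refers to different targets in the two listings.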
      

      The logs report the following:

      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: bnxt_en 0000:6a:00.1 nbft1: NIC Link is Up, 25000 Mbps (NRZ) full duplex, Flow control: none
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: bnxt_en 0000:6a:00.1 nbft1: FEC autoneg off encoding: None
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): nbft1: link becomes ready
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info>  [1719932114.9329] device (nbft1): carrier: link connected
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info>  [1719932114.9338] device (nbft1): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'manag>
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info>  [1719932114.9351] policy: auto-activating connection 'nbft1' (5029e745-0d8e-4ee1-8118-4b3faa61e53b)
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info>  [1719932114.9360] device (nbft1): Activation: starting connection 'nbft1' (5029e745-0d8e-4ee1-8118-4b3faa61e53b)
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info>  [1719932114.9361] device (nbft1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info>  [1719932114.9366] device (nbft1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info>  [1719932114.9371] device (nbft1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info>  [1719932114.9383] device (nbft1): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Starting Network Manager Script Dispatcher Service...
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Started Network Manager Script Dispatcher Service.
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info>  [1719932114.9723] device (nbft1): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info>  [1719932114.9728] device (nbft1): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
      Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info>  [1719932114.9737] device (nbft1): Activation: successful, device activated.
      Jul 02 10:55:15 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Starting Load Kernel Module nvme_fabrics...
      Jul 02 10:55:15 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: modprobe@nvme_fabrics.service: Deactivated successfully.
      Jul 02 10:55:15 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Finished Load Kernel Module nvme_fabrics.
      Jul 02 10:55:15 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Starting Connect NBFT-defined NVMe-oF subsystems automatically...
      Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: failed to connect socket: -110
      Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: creating 64 I/O queues.
      Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: mapped 64/0/0 default/read/poll queues.
      Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:88b402df2d762AA7AF94", addr 172.18.230.61:4420, hostnqn: nqn.2014-08.org.nvmexpress:uu>
      Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com nvme[2250]: device: nvme2
      Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: nvmf-connect-nbft.service: Deactivated successfully.
      Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Finished Connect NBFT-defined NVMe-oF subsystems automatically.
      Jul 02 10:55:27 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: systemd-hostnamed.service: Deactivated successfully.
      Jul 02 10:55:28 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
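
For triage, the relevant kernel lines can be pulled out of the journal with a short script. The sketch below is only an illustration: `summarize` is a hypothetical helper, the patterns assume the message wording seen in this log, and the errno table is a hand-written subset of Linux values (-110 corresponds to ETIMEDOUT). It does not prove which target the failed attempt was aimed at; the log line itself does not carry the address.

```python
import re

# Subset of Linux errno values, listed explicitly so the sketch does not
# depend on the platform it runs on.
LINUX_ERRNO = {110: "ETIMEDOUT", 111: "ECONNREFUSED", 113: "EHOSTUNREACH"}

# Patterns assumed from the kernel messages in this report.
FAIL_RE = re.compile(
    r"kernel: nvme (?P<ctrl>nvme\d+): failed to connect socket: -(?P<err>\d+)"
)
NEW_CTRL_RE = re.compile(
    r'kernel: nvme (?P<ctrl>nvme\d+): new ctrl: NQN "[^"]+", '
    r"addr (?P<addr>[\d.]+):(?P<port>\d+)"
)


def summarize(journal):
    """Return (failures, connected) extracted from journal text.

    failures:  list of (controller, errno name or numeric string)
    connected: list of (controller, target address)
    """
    failures, connected = [], []
    for line in journal.splitlines():
        m = FAIL_RE.search(line)
        if m:
            code = int(m["err"])
            failures.append((m["ctrl"], LINUX_ERRNO.get(code, str(code))))
            continue
        m = NEW_CTRL_RE.search(line)
        if m:
            connected.append((m["ctrl"], m["addr"]))
    return failures, connected


# Two kernel lines copied from the log above (the second is truncated in
# the original and is kept truncated here).
SAMPLE = '''\
Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: failed to connect socket: -110
Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:88b402df2d762AA7AF94", addr 172.18.230.61:4420, hostnqn: nqn.2014-08.org.nvmexpress:uu>
'''

if __name__ == "__main__":
    failures, connected = summarize(SAMPLE)
    for ctrl, err in failures:
        print(f"{ctrl}: connect failed with {err}")
    for ctrl, addr in connected:
        print(f"{ctrl}: connected to {addr}")
```

On the sample lines this yields one timed-out attempt and one successful connection to 172.18.230.61, which matches the single recovered non-optimized path in the second listing. The timed-out attempt was presumably the connection to 172.18.230.60, but that is an inference from the missing path, not something the log states.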

      Please provide the package NVR for which bug is seen:

      kernel-5.14.0-467.el9

      nvme-cli-2.9.1-3.1.tbzatek.el9.x86_64

      libnvme-1.9-1.1.tbzatek.el9.x86_64

      How reproducible: Often

      Steps to reproduce

      1. see above

      Expected results

      All paths should re-establish after the switch port is brought back up

       

       

              tbzatek Tomáš Bžatek
              mpatalan Marco Patalano
              Maurizio Lombardi Maurizio Lombardi
              Marco Patalano Marco Patalano