Bug | Resolution: Unresolved | Major | rhel-9.5 | rhel-sst-storage-io | ssg_filesystems_storage_and_HA | x86_64
What were you trying to do that didn't work?
Configured a Dell R660 to boot from SAN over NVMe-TCP. The network ports are up and the paths are all established:
nbft0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.18.240.1 netmask 255.255.255.0 broadcast 172.18.240.255
        ether 00:62:0b:cb:eb:70 txqueuelen 1000 (Ethernet)
        RX packets 171569 bytes 219222892 (209.0 MiB)
        RX errors 0 dropped 56 overruns 0 frame 0
        TX packets 375599 bytes 536844462 (511.9 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

nbft1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
        inet 172.18.230.2 netmask 255.255.255.0 broadcast 172.18.230.255
        ether 00:62:0b:cb:eb:71 txqueuelen 1000 (Ethernet)
        RX packets 137807 bytes 168769696 (160.9 MiB)
        RX errors 0 dropped 56 overruns 0 frame 0
        TX packets 367994 bytes 529819180 (505.2 MiB)
        TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

[root@dell-r660 ~]# nvme list-subsys /dev/nvme0n1
nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.643ecb551e6b11eda647d039ea98949f:subsystem.dellr660
               hostnqn=nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0044-4410-8030-b8c04f445833
               iopolicy=round-robin
\
 +- nvme0 tcp traddr=172.18.240.50,trsvcid=4420,host_traddr=172.18.240.1 connecting optimized
 +- nvme1 tcp traddr=172.18.240.51,trsvcid=4420,host_traddr=172.18.240.1 connecting non-optimized
 +- nvme2 tcp traddr=172.18.230.50,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live optimized
 +- nvme3 tcp traddr=172.18.230.51,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
At this point, I disable port 2 (nbft1) and reboot. The system comes back up after the reboot as expected. I then re-enable port 2 and see the following:
Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: bnxt_en 0000:6a:00.1 nbft1: NIC Link is Up, 25000 Mbps (NRZ) full duplex, Flow control: none
Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: bnxt_en 0000:6a:00.1 nbft1: FEC autoneg off encoding: None
Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): nbft1: link becomes ready
Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Starting Load Kernel Module nvme_fabrics...
Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: modprobe@nvme_fabrics.service: Deactivated successfully.
Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Finished Load Kernel Module nvme_fabrics.
Aug 01 09:45:23 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Starting Connect NBFT-defined NVMe-oF subsystems automatically...
Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: failed to connect socket: -110
Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: creating 15 I/O queues.
Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: mapped 15/0/0 default/read/poll queues.
Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: new ctrl: NQN "nqn.1992-08.com.netapp:sn.643ecb551e6b11eda647d039ea98949f:subsystem.dellr660", addr 172.18.230.51:4420, hostnqn: nqn.201>
Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com nvme[2230]: device: nvme2
Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: nvmf-connect-nbft.service: Deactivated successfully.
Aug 01 09:45:26 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Finished Connect NBFT-defined NVMe-oF subsystems automatically.
Aug 01 09:45:36 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
Aug 01 09:45:42 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: systemd-hostnamed.service: Deactivated successfully.
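The -110 in "failed to connect socket" is a negative errno value. The following is a minimal Python sketch for pulling the nvme kernel messages out of a journal excerpt, decoding that errno, and listing which controllers did connect. It assumes Linux errno numbering (the mapping is platform-specific) and journal text pasted in with timestamps and hostname trimmed; the summarize helper and the regex patterns are hypothetical and not part of nvme-cli or systemd.

```python
import errno
import os
import re

# nvme kernel messages from the journal excerpt above (timestamp and hostname
# trimmed; the new-ctrl line is cut where the original log was truncated).
JOURNAL = """\
kernel: nvme nvme2: failed to connect socket: -110
kernel: nvme nvme2: creating 15 I/O queues.
kernel: nvme nvme2: mapped 15/0/0 default/read/poll queues.
kernel: nvme nvme2: new ctrl: NQN "nqn.1992-08.com.netapp:sn.643ecb551e6b11eda647d039ea98949f:subsystem.dellr660", addr 172.18.230.51:4420
"""

# Hypothetical patterns matched against the message text shown in this report.
FAILED = re.compile(r"nvme (nvme\d+): failed to connect socket: (-\d+)")
NEW_CTRL = re.compile(r"nvme (nvme\d+): new ctrl: .* addr ([\d.]+):(\d+)")


def summarize(journal):
    """Return (failed connects, new controllers) found in nvme kernel messages."""
    failures, new_ctrls = [], []
    for line in journal.splitlines():
        if m := FAILED.search(line):
            code = -int(m.group(2))  # the kernel logs a negative errno value
            name = errno.errorcode.get(code, str(code))  # Linux numbering assumed
            failures.append((m.group(1), name, os.strerror(code)))
        elif m := NEW_CTRL.search(line):
            new_ctrls.append((m.group(1), m.group(2), int(m.group(3))))
    return failures, new_ctrls


if __name__ == "__main__":
    failures, new_ctrls = summarize(JOURNAL)
    for ctrl, name, text in failures:
        print(f"{ctrl}: connect failed with {name} ({text})")
    for ctrl, addr, port in new_ctrls:
        print(f"{ctrl}: connected to {addr}:{port}")
```

On a Linux host, this reports that the connect attempt failed with ETIMEDOUT (Connection timed out) and that nvme2 then connected to 172.18.230.51:4420. The failure message itself does not name the target address, so the log line alone does not show which target timed out.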
I then see that one of the optimized paths (172.18.230.50 via nbft1) was not re-established:
# nvme list-subsys /dev/nvme0n1
nvme-subsys0 - NQN=nqn.1992-08.com.netapp:sn.643ecb551e6b11eda647d039ea98949f:subsystem.dellr660
               hostnqn=nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0044-4410-8030-b8c04f445833
               iopolicy=round-robin
\
 +- nvme0 tcp traddr=172.18.240.50,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live optimized
 +- nvme1 tcp traddr=172.18.240.51,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live non-optimized
 +- nvme2 tcp traddr=172.18.230.51,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
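To pin down which path failed to come back, the two list-subsys snapshots can be compared by address (traddr, trsvcid, host_traddr) rather than by controller name. Controller names get reassigned: nvme2 points to 172.18.230.50 before the test and to 172.18.230.51 after it. The following is a minimal Python sketch that parses the path lines as they appear in this report. The parse_paths and missing_paths helpers are hypothetical, and this text layout is an assumption rather than a stable interface. nvme-cli's JSON output (-o json) is likely a sturdier input in practice.

```python
import re

# Path lines from the first `nvme list-subsys /dev/nvme0n1` output (before the port test).
BEFORE = """\
 +- nvme0 tcp traddr=172.18.240.50,trsvcid=4420,host_traddr=172.18.240.1 connecting optimized
 +- nvme1 tcp traddr=172.18.240.51,trsvcid=4420,host_traddr=172.18.240.1 connecting non-optimized
 +- nvme2 tcp traddr=172.18.230.50,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live optimized
 +- nvme3 tcp traddr=172.18.230.51,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
"""

# Path lines from the second output (after re-enabling port 2).
AFTER = """\
 +- nvme0 tcp traddr=172.18.240.50,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live optimized
 +- nvme1 tcp traddr=172.18.240.51,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live non-optimized
 +- nvme2 tcp traddr=172.18.230.51,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
"""

# "+- <ctrl> <transport> <comma-separated address fields> <state> <ANA state>"
PATH = re.compile(r"\+- (nvme\d+) (\w+) (\S+) (\S+) (\S+)$")


def parse_paths(text):
    """Map (traddr, trsvcid, host_traddr) -> (ctrl, state, ana_state)."""
    paths = {}
    for line in text.splitlines():
        m = PATH.search(line.strip())
        if not m:
            continue
        ctrl, _transport, addr, state, ana = m.groups()
        fields = dict(kv.split("=", 1) for kv in addr.split(","))
        key = (fields["traddr"], fields["trsvcid"], fields["host_traddr"])
        paths[key] = (ctrl, state, ana)
    return paths


def missing_paths(before, after):
    """Paths present in `before` but absent from `after`, keyed by address."""
    old, new = parse_paths(before), parse_paths(after)
    return {key: old[key] for key in old.keys() - new.keys()}


if __name__ == "__main__":
    for (traddr, svcid, host), (ctrl, state, ana) in missing_paths(BEFORE, AFTER).items():
        print(f"{traddr}:{svcid} via {host} (was {ctrl}, {state} {ana})")
    # prints: 172.18.230.50:4420 via 172.18.230.2 (was nvme2, live optimized)
```

Keying on the address tuple keeps the comparison correct even when the kernel hands out controller names in a different order after reconnecting.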
I expected all four paths to be restored.
Please provide the package NVR for which the bug is seen:
kernel-5.14.0-487.el9
nvme-cli-2.9.1-4.el9
How reproducible: Often
Steps to reproduce
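1. Configure the host to boot from SAN over NVMe/TCP with two ports (nbft0 and nbft1) and confirm the paths with nvme list-subsys /dev/nvme0n1.
2. Disable port 2 (nbft1) and reboot.
3. After the system comes back up, re-enable port 2.
4. Run nvme list-subsys /dev/nvme0n1 again. The optimized path to 172.18.230.50 via nbft1 is often missing.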