Bug
Resolution: Unresolved
Normal
rhel-9.5
rhel-sst-storage-io
ssg_filesystems_storage_and_HA
3
False
x86_64
What were you trying to do that didn't work?
We configured a system to boot from SAN over NVMe-TCP. 4 paths are available to the storage array - 2 optimized and 2 non-optimized:
[root@dell-r660 ~]# nvme list-subsys /dev/nvme0n1
nvme-subsys0 - NQN=nqn.1988-11.com.dell:powerstore:00:88b402df2d762AA7AF94
               hostnqn=nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0044-4410-8030-b8c04f445833
               iopolicy=numa
\
 +- nvme0 tcp traddr=172.18.240.60,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live optimized
 +- nvme1 tcp traddr=172.18.240.61,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live non-optimized
 +- nvme2 tcp traddr=172.18.230.60,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live optimized
 +- nvme3 tcp traddr=172.18.230.61,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
I brought down the switch port associated with port 2 of the initiator. At that point, we were down to 1 optimized and 1 non-optimized path. I rebooted the server and verified that the system recovered. Shortly after, I brought the switch port back up and noticed that only 1 of the 2 lost paths recovered:
[root@dell-r660 ~]# nvme list-subsys /dev/nvme0n1
nvme-subsys0 - NQN=nqn.1988-11.com.dell:powerstore:00:88b402df2d762AA7AF94
               hostnqn=nqn.2014-08.org.nvmexpress:uuid:4c4c4544-0044-4410-8030-b8c04f445833
               iopolicy=numa
\
 +- nvme0 tcp traddr=172.18.240.60,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live optimized
 +- nvme1 tcp traddr=172.18.240.61,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live non-optimized
 +- nvme2 tcp traddr=172.18.230.61,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
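The difference between the two listings can be checked mechanically. The following is a minimal Python sketch, not part of the original report: it assumes the text layout of `nvme list-subsys` shown above, and the names `PATH_RE`, `parse_paths`, and `missing_paths` are hypothetical helpers. Paths are keyed by (traddr, host_traddr) rather than by controller name, because the name nvme2 refers to a different target address in the second listing.

```python
import re

# One controller line of `nvme list-subsys` text output, e.g.
# "+- nvme0 tcp traddr=A,trsvcid=4420,host_traddr=B,src_addr=B live optimized"
# (layout assumed from the output in this report).
PATH_RE = re.compile(
    r"\+- (?P<ctrl>nvme\d+) (?P<transport>\w+) "
    r"traddr=(?P<traddr>[^,\s]+),trsvcid=(?P<trsvcid>\d+)"
    r"(?:,host_traddr=(?P<host_traddr>[^,\s]+))?"
    r"\S* (?P<state>\w+)(?: (?P<ana>[\w-]+))?"
)

def parse_paths(text):
    """Map (traddr, host_traddr) -> (controller, state, ANA state)."""
    paths = {}
    for m in PATH_RE.finditer(text):
        paths[(m["traddr"], m["host_traddr"])] = (m["ctrl"], m["state"], m["ana"])
    return paths

def missing_paths(before, after):
    """Return the paths present in `before` but absent from `after`."""
    old, new = parse_paths(before), parse_paths(after)
    return {key: val for key, val in old.items() if key not in new}

# Controller lines copied from the two listings in this report.
BEFORE = """
 +- nvme0 tcp traddr=172.18.240.60,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live optimized
 +- nvme1 tcp traddr=172.18.240.61,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live non-optimized
 +- nvme2 tcp traddr=172.18.230.60,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live optimized
 +- nvme3 tcp traddr=172.18.230.61,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
"""
AFTER = """
 +- nvme0 tcp traddr=172.18.240.60,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live optimized
 +- nvme1 tcp traddr=172.18.240.61,trsvcid=4420,host_traddr=172.18.240.1,src_addr=172.18.240.1 live non-optimized
 +- nvme2 tcp traddr=172.18.230.61,trsvcid=4420,host_traddr=172.18.230.2,src_addr=172.18.230.2 live non-optimized
"""

if __name__ == "__main__":
    for (traddr, host), (ctrl, state, ana) in missing_paths(BEFORE, AFTER).items():
        print(f"missing: traddr={traddr} host_traddr={host} (was {ctrl}, {state} {ana})")
```

With the listings above, the only path reported missing is the optimized one to 172.18.230.60 via host address 172.18.230.2.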
The logs report the following:
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: bnxt_en 0000:6a:00.1 nbft1: NIC Link is Up, 25000 Mbps (NRZ) full duplex, Flow control: none
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: bnxt_en 0000:6a:00.1 nbft1: FEC autoneg off encoding: None
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: IPv6: ADDRCONF(NETDEV_CHANGE): nbft1: link becomes ready
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info> [1719932114.9329] device (nbft1): carrier: link connected
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info> [1719932114.9338] device (nbft1): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'manag>
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info> [1719932114.9351] policy: auto-activating connection 'nbft1' (5029e745-0d8e-4ee1-8118-4b3faa61e53b)
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info> [1719932114.9360] device (nbft1): Activation: starting connection 'nbft1' (5029e745-0d8e-4ee1-8118-4b3faa61e53b)
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info> [1719932114.9361] device (nbft1): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info> [1719932114.9366] device (nbft1): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info> [1719932114.9371] device (nbft1): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info> [1719932114.9383] device (nbft1): state change: ip-config -> ip-check (reason 'none', sys-iface-state: 'managed')
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Starting Network Manager Script Dispatcher Service...
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Started Network Manager Script Dispatcher Service.
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info> [1719932114.9723] device (nbft1): state change: ip-check -> secondaries (reason 'none', sys-iface-state: 'managed')
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info> [1719932114.9728] device (nbft1): state change: secondaries -> activated (reason 'none', sys-iface-state: 'managed')
Jul 02 10:55:14 dell-r660.fast.eng.rdu2.dc.redhat.com NetworkManager[1782]: <info> [1719932114.9737] device (nbft1): Activation: successful, device activated.
Jul 02 10:55:15 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Starting Load Kernel Module nvme_fabrics...
Jul 02 10:55:15 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: modprobe@nvme_fabrics.service: Deactivated successfully.
Jul 02 10:55:15 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Finished Load Kernel Module nvme_fabrics.
Jul 02 10:55:15 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Starting Connect NBFT-defined NVMe-oF subsystems automatically...
Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: failed to connect socket: -110
Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: creating 64 I/O queues.
Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: mapped 64/0/0 default/read/poll queues.
Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: nvme nvme2: new ctrl: NQN "nqn.1988-11.com.dell:powerstore:00:88b402df2d762AA7AF94", addr 172.18.230.61:4420, hostnqn: nqn.2014-08.org.nvmexpress:uu>
Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com nvme[2250]: device: nvme2
Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: nvmf-connect-nbft.service: Deactivated successfully.
Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: Finished Connect NBFT-defined NVMe-oF subsystems automatically.
Jul 02 10:55:27 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: systemd-hostnamed.service: Deactivated successfully.
Jul 02 10:55:28 dell-r660.fast.eng.rdu2.dc.redhat.com systemd[1]: NetworkManager-dispatcher.service: Deactivated successfully.
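For triage, the one error in this excerpt is "failed to connect socket: -110"; on Linux, errno 110 is ETIMEDOUT. The sketch below is an illustration only, assuming the kernel message format seen above. The `LINUX_ERRNO` table is a hand-written subset of Linux errno values, and `FAIL_RE` and `connect_failures` are hypothetical names, not part of nvme-cli or libnvme.

```python
import re

# Hand-picked subset of Linux errno values relevant to socket connects
# (values as defined for Linux; other platforms may differ).
LINUX_ERRNO = {
    101: "ENETUNREACH",
    110: "ETIMEDOUT",
    111: "ECONNREFUSED",
    113: "EHOSTUNREACH",
}

# Kernel message format assumed from the log excerpt in this report.
FAIL_RE = re.compile(r"kernel: nvme (nvme\d+): failed to connect socket: -(\d+)")

def connect_failures(log_text):
    """Return (controller, errno, errno name) for each failed socket connect."""
    return [
        (ctrl, int(code), LINUX_ERRNO.get(int(code), "unknown"))
        for ctrl, code in FAIL_RE.findall(log_text)
    ]

# Two lines copied from the log excerpt above.
SAMPLE = (
    "Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: "
    "nvme nvme2: failed to connect socket: -110\n"
    "Jul 02 10:55:18 dell-r660.fast.eng.rdu2.dc.redhat.com kernel: "
    "nvme nvme2: creating 64 I/O queues.\n"
)

if __name__ == "__main__":
    for ctrl, code, name in connect_failures(SAMPLE):
        print(f"{ctrl}: connect failed with -{code} ({name})")
```

The log line does not say which target address timed out, so matching the failure to a specific path still requires comparing against the `nvme list-subsys` output.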
Please provide the package NVR for which bug is seen:
kernel-5.14.0-467.el9
nvme-cli-2.9.1-3.1.tbzatek.el9.x86_64
libnvme-1.9-1.1.tbzatek.el9.x86_64
How reproducible: Often
Steps to reproduce
- see above
Expected results
All paths should re-establish after the switch port is brought back up