Bug
Resolution: Unresolved
Major
rhel-10.0.beta
anaconda-40.22.3.12-1.el10
Important
sst_installer
ssg_front_door
26
5
False
Yes
Red Hat Enterprise Linux
Pass
Known Issue
Proposed
x86_64
I just came across a scenario where the network interface setup got mismatched and failed to activate, resulting in an unbootable system.
The test setup is similar to the one described upstream at https://github.com/timberland-sig/edk2/issues/34, i.e. with two boot attempts defined:
- HFI 1, 192.168.122.1:4420, nqn.2014-08.org.nvmexpress.discovery
- HFI 2, 192.168.123.1:4420, nqn.2014-08.org.nvmexpress.discovery
Interface assignment in a working case should look as follows:
    1: lo: <snip>
    2: nbft0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 52:54:00:72:c5:ae brd ff:ff:ff:ff:ff:ff
        inet 192.168.122.158/24 brd 192.168.122.255 scope global noprefixroute nbft0
           valid_lft forever preferred_lft forever
    3: nbft1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 52:54:00:72:c5:af brd ff:ff:ff:ff:ff:ff
        altname enp0s4
        inet 192.168.123.158/24 brd 192.168.123.255 scope global noprefixroute nbft1
           valid_lft forever preferred_lft forever
This assignment is done by the dracut 95nvmf module, which parses the ACPI NBFT table and feeds the dracut network module with a detailed configuration.
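As a rough illustration of what that generated configuration looks like, the Python sketch below renders the `ip=` and `ifname=` arguments for one statically configured interface. The `HfiConfig` record and the `dracut_cmdline` function are hypothetical and are not the 95nvmf module's actual (shell) implementation; the output format is taken from the fragments in the debugging output later in this report and from dracut's `ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<client_hostname>:<interface>:<autoconf>` syntax.

```python
from dataclasses import dataclass


@dataclass
class HfiConfig:
    """Hypothetical subset of one NBFT Host Fabric Interface (HFI) record."""
    ifname: str        # interface name assigned to this HFI, e.g. "nbft0"
    mac: str           # MAC address the HFI is bound to
    ipaddr: str        # static IPv4 address
    prefix: int        # prefix length, written into the netmask field
    gateway: str = ""  # left empty when no gateway is configured


def dracut_cmdline(hfi: HfiConfig) -> list[str]:
    """Render dracut cmdline fragments for a statically configured HFI.

    Field order follows dracut's
    ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<client_hostname>:<interface>:<autoconf>
    with peer and hostname left empty and autoconf set to "none".
    """
    ip_arg = f"ip={hfi.ipaddr}::{hfi.gateway}:{hfi.prefix}::{hfi.ifname}:none"
    # ifname=<interface>:<MAC> pins the interface name to a MAC address.
    ifname_arg = f"ifname={hfi.ifname}:{hfi.mac}"
    return [ip_arg, ifname_arg]


if __name__ == "__main__":
    hfi = HfiConfig("nbft0", "52:54:00:72:c5:af", "192.168.123.158", 24)
    for line in dracut_cmdline(hfi):
        print(line)
```

For the second HFI in the broken scenario this yields `ip=192.168.123.158:::24::nbft0:none` and `ifname=nbft0:52:54:00:72:c5:af`, matching the files dracut wrote to `/etc/cmdline.d/`.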
Tearing down the link of the first network interface before boot results in a broken networking setup:
    1: lo: <snip>
    2: nbft0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
        link/ether 52:54:00:72:c5:ae brd ff:ff:ff:ff:ff:ff
    3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
        link/ether 52:54:00:72:c5:af brd ff:ff:ff:ff:ff:ff
Here, nbft0 should have been assigned to 52:54:00:72:c5:af, with 192.168.123.158 configured on it. The dracut 95nvmf module does not define a name for the 52:54:00:72:c5:ae interface, so in theory it should have ended up as eth0 or similar.
I believe the problem is this:
    Mar 20 18:16:20 localhost NetworkManager[479]: <info> [1710958580.4343] NetworkManager (version 1.46.0-1.el9) is starting... (boot:9ccef8f1-b0e8-450e-87e4-4e4d706c5169)
    Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info> [1710958580.4709] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
    Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info> [1710958580.4716] bus-manager: acquired D-Bus service "org.freedesktop.NetworkManager"
    Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info> [1710958580.4726] manager: (eth1): new Ethernet device (/org/freedesktop/NetworkManager/Devices/3)
    Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <warn> [1710958580.4727] ifcfg-rh: dbus: couldn't acquire D-Bus service: GDBus.Error:org.freedesktop.DBus.Error.AccessDenied: Request to own name refused by policy
    Mar 20 18:16:20 localhost.localdomain kernel: virtio_net virtio2 nbft0: renamed from eth0
    Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info> [1710958580.4874] device (eth0): interface index 2 renamed iface from 'eth0' to 'nbft0'
    Mar 20 18:16:20 localhost.localdomain systemd-udevd[466]: nbft0: Failed to rename network interface 3 from 'eth1' to 'nbft0': File exists
    Mar 20 18:16:20 localhost.localdomain systemd-udevd[466]: eth1: Failed to process device, ignoring: File exists
    Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info> [1710958580.4884] device (eth1): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
    Mar 20 18:16:20 localhost.localdomain systemd[1]: eth1: systemd-udevd failed to process the device, ignoring: File exists
    Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info> [1710958580.4901] device (eth1): carrier: link connected
    Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info> [1710958580.4902] device (eth1): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
    Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info> [1710958580.4923] device (nbft0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
    Mar 20 18:16:30 localhost.localdomain NetworkManager[479]: <info> [1710958590.4995] manager: startup complete
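The log is consistent with a first-come, first-served naming conflict: two rules claim `nbft0`, one for 52:54:00:72:c5:ae (from the stale systemd `.link` file shown further below) and one for 52:54:00:72:c5:af (from the dracut-generated udev rule). Whichever device is processed first takes the name, and the second rename fails with EEXIST ("File exists"). The toy model below reproduces that outcome. It is a deliberate simplification (it ignores udev rule precedence, parallel event processing, and any retries), and `apply_renames` is a hypothetical helper.

```python
import errno


def apply_renames(devices, rules):
    """Toy model of udev renaming network devices one at a time.

    devices: list of (kernel_name, mac) in the order they happen to be processed.
    rules:   dict mapping MAC address -> requested interface name.
    Returns (final_names, errors): final_names maps MAC -> resulting name,
    errors lists (kernel_name, wanted_name, errno_name) for failed renames.
    """
    taken = {name for name, _ in devices}  # names currently in use
    final_names = {}
    errors = []
    for kernel_name, mac in devices:
        wanted = rules.get(mac)
        if wanted is None or wanted == kernel_name:
            final_names[mac] = kernel_name  # no rule applies, keep kernel name
            continue
        if wanted in taken:
            # Mirrors "Failed to rename network interface ...: File exists"
            errors.append((kernel_name, wanted, errno.errorcode[errno.EEXIST]))
            final_names[mac] = kernel_name
            continue
        taken.discard(kernel_name)
        taken.add(wanted)
        final_names[mac] = wanted
    return final_names, errors


if __name__ == "__main__":
    devices = [("eth0", "52:54:00:72:c5:ae"), ("eth1", "52:54:00:72:c5:af")]
    conflicting = {"52:54:00:72:c5:ae": "nbft0", "52:54:00:72:c5:af": "nbft0"}
    print(apply_renames(devices, conflicting))
```

With both claims present, eth0 becomes nbft0 and eth1 keeps its kernel name, as in the log. Dropping the 52:54:00:72:c5:ae entry from `rules` leaves eth0 untouched and renames eth1 to nbft0, which is the expected layout.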
The information parsed by the dracut 95nvmf module appears to be correct and consistent:
    # cat /tmp/ifname-52:54:00:72:c5:af
    nbft0
    # cat /tmp/ifname-nbft0
    52:54:00:72:c5:af
    # cat /tmp/net.nbft0.has_ibft_config
    52:54:00:72:c5:af
    # cat /tmp/net.ifaces
    lo
    # cat /tmp/nm.done
    # cat /etc/cmdline.d/35-neednet.conf
    rd.neednet
    # cat /etc/cmdline.d/40-nbft.conf
    ip=192.168.123.158:::24::nbft0:none
    # cat /etc/cmdline.d/45-ifname.conf
    ifname=nbft0:52:54:00:72:c5:af
    # cat /etc/cmdline.d/nvmf-neednet.conf
    rd.neednet=1
    # cat /etc/udev/rules.d/80-ifname.rules
    SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:72:c5:af", ATTR{type}=="1", NAME="nbft0"
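That consistency can also be checked mechanically: the interface named in `ip=` should match the one in `ifname=`, and `ifname=` should point at the expected MAC. The sketch below handles only the IPv4 form of dracut's `ip=` syntax seen here (bracketed IPv6 addresses are not supported), and all function names are hypothetical.

```python
IP_FIELDS = ("client_ip", "peer", "gateway", "netmask", "hostname", "interface", "autoconf")


def parse_ip_arg(arg):
    """Split an IPv4 dracut ip= argument into named fields."""
    parts = arg.removeprefix("ip=").split(":")
    # zip() stops at the seventh field; optional trailing fields
    # (mtu, macaddr) are ignored in this sketch.
    return dict(zip(IP_FIELDS, parts))


def parse_ifname_arg(arg):
    """Split ifname=<interface>:<MAC>; the MAC itself contains colons."""
    name, mac = arg.removeprefix("ifname=").split(":", 1)
    return name, mac.lower()


def check_consistent(ip_arg, ifname_arg, expected_mac):
    """True if ip= and ifname= agree on the interface and ifname= uses expected_mac."""
    ip = parse_ip_arg(ip_arg)
    name, mac = parse_ifname_arg(ifname_arg)
    return ip.get("interface") == name and mac == expected_mac.lower()


if __name__ == "__main__":
    print(check_consistent("ip=192.168.123.158:::24::nbft0:none",
                           "ifname=nbft0:52:54:00:72:c5:af",
                           "52:54:00:72:c5:af"))
```

`removeprefix` requires Python 3.9 or later.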
This file, however, appears incorrect. It was likely generated during installation and, for some reason, included in the initramfs:
    # cat /etc/systemd/network/10-anaconda-ifname-nbft0.link
    # Generated by Anaconda based on ifname= installer boot option.
    [Match]
    MACAddress=52:54:00:72:c5:ae

    [Link]
    Name=nbft0
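One way to spot this kind of conflict is to collect every MAC-to-name claim shipped in the initramfs and flag any name claimed by more than one MAC. The sketch below reads the `.link` file with Python's `configparser` (sufficient for this simple file, though not a full systemd unit parser) and extracts the `ATTR{address}`/`NAME` pair from the udev rule with a regular expression. The helper names are hypothetical; the file contents are copied from the output above.

```python
import configparser
import re
from collections import defaultdict

LINK_FILE = """\
# Generated by Anaconda based on ifname= installer boot option.
[Match]
MACAddress=52:54:00:72:c5:ae

[Link]
Name=nbft0
"""

UDEV_RULE = ('SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", '
             'ATTR{address}=="52:54:00:72:c5:af", ATTR{type}=="1", NAME="nbft0"')


def claim_from_link(text):
    """Return (mac, name) from a simple systemd .link file."""
    cp = configparser.ConfigParser(interpolation=None)
    cp.optionxform = str  # keep systemd's CamelCase keys as written
    cp.read_string(text)
    return cp["Match"]["MACAddress"].lower(), cp["Link"]["Name"]


def claim_from_udev_rule(rule):
    """Return (mac, name) from a single-line udev NAME= rule."""
    mac = re.search(r'ATTR\{address\}=="([^"]+)"', rule).group(1)
    name = re.search(r'\bNAME="([^"]+)"', rule).group(1)
    return mac.lower(), name


def find_conflicts(claims):
    """Map each interface name claimed by more than one MAC to those MACs."""
    by_name = defaultdict(set)
    for mac, name in claims:
        by_name[name].add(mac)
    return {name: sorted(macs) for name, macs in by_name.items() if len(macs) > 1}


if __name__ == "__main__":
    claims = [claim_from_link(LINK_FILE), claim_from_udev_rule(UDEV_RULE)]
    print(find_conflicts(claims))
```

Run against the two files above, it reports `nbft0` as claimed by both 52:54:00:72:c5:ae and 52:54:00:72:c5:af.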
/run/initramfs/rdsosreport.txt also contains the output from the 95nvmf module:
ip=192.168.123.158:::24::nbft0:none
Steps to reproduce:
1. Start QEMU and set the link of the first network interface down.
2. Let the EFI firmware connect to the second NVMe/TCP boot attempt.
3. Observe that the connection succeeds and GRUB comes up.
4. Boot the RHEL 9.4 kernel.
5. Observe dracut getting stuck, unable to find the root filesystem.
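Step 1 can be scripted through QEMU's QMP interface, which provides a `set_link` command taking a device `name` and an `up` flag. The sketch below assumes QEMU was started with a QMP UNIX socket (for example `-qmp unix:/tmp/qmp.sock,server=on,wait=off`) and that the first NIC has the id `net0`. Both the socket path and the device id are placeholders, not values from the original setup.

```python
import json
import socket


def qmp_commands(device, up):
    """Build the QMP messages: capability negotiation, then set_link."""
    return [
        {"execute": "qmp_capabilities"},
        {"execute": "set_link", "arguments": {"name": device, "up": up}},
    ]


def set_link(sock_path, device, up=False):
    """Send set_link over a QMP UNIX socket and return the command replies."""
    replies = []
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
        s.connect(sock_path)
        with s.makefile("rw", encoding="utf-8") as f:
            json.loads(f.readline())  # discard the QMP greeting banner
            for cmd in qmp_commands(device, up):
                f.write(json.dumps(cmd) + "\n")
                f.flush()
                # Skip asynchronous events until this command's reply arrives.
                while True:
                    msg = json.loads(f.readline())
                    if "return" in msg or "error" in msg:
                        replies.append(msg)
                        break
    return replies


if __name__ == "__main__":
    # Placeholder socket path and device id; adjust to the actual QEMU setup.
    print(set_link("/tmp/qmp.sock", "net0", up=False))
```

Setting the link down before the firmware starts may require launching QEMU paused (`-S`) and resuming with the QMP `cont` command after `set_link`.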
kernel-5.14.0-427.el9.x86_64
NetworkManager-1.46.0-1.el9.x86_64
dracut-057-53.git20240104.el9.x86_64
- clones: RHEL-30149 [NVMe/NBFT] Incorrect interface setup (Closed)
- links to: RHBA-2024:130971 anaconda bug fix and enhancement update