RHEL-30149

[NVMe/NBFT] Incorrect interface setup

    • Bug
    • Resolution: Done-Errata
    • Major
    • rhel-9.5
    • rhel-9.4
    • anaconda
    • None
    • anaconda-34.25.5.9-1.el9
    • None
    • Important
    • sst_installer
    • ssg_front_door
    • 26
    • 5
    • False
    • Yes
    • Red Hat Enterprise Linux
    • None
    • Bug Fix
    • .Stale network link configuration files no longer render your OS unbootable

      Previously, the RHEL installer created stale `/etc/systemd/network/` link configuration files during the installation. These outdated configuration files interfered with the intended network settings, which led to an unbootable system when booting from NVMe over TCP. With this fix, users no longer need to manually remove the `/etc/systemd/network/10-anaconda-ifname-nbft*.link` files and regenerate the `initramfs` by running the `dracut -f` command.
    • Done
    • x86_64
    • None

      Just came across a scenario where the network interface setup got mismatched and the interface failed to activate, resulting in an unbootable system.

      The testing setup is similar to the one described upstream: https://github.com/timberland-sig/edk2/issues/34, i.e. two boot attempts are defined:

      1. HFI 1, 192.168.122.1:4420, nqn.2014-08.org.nvmexpress.discovery
      2. HFI 2, 192.168.123.1:4420, nqn.2014-08.org.nvmexpress.discovery

      Interface assignment in a working case should look as follows:

      1: lo: <snip>
      2: nbft0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
          link/ether 52:54:00:72:c5:ae brd ff:ff:ff:ff:ff:ff
          inet 192.168.122.158/24 brd 192.168.122.255 scope global noprefixroute nbft0
             valid_lft forever preferred_lft forever
      3: nbft1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
          link/ether 52:54:00:72:c5:af brd ff:ff:ff:ff:ff:ff
          altname enp0s4
          inet 192.168.123.158/24 brd 192.168.123.255 scope global noprefixroute nbft1
             valid_lft forever preferred_lft forever
      

      This is done by the dracut 95nvmf module, which parses the ACPI NBFT table and feeds the dracut network module with the detailed configuration.

      Tearing down the link of the first network interface before boot results in a broken networking setup:

      1: lo: <snip>
      2: nbft0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
          link/ether 52:54:00:72:c5:ae brd ff:ff:ff:ff:ff:ff
      3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
          link/ether 52:54:00:72:c5:af brd ff:ff:ff:ff:ff:ff
      

      Here, nbft0 should have been assigned to 52:54:00:72:c5:af and configured with 192.168.123.158. The dracut 95nvmf module defines no name for the 52:54:00:72:c5:ae interface, so it should theoretically end up as eth0 or similar.

      I believe the problem is this:

      Mar 20 18:16:20 localhost NetworkManager[479]: <info>  [1710958580.4343] NetworkManager (version 1.46.0-1.el9) is starting... (boot:9ccef8f1-b0e8-450e-87e4-4e4d706c5169)
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4709] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4716] bus-manager: acquired D-Bus service "org.freedesktop.NetworkManager"
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4726] manager: (eth1): new Ethernet device (/org/freedesktop/NetworkManager/Devices/3)
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <warn>  [1710958580.4727] ifcfg-rh: dbus: couldn't acquire D-Bus service: GDBus.Error:org.freedesktop.DBus.Error.AccessDenied: Request to own name refused by policy
      Mar 20 18:16:20 localhost.localdomain kernel: virtio_net virtio2 nbft0: renamed from eth0
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4874] device (eth0): interface index 2 renamed iface from 'eth0' to 'nbft0'
      Mar 20 18:16:20 localhost.localdomain systemd-udevd[466]: nbft0: Failed to rename network interface 3 from 'eth1' to 'nbft0': File exists
      Mar 20 18:16:20 localhost.localdomain systemd-udevd[466]: eth1: Failed to process device, ignoring: File exists
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4884] device (eth1): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
      Mar 20 18:16:20 localhost.localdomain systemd[1]: eth1: systemd-udevd failed to process the device, ignoring: File exists
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4901] device (eth1): carrier: link connected
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4902] device (eth1): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4923] device (nbft0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
      Mar 20 18:16:30 localhost.localdomain NetworkManager[479]: <info>  [1710958590.4995] manager: startup complete
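
The log points to a naming collision: one rule renames eth0 to nbft0, and a second rename of eth1 to the same name then fails with "File exists". A toy model of that sequence is sketched below. It assumes the rule matching 52:54:00:72:c5:ae fires first (the order seen in the log); `apply_renames` and its data layout are hypothetical and are not how udev is implemented.

```python
def apply_renames(interfaces, rules):
    """Toy model of interface renaming.

    interfaces: dict mapping kernel name -> MAC address
    rules: list of (mac, new_name) tuples, in the order they fire
    Returns (final name -> MAC mapping, list of failed renames).
    """
    names = dict(interfaces)
    failures = []
    for mac, new_name in rules:
        # Find the interface that currently owns this MAC address.
        current = next((n for n, m in names.items() if m == mac), None)
        if current is None or current == new_name:
            continue
        if new_name in names:
            # Mirrors the "File exists" error reported by systemd-udevd.
            failures.append((current, new_name, "File exists"))
            continue
        del names[current]
        names[new_name] = mac
    return names, failures


interfaces = {"eth0": "52:54:00:72:c5:ae", "eth1": "52:54:00:72:c5:af"}
rules = [
    ("52:54:00:72:c5:ae", "nbft0"),  # assumed to fire first, as in the log
    ("52:54:00:72:c5:af", "nbft0"),  # the rename that then fails
]
final, failures = apply_renames(interfaces, rules)
print(final)
print(failures)
```

In this model the interface with MAC 52:54:00:72:c5:af keeps the name eth1, so a configuration written for nbft0 never reaches it.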
      

      The information parsed by the dracut 95nvmf module appears to be correct and consistent:

      # cat /tmp/ifname-52:54:00:72:c5:af
      nbft0
      # cat /tmp/ifname-nbft0
      52:54:00:72:c5:af
      # cat /tmp/net.nbft0.has_ibft_config
      52:54:00:72:c5:af
      # cat /tmp/net.ifaces
      lo
      # cat /tmp/nm.done
      # cat /etc/cmdline.d/35-neednet.conf
      rd.neednet
      # cat /etc/cmdline.d/40-nbft.conf
      ip=192.168.123.158:::24::nbft0:none
      # cat /etc/cmdline.d/45-ifname.conf
      ifname=nbft0:52:54:00:72:c5:af
      # cat /etc/cmdline.d/nvmf-neednet.conf
      rd.neednet=1
      # cat /etc/udev/rules.d/80-ifname.rules
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:72:c5:af", ATTR{type}=="1", NAME="nbft0"
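
To make the consistency check on the values above explicit, the `ifname=` and `ip=` arguments can be split into named fields. This is a minimal sketch, assuming the static IPv4 form `ip=<client>:<peer>:<gateway>:<netmask>:<hostname>:<interface>:<autoconf>` documented in dracut.cmdline(7); it does not handle bracketed IPv6 addresses or the optional trailing fields, and the helper names are hypothetical.

```python
IP_FIELDS = ("client", "peer", "gateway", "netmask",
             "hostname", "interface", "autoconf")


def parse_ifname(arg):
    """Split 'ifname=<interface>:<MAC>' into (interface, mac)."""
    key, _, value = arg.partition("=")
    if key != "ifname":
        raise ValueError(f"not an ifname= argument: {arg!r}")
    # Only the first colon separates the name from the MAC address.
    iface, _, mac = value.partition(":")
    return iface, mac.lower()


def parse_ip(arg):
    """Split a static IPv4 'ip=' argument into a dict of named fields."""
    key, _, value = arg.partition("=")
    if key != "ip":
        raise ValueError(f"not an ip= argument: {arg!r}")
    fields = value.split(":")
    if len(fields) < len(IP_FIELDS):
        raise ValueError(f"unsupported ip= form: {arg!r}")
    # Extra trailing fields (MTU, MAC) are ignored in this sketch.
    return dict(zip(IP_FIELDS, fields))


iface, mac = parse_ifname("ifname=nbft0:52:54:00:72:c5:af")
ip = parse_ip("ip=192.168.123.158:::24::nbft0:none")
print(iface, mac, ip["client"], ip["netmask"], ip["interface"])
```

Both arguments refer to the same interface name, which matches the observation that the dracut-generated configuration is internally consistent.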
      

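      This file appears incorrect: it binds nbft0 to 52:54:00:72:c5:ae, whereas the dracut-generated configuration above binds nbft0 to 52:54:00:72:c5:af. It was likely generated during installation and for some reason included in the initramfs: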

      # cat /etc/systemd/network/10-anaconda-ifname-nbft0.link
      # Generated by Anaconda based on ifname= installer boot option.
      [Match]
      MACAddress=52:54:00:72:c5:ae
      
      [Link]
      Name=nbft0
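
A mismatch like this can be detected mechanically by comparing a `.link` file against the `ifname=` argument generated from the NBFT. The sketch below assumes a simple `.link` file with a single `MACAddress=` and `Name=` entry, as in the file shown here; `parse_link` and `is_stale` are hypothetical helpers, not part of Anaconda or dracut.

```python
import configparser

LINK_TEXT = """\
# Generated by Anaconda based on ifname= installer boot option.
[Match]
MACAddress=52:54:00:72:c5:ae

[Link]
Name=nbft0
"""


def parse_link(text):
    """Return (mac, name) from a simple systemd .link file."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep the key case used by systemd
    parser.read_string(text)
    return parser["Match"]["MACAddress"].lower(), parser["Link"]["Name"]


def is_stale(link_text, ifname_arg):
    """True if the .link file claims the same name for a different MAC."""
    link_mac, link_name = parse_link(link_text)
    _, _, value = ifname_arg.partition("=")
    iface, _, mac = value.partition(":")
    return link_name == iface and link_mac != mac.lower()


print(is_stale(LINK_TEXT, "ifname=nbft0:52:54:00:72:c5:af"))
```

A file flagged this way is a candidate for the manual workaround: removing it and regenerating the initramfs with `dracut -f`.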
      

      /run/initramfs/rdsosreport.txt also contains output from the 95nvmf module:

      ip=192.168.123.158:::24::nbft0:none
      

      Steps to reproduce:
      1. Start QEMU and bring the link of the first network interface down.
      2. Let the EFI firmware connect to the second NVMe/TCP boot attempt.
      3. Observe that the connection succeeds and GRUB comes up.
      4. Boot the RHEL 9.4 kernel.
      5. Observe dracut getting stuck, unable to find the root filesystem.

      kernel-5.14.0-427.el9.x86_64
      NetworkManager-1.46.0-1.el9.x86_64
      dracut-057-53.git20240104.el9.x86_64

        1. logs.tar.gz
          115 kB
          Tomáš Bžatek

              tbzatek Tomáš Bžatek
              tbzatek Tomáš Bžatek
              Radek Vykydal
              anaconda-maint-list anaconda-maint-list
              Release Test Team Release Test Team
              Sagar Dubewar Sagar Dubewar