• Bug
    • Resolution: Done-Errata
    • Major
    • rhel-9.5
    • rhel-9.4
    • anaconda
    • None
    • anaconda-34.25.5.9-1.el9
    • None
    • Important
    • rhel-sst-installer
    • ssg_front_door
    • 26
    • 5
    • False
    • None
    • Yes
    • Red Hat Enterprise Linux
    • None
    • Bug Fix
    • .Stale network link configuration files no longer render your OS unbootable

      Previously, the RHEL installer created stale `/etc/systemd/network/` link configuration files during the installation. These outdated configuration files interfered with the intended network settings, which led to an unbootable system when booting from NVMe over TCP. With this fix, the installer no longer creates these files, so users no longer need to manually remove the `/etc/systemd/network/10-anaconda-ifname-nbft*.link` files and regenerate the `initramfs` by running the `dracut -f` command.
    • Done
    • x86_64
    • None

      Just came across a scenario where the network interface setup got mismatched and the interface failed to activate, resulting in an unbootable system.

      The testing setup is similar to the one described upstream: https://github.com/timberland-sig/edk2/issues/34, i.e. two boot attempts are defined:

      1. HFI 1, 192.168.122.1:4420, nqn.2014-08.org.nvmexpress.discovery
      2. HFI 2, 192.168.123.1:4420, nqn.2014-08.org.nvmexpress.discovery

      Interface assignment in a working case should look as follows:

      1: lo: <snip>
      2: nbft0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
          link/ether 52:54:00:72:c5:ae brd ff:ff:ff:ff:ff:ff
          inet 192.168.122.158/24 brd 192.168.122.255 scope global noprefixroute nbft0
             valid_lft forever preferred_lft forever
      3: nbft1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
          link/ether 52:54:00:72:c5:af brd ff:ff:ff:ff:ff:ff
          altname enp0s4
          inet 192.168.123.158/24 brd 192.168.123.255 scope global noprefixroute nbft1
             valid_lft forever preferred_lft forever
      

      This is done by the dracut 95nvmf module, parsing the ACPI NBFT table and feeding the dracut network module with detailed configuration.

      Tearing down the link of the first network interface before boot results in a broken networking setup:

      1: lo: <snip>
      2: nbft0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc fq_codel state DOWN group default qlen 1000
          link/ether 52:54:00:72:c5:ae brd ff:ff:ff:ff:ff:ff
      3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
          link/ether 52:54:00:72:c5:af brd ff:ff:ff:ff:ff:ff
      

      Here, nbft0 should be assigned to 52:54:00:72:c5:af and carry the 192.168.123.158 address. The assignment of the 52:54:00:72:c5:ae interface is left undefined by the dracut 95nvmf module, so it would theoretically end up as eth0 or similar.

      I believe the problem is this:

      Mar 20 18:16:20 localhost NetworkManager[479]: <info>  [1710958580.4343] NetworkManager (version 1.46.0-1.el9) is starting... (boot:9ccef8f1-b0e8-450e-87e4-4e4d706c5169)
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4709] manager: (eth0): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4716] bus-manager: acquired D-Bus service "org.freedesktop.NetworkManager"
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4726] manager: (eth1): new Ethernet device (/org/freedesktop/NetworkManager/Devices/3)
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <warn>  [1710958580.4727] ifcfg-rh: dbus: couldn't acquire D-Bus service: GDBus.Error:org.freedesktop.DBus.Error.AccessDenied: Request to own name refused by policy
      Mar 20 18:16:20 localhost.localdomain kernel: virtio_net virtio2 nbft0: renamed from eth0
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4874] device (eth0): interface index 2 renamed iface from 'eth0' to 'nbft0'
      Mar 20 18:16:20 localhost.localdomain systemd-udevd[466]: nbft0: Failed to rename network interface 3 from 'eth1' to 'nbft0': File exists
      Mar 20 18:16:20 localhost.localdomain systemd-udevd[466]: eth1: Failed to process device, ignoring: File exists
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4884] device (eth1): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
      Mar 20 18:16:20 localhost.localdomain systemd[1]: eth1: systemd-udevd failed to process the device, ignoring: File exists
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4901] device (eth1): carrier: link connected
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4902] device (eth1): state change: unavailable -> disconnected (reason 'carrier-changed', sys-iface-state: 'managed')
      Mar 20 18:16:20 localhost.localdomain NetworkManager[479]: <info>  [1710958580.4923] device (nbft0): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
      Mar 20 18:16:30 localhost.localdomain NetworkManager[479]: <info>  [1710958590.4995] manager: startup complete
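
The rename collision visible in this log can be spotted mechanically. The following is a minimal sketch, assuming the systemd-udevd message format matches the lines quoted here; the function name `find_rename_failures` is hypothetical and not part of any existing tool.

```python
import re

# Pattern based on the systemd-udevd message quoted in this report, e.g.
# "nbft0: Failed to rename network interface 3 from 'eth1' to 'nbft0': File exists"
RENAME_FAILURE = re.compile(
    r"Failed to rename network interface (?P<index>\d+) "
    r"from '(?P<old>[^']+)' to '(?P<new>[^']+)': (?P<reason>.+)$"
)


def find_rename_failures(log_text):
    """Return one dict per failed interface rename found in log_text."""
    failures = []
    for line in log_text.splitlines():
        match = RENAME_FAILURE.search(line)
        if match:
            failures.append({
                "index": int(match["index"]),
                "old": match["old"],
                "new": match["new"],
                "reason": match["reason"].strip(),
            })
    return failures
```

A failure with reason "File exists" means that another interface already holds the requested name, which is what happens here when the stale link file gives `nbft0` to the wrong device first.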
      

      The information parsed by the dracut 95nvmf module appears to be correct and consistent:

      # cat /tmp/ifname-52:54:00:72:c5:af
      nbft0
      # cat /tmp/ifname-nbft0
      52:54:00:72:c5:af
      # cat /tmp/net.nbft0.has_ibft_config
      52:54:00:72:c5:af
      # cat /tmp/net.ifaces
      lo
      # cat /tmp/nm.done
      # cat /etc/cmdline.d/35-neednet.conf
      rd.neednet
      # cat /etc/cmdline.d/40-nbft.conf
      ip=192.168.123.158:::24::nbft0:none
      # cat /etc/cmdline.d/45-ifname.conf
      ifname=nbft0:52:54:00:72:c5:af
      # cat /etc/cmdline.d/nvmf-neednet.conf
      rd.neednet=1
      # cat /etc/udev/rules.d/80-ifname.rules
      SUBSYSTEM=="net", ACTION=="add", DRIVERS=="?*", ATTR{address}=="52:54:00:72:c5:af", ATTR{type}=="1", NAME="nbft0"
      

      This one appears incorrect, likely generated during installation and for some reason included in the initramfs:

      # cat /etc/systemd/network/10-anaconda-ifname-nbft0.link
      # Generated by Anaconda based on ifname= installer boot option.
      [Match]
      MACAddress=52:54:00:72:c5:ae
      
      [Link]
      Name=nbft0
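
The mismatch between this link file and the `ifname=` option generated by dracut can be checked programmatically. The sketch below is only an illustration: it assumes a link file with a single `MACAddress=` value and the simple `ifname=<interface>:<MAC>` form shown in this report, and all function names are hypothetical.

```python
import configparser


def parse_link_file(text):
    """Return (name, mac) from the [Link] and [Match] sections of a .link file."""
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.optionxform = str  # keep key case, e.g. "MACAddress"
    parser.read_string(text)
    mac = parser.get("Match", "MACAddress", fallback=None)
    name = parser.get("Link", "Name", fallback=None)
    return name, (mac.lower() if mac else None)


def parse_ifname_option(option):
    """Split 'ifname=<interface>:<MAC>' into (interface, mac)."""
    value = option.strip()
    if value.startswith("ifname="):
        value = value[len("ifname="):]
    name, _, mac = value.partition(":")
    return name, mac.lower()


def link_conflicts_with_ifname(link_text, ifname_option):
    """True if both sources claim the same name for different MAC addresses."""
    link_name, link_mac = parse_link_file(link_text)
    opt_name, opt_mac = parse_ifname_option(ifname_option)
    return link_name == opt_name and link_mac != opt_mac
```

With the values from this report, the link file claims `nbft0` for 52:54:00:72:c5:ae while dracut requests `nbft0` for 52:54:00:72:c5:af, so the check reports a conflict.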
      

      /run/initramfs/rdsosreport.txt also contains output from the 95nvmf:

      ip=192.168.123.158:::24::nbft0:none
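
To make the `ip=` line easier to read, its colon-separated fields can be named. This is a rough sketch that assumes the seven-field static form of the dracut `ip=` option with an IPv4 address (an IPv6 address in brackets would need different splitting); the field labels and the function name `parse_static_ip_option` are illustrative only.

```python
# Field order assumed for the static form of the dracut ip= option:
# ip=<client-IP>:[<peer>]:<gateway-IP>:<netmask>:<hostname>:<interface>:<method>
IP_FIELDS = ("client_ip", "peer", "gateway", "netmask",
             "hostname", "interface", "method")


def parse_static_ip_option(option):
    """Map the first seven colon-separated fields of an IPv4 ip= option to names."""
    value = option.strip()
    if value.startswith("ip="):
        value = value[len("ip="):]
    parts = value.split(":")
    if len(parts) < len(IP_FIELDS):
        raise ValueError("expected at least %d fields, got %d"
                         % (len(IP_FIELDS), len(parts)))
    return dict(zip(IP_FIELDS, parts[:len(IP_FIELDS)]))
```

For the line quoted here, the result names `nbft0` as the interface that should carry 192.168.123.158 with a /24 prefix and no automatic configuration.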
      

      Steps to reproduce:
      1. start qemu and set the link of the first network interface down
      2. let the EFI firmware connect to the second NVMe/TCP boot attempt
      3. observe success in connection, grub comes up
      4. boot the RHEL 9.4 kernel
      5. observe dracut getting stuck, unable to find rootfs

      kernel-5.14.0-427.el9.x86_64
      NetworkManager-1.46.0-1.el9.x86_64
      dracut-057-53.git20240104.el9.x86_64

            [RHEL-30149] [NVMe/NBFT] Incorrect interface setup

            Sagar Dubewar added a comment -

            Release notes are published in RHEL 9.5. No more updates are pending from the documentation side.
            Link to the RN document: https://docs.redhat.com/en/documentation/red_hat_enterprise_linux/9/html-single/9.5_release_notes/index#bug-fixes-installer-and-image-creation

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (anaconda bug fix and enhancement update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHBA-2024:9118


            Sagar Dubewar added a comment -

            In a discussion with tbzatek, it was clarified that this issue is present in the 9.4 release and is being fixed in 9.5. I will create another Jira to document it as a Known issue for the previous release, and we can use this one to add a Bug Fix note.

            Issue created for adding the Known issue in the 9.4 release: https://issues.redhat.com/browse/RHELDOCS-18924

            Sagar Dubewar added a comment -

            Update: For the 9.5 release, I have created release notes for this issue.

            tbzatek and rvykydal@redhat.com, please review the release notes in the following Google Doc and feel free to add your thoughts.

            https://docs.google.com/document/d/1RjbxphcLWiebSZlHzYeAej63hvyycXzHDobNuq_N76E/edit?disco=AAABVuclMHQ

            Thank you.

            Jan Stodola added a comment -

            anaconda-34.25.5.9-1.el9 is included in RHEL-9.5.0-20240826.4, moving to Release Pending.


            Gabriela Fialova added a comment -

            This ticket has been added to the tickets.yaml file for the RHEL 9.5 Release Notes.

            Jan Stodola added a comment -

            Confirmed with anaconda-34.25.5.9-1.el9 that the installer no longer creates /etc/systemd/network/10-anaconda-ifname-nbft*.link files. No regression has been found during the testing.

            Marking as Preliminary testing: Pass


            Tomáš Bžatek added a comment - Upstream PR: https://github.com/rhinstaller/anaconda/pull/5733

            Jiri Konecny added a comment -

            Hi rvykydal@redhat.com, this is the bug we were talking about. You said that you would take a look, so I set needinfo on you so that we do not lose track of this.

            Tomáš Bžatek added a comment - - edited

            Debugging the possible duplicate RHEL-32146 revealed that interface renaming through /etc/systemd/network/ link files races with udev rules (e.g. those generated by dracut on every boot), leading to random failures during NBFT boot. Additionally, interface renaming takes place again after switchroot, renaming live interfaces, possibly with slightly different settings than in the initramfs (two sets of files).

            Additionally this leads to delays in the boot process:

            May 20 17:59:08 localhost.localdomain NetworkManager[1114]: <warn>  [1716220748.2867] settings: startup-complete: profile "nbft0" (ada8fd63-de9f-42e2-83fe-74f33359e1b1) was waiting for non-existing device (with timeout "connection.wait-device-timeout=60000")
            May 20 17:59:08 localhost.localdomain NetworkManager[1114]: <warn>  [1716220748.2868] settings: startup-complete: profile "nbft0" (971f117a-6632-4026-9857-76072d3595e8) was waiting for non-existing device (with timeout "connection.wait-device-timeout=60000")
            

            So this is rather fragile. Adding an exclusion rule to Anaconda might be a good first workaround. Existing installations might need explicit purge of nbft-related link files (TBD).
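
For existing installations, the purge mentioned in this comment starts with locating the affected files. The following is a minimal sketch, assuming the file name pattern `10-anaconda-ifname-nbft*.link` quoted in the release note text of this issue; the function name `find_stale_nbft_links` is hypothetical, and the script only lists candidates without deleting anything.

```python
from pathlib import Path

# File name pattern taken from the release note text of this issue.
STALE_PATTERN = "10-anaconda-ifname-nbft*.link"


def find_stale_nbft_links(network_dir="/etc/systemd/network"):
    """Return a sorted list of Anaconda-generated NBFT link files in network_dir."""
    directory = Path(network_dir)
    if not directory.is_dir():
        return []
    return sorted(directory.glob(STALE_PATTERN))


if __name__ == "__main__":
    for path in find_stale_nbft_links():
        print(path)
```

After reviewing the listed files, they would be removed manually and the initramfs regenerated with `dracut -f`, as the release note describes.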


              tbzatek Tomáš Bžatek
              tbzatek Tomáš Bžatek
              Radek Vykydal
              anaconda-maint-list anaconda-maint-list
              Release Test Team Release Test Team
              Sagar Dubewar Sagar Dubewar