RHEL-56264

[dracut] 95nvmf: Investigate the need for nbftroot.sh

    • Type: Bug
    • Resolution: Won't Do
    • Priority: Normal
    • rhel-9.5
    • dracut
    • rhel-sst-cs-plumbers
    • ssg_core_services

      As one of the optimization tasks, and to make the boot process faster, only one nvme connect-all --nbft call should be made. While investigating a dracut run with rd.debug, I noticed that the nvmf-autoconnect.sh hook is called too many times, mostly from the nbftroot.sh side:

      # journalctl | grep -e nvmf-autoconnect.sh -e /usr/sbin/nbftroot -e '= timeout' -e '++ command -v nbftroot'
      Aug 27 12:03:38 localhost dracut-cmdline[276]: ///lib/dracut/hooks/cmdline/92-parse-nvmf-boot-connections.sh@325(source): /sbin/initqueue --settled --onetime --name nvmf-connect-settled /sbin/nvmf-autoconnect.sh settled
      Aug 27 12:03:38 localhost dracut-cmdline[449]: + echo /sbin/nvmf-autoconnect.sh settled
      Aug 27 12:03:38 localhost dracut-cmdline[276]: ///lib/dracut/hooks/cmdline/92-parse-nvmf-boot-connections.sh@326(source): /sbin/initqueue --timeout --onetime --name nvmf-connect-timeout /sbin/nvmf-autoconnect.sh timeout
      Aug 27 12:03:38 localhost dracut-cmdline[451]: + echo /sbin/nvmf-autoconnect.sh timeout
      Aug 27 12:03:40 localhost.localdomain dracut-initqueue[664]: ++ command -v nbftroot
      Aug 27 12:03:40 localhost.localdomain dracut-initqueue[655]: + /usr/sbin/nbftroot lo nbft /sysroot
      Aug 27 12:03:40 localhost.localdomain dracut-initqueue[666]: + '[' online = timeout ']'
      Aug 27 12:03:43 localhost.localdomain dracut-initqueue[718]: ++ command -v nbftroot
      Aug 27 12:03:43 localhost.localdomain dracut-initqueue[709]: + /usr/sbin/nbftroot nbft0 nbft /sysroot
      Aug 27 12:03:43 localhost.localdomain dracut-initqueue[720]: + '[' online = timeout ']'
      Aug 27 12:03:46 localhost.localdomain dracut-initqueue[752]: ++ command -v nbftroot
      Aug 27 12:03:46 localhost.localdomain dracut-initqueue[743]: + /usr/sbin/nbftroot nbft1 nbft /sysroot
      Aug 27 12:03:46 localhost.localdomain dracut-initqueue[754]: + '[' online = timeout ']'
      Aug 27 12:03:49 localhost.localdomain dracut-initqueue[617]: //lib/dracut/hooks/initqueue/settled/nvmf-connect-settled449.sh@2(): /sbin/nvmf-autoconnect.sh settled
      Aug 27 12:03:49 localhost.localdomain dracut-initqueue[767]: + '[' settled = timeout ']'
      Aug 27 12:03:53 localhost.localdomain dracut-initqueue[806]: ++ command -v nbftroot
      Aug 27 12:03:53 localhost.localdomain dracut-initqueue[797]: + /usr/sbin/nbftroot lo nbft /sysroot
      Aug 27 12:03:53 localhost.localdomain dracut-initqueue[808]: + '[' online = timeout ']'
      Aug 27 12:03:56 localhost.localdomain dracut-initqueue[841]: ++ command -v nbftroot
      Aug 27 12:03:56 localhost.localdomain dracut-initqueue[832]: + /usr/sbin/nbftroot nbft0 nbft /sysroot
      Aug 27 12:03:56 localhost.localdomain dracut-initqueue[843]: + '[' online = timeout ']'
      Aug 27 12:03:59 localhost.localdomain dracut-initqueue[876]: ++ command -v nbftroot
      Aug 27 12:03:59 localhost.localdomain dracut-initqueue[867]: + /usr/sbin/nbftroot nbft1 nbft /sysroot
      Aug 27 12:03:59 localhost.localdomain dracut-initqueue[878]: + '[' online = timeout ']'
      

      This makes seven calls to nvmf-autoconnect.sh, i.e. seven connection attempts, each repeating the same errors and timeouts.
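The tally can be reproduced mechanically. The following is a minimal sketch, assuming that each `+ /usr/sbin/nbftroot <iface> nbft /sysroot` trace line and each initqueue `/sbin/nvmf-autoconnect.sh <event>` line in the rd.debug output corresponds to one connection attempt. The function name `count_nvmf_attempts` is hypothetical and not part of dracut.

```shell
#!/bin/sh
# Hypothetical helper (not part of dracut): count nvmf-autoconnect.sh
# triggers in rd.debug journal output read from stdin.
# Assumption: one matching trace line equals one connection attempt.
count_nvmf_attempts() {
    awk '
        # nbftroot invocations traced by the initqueue
        /dracut-initqueue/ && /\+ \/usr\/sbin\/nbftroot / { nbft++ }
        # regular initqueue events (settled or timeout); the dracut-cmdline
        # lines only register the hooks and are deliberately not counted
        /dracut-initqueue/ && /\/sbin\/nvmf-autoconnect\.sh (settled|timeout)$/ { queue++ }
        END { printf "nbftroot=%d initqueue=%d total=%d\n", nbft, queue, nbft + queue }
    '
}
```

A possible invocation is `journalctl -b | count_nvmf_attempts`. Fed the excerpt above, it should report six nbftroot calls and one settled event, matching the seven attempts counted by hand.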

      The above log is from an installed system. Things are even worse when an installer image is booting: in this testing environment it makes 31 connection attempts in total, of which only one is a settled event and the rest are nbftroot hooks. That adds up to about 105 seconds of delay.
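As a rough sanity check of these figures, the average cost per attempt can be computed from the two numbers quoted above. This is only a sketch; the helper name `avg_delay` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: average delay per connection attempt, in seconds.
# $1 = total delay in seconds, $2 = number of attempts
avg_delay() {
    awk -v total="$1" -v attempts="$2" 'BEGIN { printf "%.1f\n", total / attempts }'
}

avg_delay 105 31    # prints 3.4
```

Roughly 3.4 seconds per attempt is consistent with the spacing of about three seconds between successive nbftroot calls in the installed-system log above.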

      It is my understanding that NetworkManager fully finishes network setup before any nvme connection attempts are made. It therefore appears to me that the calls to nbftroot.sh are unneeded and that the usual initqueue events (i.e. the settled event) should be sufficient.

      I also think we don't need to react to NIC link events in the NBFT case, since the first connection attempts have already been made by UEFI and invalid connections were filtered out (or marked as 'unavailable'). Reconnection of unavailable records is performed shortly after switchroot by NetworkManager hooks. In the initramfs phase we only need to mount the rootfs.
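For discussion purposes only: the goal of a single nvme connect-all --nbft call could also be approximated with a run-once guard instead of removing the nbftroot.sh calls. This is a different technique from the one proposed here, not existing dracut code. The helper name `run_once` and the marker path in the usage note are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical run-once guard (not existing dracut code).
# $1 = marker file; remaining arguments = command to run at most once.
run_once() {
    marker=$1
    shift
    # A previous caller already made the attempt: skip quietly.
    [ -e "$marker" ] && return 0
    # Create the marker before running, so repeated hook invocations
    # do not start the same attempt again.
    : > "$marker" || return 1
    "$@"
}
```

A hook could then call, for example, `run_once /run/nvmf-nbft-attempted nvme connect-all --nbft`. Because the marker is written before the command runs, a failed attempt is not retried; whether that is acceptable for the timeout event is part of what would need discussing.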

      Subject to further discussion.

              Tomáš Bžatek (tbzatek)
              dracut maint mailing list
              RHEL CS Plumbers QE Bot