Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-10325

[gcp] intermittent NM timeout in initramfs

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Can't Do
    • Icon: Critical Critical
    • None
    • 4.13.0
    • RHCOS
    • No
    • Sprint 233 - Update&Remoting
    • 1
    • Approved
    • False
    • Hide

      None

      Show
      None

      Since the 9.2 merge, we've seen some jobs fail bringing up control plane nodes; the root cause seems to be some sort of intermittent race.

       

      This is a pretty critical bug because it will manifest in flakes all over CI, and of course for customer installs.

       

      In this aggregated job, take a look at this console log :

        OK  ] Reached target System Initialization.
      [    8.062341] systemd[1]: Reached target Basic System.
      [  OK  ] Reached target Basic System.
      ...
      [    8.312057] ignition[789]: Ignition 2.14.0
      [    8.312869] ignition[789]: Stage: fetch-offline
      [    8.314074] ignition[789]: reading system config file "/usr/lib/ignition/base.d/00-core.ign"
      [    8.315679] ignition[789]: no config dir at "/usr/lib/ignition/base.platform.d/gcp"
      [[    8.317072] systemd[1]: Started Ignition (fetch-offline).
        OK      8.318130] ignition[789]: no config URL provided
      m] [    8.319497] ignition[789]: reading system config file "/usr/lib/ignition/user.ign"
      Started Ignition[    8.320980] ignition[789]: no config at "/usr/lib/ignition/user.ign"
       (fetch-offline)[    8.322690] ignition[789]: failed to fetch config: resource requires networking
      .[    8.324281] ignition[789]: Ignition finished successfully
      ...
               Starting CoreOS [    8.330258] systemd[1]: Starting CoreOS Enable Network...
      Enable Network...
      [    8.560359] systemd[1]: Started CoreOS Enable Network.
      [  OK  ] Started CoreOS Enable Network.
               [    8.566643] systemd[1]: Starting Copy CoreOS Firstboot Networking Config...
      Starting Copy CoreOS Firstboot Networking Config...
               Starting Ignitio[    8.573609] systemd[1]: Starting Ignition (fetch)...
      n (fetch)...
      [    8.593377] EXT4-fs (sda3): mounted filesystem with ordered data mode. Opts: (null)
      [    8.600613] coreos-copy-firstboot-network[813]: info: no files to copy from /mnt/boot_partition/coreos-firstboot-network; skipping
      [  OK      8.607229] systemd[1]: Started Copy CoreOS Firstboot Networking Config.
      0m] Started Copy CoreOS Firstboot Networking Config.
               Starting dracut initque[    8.614162] ignition[816]: Ignition 2.14.0
      ue hook...
      [    8.615065] systemd[1]: Starting dracut initqueue hook...
      [    8.616017] ignition[816]: Stage: fetch
      [    8.616879] ignition[816]: reading system config file "/usr/lib/ignition/base.d/00-core.ign"
      [    8.619374] ignition[816]: no config dir at "/usr/lib/ignition/base.platform.d/gcp"
      [    8.621187] ignition[816]: no config URL provided
      [    8.622293] ignition[816]: reading system config file "/usr/lib/ignition/user.ign"
      [    8.624825] ignition[816]: no config at "/usr/lib/ignition/user.ign"
      [    8.626288] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #1
      [    8.628861] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [    8.819648] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #2
      [    8.821659] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [    9.021300] NetworkManager[842]: <info>  [1678822681.0846] NetworkManager (version 1.36.0-12.el8_6) is starting... (for the first time)
      [    9.024892] NetworkManager[842]: <info>  [1678822681.0847] Read config: /etc/NetworkManager/NetworkManager.conf
      [    9.026661] NetworkManager[842]: <info>  [1678822681.0869] auth[0x562982c3a2a0]: create auth-manager: D-Bus connection not available. Polkit is disabled and only root will be authorized.
      [    9.029495] NetworkManager[842]: <info>  [1678822681.0886] manager[0x562982c6f020]: monitoring kernel firmware directory '/lib/firmware'.
      [    9.030969] IPv6: ADDRCONF(NETDEV_UP): ens4: link is not ready
      [    9.032095] NetworkManager[842]: <info>  [1678822681.0888] hostname: hostname: hostnamed not used as proxy creation failed with: Could not connect: No such file or directory
      [    9.035715] NetworkManager[842]: <info>  [1678822681.0890] dns-mgr[0x562982c66120]: init: dns=default,systemd-resolved rc-manager=symlink
      [    9.037935] NetworkManager[842]: <info>  [1678822681.0890] policy: set-hostname: set hostname to 'localhost.localdomain' (no hostname found)
      [    9.040488] NetworkManager[842]: <info>  [1678822681.0921] Loaded device plugin: NMTeamFactory (/usr/lib64/NetworkManager/1.36.0-12.el8_6/libnm-device-plugin-team.so)
      [    9.044079] NetworkManager[842]: <info>  [1678822681.0922] manager: rfkill: Wi-Fi enabled by radio killswitch; enabled by state file
      [    9.046290] NetworkManager[842]: <info>  [1678822681.0922] manager: rfkill: WWAN enabled by radio killswitch; enabled by state file
      [    9.048802] NetworkManager[842]: <info>  [1678822681.0922] manager: Networking is enabled by state file
      [    9.050602] NetworkManager[842]: <warn>  [1678822681.0926] ifcfg-rh: dbus: don't use D-Bus for com.redhat.ifcfgrh1 service
      [    9.053141] NetworkManager[842]: <info>  [1678822681.0927] settings: Loaded settings plugin: ifcfg-rh ("/usr/lib64/NetworkManager/1.36.0-12.el8_6/libnm-settings-plugin-ifcfg-rh.so")
      [    9.056089] NetworkManager[842]: <info>  [1678822681.0927] settings: Loaded settings plugin: keyfile (internal)
      [    9.057864] NetworkManager[842]: <info>  [1678822681.0940] dhcp-init: Using DHCP client 'internal'
      [    9.059631] NetworkManager[842]: <info>  [1678822681.0940] device (lo): carrier: link connected
      [    9.061566] NetworkManager[842]: <info>  [1678822681.0941] manager: (lo): new Generic device (/org/freedesktop/NetworkManager/Devices/1)
      [    9.063680] NetworkManager[842]: <info>  [1678822681.0945] manager: (ens4): new Ethernet device (/org/freedesktop/NetworkManager/Devices/2)
      [    9.065781] NetworkManager[842]: <info>  [1678822681.0945] device (ens4): state change: unmanaged -> unavailable (reason 'managed', sys-iface-state: 'external')
      [    9.068830] NetworkManager[842]: <info>  [1678822681.0986] device (ens4): carrier: link connected
      [    9.070542] NetworkManager[842]: <warn>  [1678822681.0992] sleep-monitor-sd: failed to acquire D-Bus proxy: Could not connect: No such file or directory
      [    9.073473] NetworkManager[842]: <info>  [1678822681.0993] device (ens4): state change: unavailable -> disconnected (reason 'none', sys-iface-state: 'managed')
      [    9.076415] NetworkManager[842]: <info>  [1678822681.0995] policy: auto-activating connection 'Wired Connection' (5077b9d2-9d45-4c92-adfc-81e60947e5be)
      [    9.079122] NetworkManager[842]: <info>  [1678822681.0998] device (ens4): Activation: starting connection 'Wired Connection' (5077b9d2-9d45-4c92-adfc-81e60947e5be)
      [    9.081635] NetworkManager[842]: <info>  [1678822681.0999] device (ens4): state change: disconnected -> prepare (reason 'none', sys-iface-state: 'managed')
      [    9.084716] NetworkManager[842]: <info>  [1678822681.0999] manager: NetworkManager state is now CONNECTING
      [    9.086696] NetworkManager[842]: <info>  [1678822681.0999] device (ens4): state change: prepare -> config (reason 'none', sys-iface-state: 'managed')
      [    9.090627] NetworkManager[842]: <info>  [1678822681.1002] device (ens4): state change: config -> ip-config (reason 'none', sys-iface-state: 'managed')
      [    9.094902] NetworkManager[842]: <info>  [1678822681.1004] dhcp4 (ens4): activation: beginning transaction (timeout in 90 seconds)
      [    9.220439] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #3
      [    9.222722] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   10.021092] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #4
      [   10.024811] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   11.621883] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #5
      [   11.624764] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   14.824475] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #6
      [   14.826467] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   19.825129] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #7
      [   19.827691] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   24.830414] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #8
      [   24.841491] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   29.831270] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #9
      [   29.842424] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   34.835971] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #10
      [   34.846923] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   39.836597] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #11
      [   39.847754] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   44.841306] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #12
      [   44.852139] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   49.842697] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #13
      [   49.853954] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   54.847479] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #14
      [   54.858231] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   59.848798] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #15
      [   59.859589] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   64.853158] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #16
      [   64.863946] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   69.854547] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #17
      [   69.865597] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   74.859178] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #18
      [   74.870231] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   79.859997] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #19
      [   79.870775] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   84.864265] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #20
      [   84.875176] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   89.865856] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #21
      [   89.877574] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   94.871311] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #22
      [   94.882341] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [   99.015824] NetworkManager[842]: <info>  [1678822771.0791] device (ens4): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
      [   99.031548] NetworkManager[842]: <info>  [1678822771.0793] manager: NetworkManager state is now DISCONNECTED
      [   99.041891] NetworkManager[842]: <warn>  [1678822771.0793] device (ens4): Activation: failed for connection 'Wired Connection'
      [   99.053770] NetworkManager[842]: <info>  [1678822771.0795] device (ens4): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
      [   99.070233] NetworkManager[842]: <info>  [1678822771.1339] dhcp4 (ens4): canceled DHCP transaction
      [   99.079627] NetworkManager[842]: <info>  [1678822771.1346] manager: startup complete
      [  OK     99.087692] NetworkManager[842]: <info>  [1678822771.1346] quitting now that startup is complete
      0m] Started drac[   99.098946] NetworkManager[842]: <info>  [1678822771.1351] exiting (success)
      ut initqueue hoo[   99.108172] systemd[1]: Started dracut initqueue hook.
      k.
               Startin[   99.118871] systemd[1]: Starting dracut pre-mount hook...
      g dracut pre-mount hook...
      [   99.130366] systemd[1]: Reached target Remote File Systems (Pre).
      [  OK  ] Reached target Remote File Systems (Pre).
      [  OK     99.144425] systemd[1]: Reached target Remote File Systems.
      0m] Reached target Remote File Systems.
      [   99.158552] systemd[1]: Started dracut pre-mount hook.
      [  OK  ] Started dracut pre-mount hook.
      [   99.872095] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #23
      [   99.882859] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [**    ] A start job is running for Ignition (fetch) (1min 40s / no limit)[  104.876391] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #24
      [  104.887310] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [ ***  ] A start job is running for Ignition (fetch) (1min 45s / no limit)[  109.877939] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #25
      [  109.888802] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [     *] A start job is running for Ignition (fetch) (1min 50s / no limit)[  114.882383] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #26
      [  114.893308] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [ ***  ] A start job is running for Ignition (fetch) (1min 55s / no limit)[  119.883198] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #27
      [  119.894623] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [**    ] A start job is running for Ignition (fetch) (2min / no limit)[  124.888231] ignition[816]: GET http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data: attempt #28
      [  124.899715] ignition[816]: GET error: Get "http://169.254.169.254/computeMetadata/v1/instance/attributes/user-data": dial tcp 169.254.169.254:80: connect: network is unreachable
      [    **] A start job is running for Ignition (fetch) (2min 4s / no limit)[  128.619392] ignition[816]: failed to fetch config: unable to fetch resource in time
      [  128.627747] ignition[816]: failed to acquire config: unable to fetch resource in time
      [FAILED[  128.637094] systemd[1]: ignition-fetch.service: Main process exited, code=exited, status=1/FAILURE
      ] Failed to [  128.647508] ignition[816]: Ignition failed: unable to fetch resource in time
      start Ignition ([  128.656327] systemd[1]: ignition-fetch.service: Failed with result 'exit-code'.
      fetch).
       

       

      I thought that in RHEL9 we'd got NM to retry indefinitely in the initramfs?

            walters@redhat.com Colin Walters
            walters@redhat.com Colin Walters
            Michael Nguyen Michael Nguyen
            Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

              Created:
              Updated:
              Resolved: