-
Bug
-
Resolution: Duplicate
-
Critical
-
None
-
4.11
-
None
-
Moderate
-
None
-
False
-
Description of problem:
This problem was first discovered with assisted installer CI (slack discussion).
Upon boot, CRIO failed to bind to an IPv6 address:
[root@ostest-extraworker-0 ~]# systemctl status crio ● crio.service - Container Runtime Interface for OCI (CRI-O) Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled) Drop-In: /etc/systemd/system/crio.service.d └─10-mco-default-env.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf Active: failed (Result: exit-code) since Tue 2022-08-23 10:25:59 UTC; 7min ago Docs: https://github.com/cri-o/cri-o Process: 2415 ExecStart=/usr/bin/crio $CRIO_CONFIG_OPTIONS $CRIO_RUNTIME_OPTIONS $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS (code=exited, status=1/FAILURE) Main PID: 2415 (code=exited, status=1/FAILURE) CPU: 90ms Aug 23 10:25:59 ostest-extraworker-0 crio[2415]: time="2022-08-23 10:25:59.602701981Z" level=info msg="AppArmor is disabled by the system or at CRI-O build-time" Aug 23 10:25:59 ostest-extraworker-0 crio[2415]: time="2022-08-23 10:25:59.602708133Z" level=info msg="No blockio config file specified, blockio not configured" Aug 23 10:25:59 ostest-extraworker-0 crio[2415]: time="2022-08-23 10:25:59.602713412Z" level=info msg="RDT not available in the host system" Aug 23 10:25:59 ostest-extraworker-0 crio[2415]: time="2022-08-23 10:25:59.602961157Z" level=info msg="Updated default CNI network name to " Aug 23 10:25:59 ostest-extraworker-0 crio[2415]: time="2022-08-23 10:25:59.634908836Z" level=warning msg="Error encountered when checking whether cri-o should wipe images: version file /var/lib/crio/version not found: open /var/lib/crio/> Aug 23 10:25:59 ostest-extraworker-0 crio[2415]: time="2022-08-23 10:25:59.635765832Z" level=fatal msg="Failed to start streaming server: listen tcp [fd2e:6f44:5dd8:c956::1a]:10010: bind: cannot assign requested address" Aug 23 10:25:59 ostest-extraworker-0 systemd[1]: crio.service: Main process exited, code=exited, status=1/FAILURE Aug 23 10:25:59 ostest-extraworker-0 systemd[1]: crio.service: Failed with result 'exit-code'. Aug 23 10:25:59 ostest-extraworker-0 systemd[1]: Failed to start Container Runtime Interface for OCI (CRI-O). Aug 23 10:25:59 ostest-extraworker-0 systemd[1]: crio.service: Consumed 90ms CPU time
Yet, it did recover when I restarted the service via systemd.
Looking closer at that machine, I noticed that this NetworkManager is also failing:
[systemd] Failed Units: 1 NetworkManager-wait-online.service
journalctl log indicates that there were issues activating the NIC, which probably failed CRIO to bind since there was no IPv6 address allocated at the time of the bind attempt:
Aug 23 10:43:10 ostest-extraworker-0 NetworkManager[1368]: <info> [1661251390.9762] device (enp1s0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed') Aug 23 10:43:10 ostest-extraworker-0 NetworkManager[1368]: <warn> [1661251390.9765] device (enp1s0): Activation: failed for connection 'Wired connection 1' Aug 23 10:43:10 ostest-extraworker-0 NetworkManager[1368]: <info> [1661251390.9767] device (enp1s0): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed') Aug 23 10:43:11 ostest-extraworker-0 NetworkManager[1368]: <info> [1661251391.0055] dhcp4 (enp1s0): canceled DHCP transaction
Version-Release number of selected component (if applicable):
RHCOS + OCP 4.11
How reproducible:
100% --> see those CI runs
Steps to Reproduce:
1. Reproduced via CI as mentioned above. The failing node is the spoke cluster VM, which is configured to IPv6 with DHCP.
Actual results:
As mentioned above.
Expected results:
NetworkManager to operate appropriately and DHCP to work.
Additional info:
Does not reproduce with OCP 4.10: see CI results for this PR.
This is a disconnected env, so it has no SOS report deployed. Instead, we captured the following: