Loading...

XML

Word

Printable

Type: Bug
Resolution: Duplicate
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.11
Component/s: RHCOS
Labels:
None

Severity:
Moderate
Regression:
None
Blocked:
False
Blocked Reason:

Hide

None

Show
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:
This problem was first discovered with assisted installer CI (slack discussion).
Upon boot, CRIO failed to bind to an IPv6 address:

[root@ostest-extraworker-0 ~]# systemctl status crio
● crio.service - Container Runtime Interface for OCI (CRI-O)
   Loaded: loaded (/usr/lib/systemd/system/crio.service; disabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/crio.service.d
           └─10-mco-default-env.conf, 10-mco-default-madv.conf, 10-mco-profile-unix-socket.conf, 20-nodenet.conf
   Active: failed (Result: exit-code) since Tue 2022-08-23 10:25:59 UTC; 7min ago
     Docs: https://github.com/cri-o/cri-o
  Process: 2415 ExecStart=/usr/bin/crio $CRIO_CONFIG_OPTIONS $CRIO_RUNTIME_OPTIONS $CRIO_STORAGE_OPTIONS $CRIO_NETWORK_OPTIONS $CRIO_METRICS_OPTIONS (code=exited, status=1/FAILURE)
 Main PID: 2415 (code=exited, status=1/FAILURE)
      CPU: 90ms

Aug 23 10:25:59 ostest-extraworker-0 crio[2415]: time="2022-08-23 10:25:59.602701981Z" level=info msg="AppArmor is disabled by the system or at CRI-O build-time"
Aug 23 10:25:59 ostest-extraworker-0 crio[2415]: time="2022-08-23 10:25:59.602708133Z" level=info msg="No blockio config file specified, blockio not configured"
Aug 23 10:25:59 ostest-extraworker-0 crio[2415]: time="2022-08-23 10:25:59.602713412Z" level=info msg="RDT not available in the host system"
Aug 23 10:25:59 ostest-extraworker-0 crio[2415]: time="2022-08-23 10:25:59.602961157Z" level=info msg="Updated default CNI network name to "
Aug 23 10:25:59 ostest-extraworker-0 crio[2415]: time="2022-08-23 10:25:59.634908836Z" level=warning msg="Error encountered when checking whether cri-o should wipe images: version file /var/lib/crio/version not found: open /var/lib/crio/>
Aug 23 10:25:59 ostest-extraworker-0 crio[2415]: time="2022-08-23 10:25:59.635765832Z" level=fatal msg="Failed to start streaming server: listen tcp [fd2e:6f44:5dd8:c956::1a]:10010: bind: cannot assign requested address"
Aug 23 10:25:59 ostest-extraworker-0 systemd[1]: crio.service: Main process exited, code=exited, status=1/FAILURE
Aug 23 10:25:59 ostest-extraworker-0 systemd[1]: crio.service: Failed with result 'exit-code'.
Aug 23 10:25:59 ostest-extraworker-0 systemd[1]: Failed to start Container Runtime Interface for OCI (CRI-O).
Aug 23 10:25:59 ostest-extraworker-0 systemd[1]: crio.service: Consumed 90ms CPU time

Yet, it did recover when I restarted the service via systemd.

Looking closer at that machine, I noticed that this NetworkManager is also failing:

[systemd]
Failed Units: 1
  NetworkManager-wait-online.service

journalctl log indicates that there were issues activating the NIC, which probably failed CRIO to bind since there was no IPv6 address allocated at the time of the bind attempt:

Aug 23 10:43:10 ostest-extraworker-0 NetworkManager[1368]: <info>  [1661251390.9762] device (enp1s0): state change: ip-config -> failed (reason 'ip-config-unavailable', sys-iface-state: 'managed')
Aug 23 10:43:10 ostest-extraworker-0 NetworkManager[1368]: <warn>  [1661251390.9765] device (enp1s0): Activation: failed for connection 'Wired connection 1'
Aug 23 10:43:10 ostest-extraworker-0 NetworkManager[1368]: <info>  [1661251390.9767] device (enp1s0): state change: failed -> disconnected (reason 'none', sys-iface-state: 'managed')
Aug 23 10:43:11 ostest-extraworker-0 NetworkManager[1368]: <info>  [1661251391.0055] dhcp4 (enp1s0): canceled DHCP transaction

Version-Release number of selected component (if applicable):
RHCOS + OCP 4.11

How reproducible:
100% --> see those CI runs

Steps to Reproduce:
1. Reproduced via CI as mentioned above. The failing node is the spoke cluster VM, which is configured to IPv6 with DHCP.

Actual results:
As mentioned above.

Expected results:
NetworkManager to operate appropriately and DHCP to work.

Additional info:
Does not reproduce with OCP 4.10: see CI results for this PR.

This is a disconnected env, so it has no SOS report deployed. Instead, we captured the following:

demsg log
journalctl log
ps -ef output
ip a
netstat -tnlpu

Assignee:: Unassigned

Reporter:: Nir Magnezi (Inactive)

QA Contact:: Michael Nguyen

Votes:: 0 Vote for this issue

Watchers:: 5 Start watching this issue

Created:: 2022/08/24 9:26 AM

Updated:: 2022/09/01 4:27 PM

Resolved:: 2022/09/01 4:27 PM

Details

Description

Attachments

Easy Agile Planning Poker

Activity

People

Dates