Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.17
Component/s: RHCOS
Labels:
- osintegration

Severity:
Moderate
Regression:
None
Blocked:
False
Blocked Reason:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

I am not sure whether this issue belongs to MCO (Machine Config Operator). Please feel free to reassign it to the right component if MCO is not the right one.

The CI job for vsphere-ipi-ovn-dualstack-privmaryv6 that upgrades from 4.15 to 4.17 is failing because some nodes cannot join the cluster after they reboot.

https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-4.17-upgrade-from-stable-4.15-vsphere-ipi-ovn-dualstack-privmaryv6-f28/1843678815767760896

The failed node shows the following (collected in a rehearse job created to extract the debug information that the main job cannot provide):

[core@ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 ~]$ systemctl --failed
  UNIT                              LOAD   ACTIVE SUB    DESCRIPTION                                    
● systemd-network-generator.service loaded failed failed Generate network units from Kernel command line

$ journalctl -u systemd-network-generator.service
Oct 11 07:59:22 88-110-38-10.in-addr.arpa systemd[1]: Finished Generate network units from Kernel command line.
Oct 11 08:01:03 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd[1]: systemd-network-generator.service: Deactivated successfully.
Oct 11 08:01:03 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd[1]: Stopped Generate network units from Kernel command line.
-- Boot 7f937ffe8dd741a29d753994ae03b187 --
Oct 11 08:01:16 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd-network-generator[799]: Failed to parse kernel command line: Invalid argument
Oct 11 08:01:16 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd[1]: systemd-network-generator.service: Main process exited, code=exited, status=1/FAILURE
Oct 11 08:01:16 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd[1]: systemd-network-generator.service: Failed with result 'exit-code'.
Oct 11 08:01:16 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd[1]: Failed to start Generate network units from Kernel command line.
-- Boot c253b3e5b5f1454392a1f8c663305f12 --
Oct 11 10:49:15 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd-network-generator[808]: Failed to parse kernel command line: Invalid argument
Oct 11 10:49:15 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd[1]: systemd-network-generator.service: Main process exited, code=exited, status=1/FAILURE
Oct 11 10:49:15 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd[1]: systemd-network-generator.service: Failed with result 'exit-code'.
Oct 11 10:49:15 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd[1]: Failed to start Generate network units from Kernel command line.
-- Boot 4ef77d9b467942ef892d2145f8b8fa44 --
Oct 11 11:25:03 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd-network-generator[805]: Failed to parse kernel command line: Invalid argument
Oct 11 11:25:03 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd[1]: systemd-network-generator.service: Main process exited, code=exited, status=1/FAILURE
Oct 11 11:25:03 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd[1]: systemd-network-generator.service: Failed with result 'exit-code'.
Oct 11 11:25:03 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd[1]: Failed to start Generate network units from Kernel command line.
-- Boot 23e84d36ee4846708b91213de129c437 --
Oct 11 11:58:53 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd-network-generator[813]: Failed to parse kernel command line: Invalid argument
Oct 11 11:58:53 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd[1]: systemd-network-generator.service: Main process exited, code=exited, status=1/FAILURE
Oct 11 11:58:53 ci-op-mzcwnlzz-661ff-997hj-worker-0-vmvq4 systemd[1]: systemd-network-generator.service: Failed with result 'exit-code'.
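The journal above shows the generator succeeding only on the first recorded boot (still under the reverse-DNS hostname) and failing on every boot after that. A minimal sketch for tallying this per boot from saved `journalctl -u systemd-network-generator.service` output, assuming the `-- Boot <id> --` separators seen above; the function name and the "initial" label are hypothetical:

```python
import re

# journalctl prints a separator line like "-- Boot 7f93... --" between boots.
BOOT_RE = re.compile(r"^-- Boot ([0-9a-f]+) --$")
# Message logged by systemd-network-generator in the output above.
FAIL_MARKER = "Failed to parse kernel command line"


def failures_per_boot(journal_text: str) -> dict[str, bool]:
    """Map each boot to whether systemd-network-generator failed to parse the cmdline.

    Lines before the first separator belong to a boot with no printed ID,
    so they are grouped under the hypothetical label "initial".
    """
    result: dict[str, bool] = {"initial": False}
    current = "initial"
    for line in journal_text.splitlines():
        match = BOOT_RE.match(line.strip())
        if match:
            # A new boot starts; assume success until a failure line appears.
            current = match.group(1)
            result[current] = False
        elif FAIL_MARKER in line:
            result[current] = True
    return result
```

For example, `failures_per_boot(open("generator.log").read())` on a saved copy of the journal returns one entry per boot, with `True` for every boot in which the parse error was logged.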


$ cat /proc/cmdline 
BOOT_IMAGE=(hd0,gpt3)/ostree/rhcos-6462af9eefed4c54bc8ce5f08af127a24f7bc0b5c816874344fb5ad6b934b54d/vmlinuz-5.14.0-427.40.1.el9_4.x86_64 ostree=/ostree/boot.1/rhcos/6462af9eefed4c54bc8ce5f08af127a24f7bc0b5c816874344fb5ad6b934b54d/0 ignition.platform.id=vmware console=ttyS0,115200n8 console=tty0 root=UUID=45ad3301-cf30-45b9-9917-5fcd00e9944b rw rootflags=prjquota boot=UUID=5dd5450f-5543-4865-bd25-94fecb2a18b2 systemd.unified_cgroup_hierarchy=1 cgroup_no_v1=all psi=0 ip=dhcp,dhcp6
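The only networking argument on this command line is `ip=dhcp,dhcp6`, which lists two autoconfiguration methods in a single value. One hypothesis worth checking (it is not confirmed in this report) is that systemd-network-generator expects a single method in the bare `ip=<autoconf>` form and rejects the whole command line, which would match the `Invalid argument` message above. A minimal Python sketch to spot such values on other nodes; both function names are hypothetical:

```python
def ip_values(cmdline: str) -> list[str]:
    """Return the value of every ip= argument on a kernel command line.

    Plain whitespace splitting is enough for the command line above;
    quoted arguments that contain spaces are not handled.
    """
    return [arg[len("ip="):] for arg in cmdline.split() if arg.startswith("ip=")]


def multi_method_autoconf(values: list[str]) -> list[str]:
    """Return ip= values in the bare ip=<autoconf> form that list several methods.

    A value with no ':' has no interface or address fields, so any ','
    in it separates autoconfiguration methods, as in "dhcp,dhcp6".
    """
    return [v for v in values if ":" not in v and "," in v]
```

Running `multi_method_autoconf(ip_values(open("/proc/cmdline").read()))` on a node returns the suspect values, if any.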

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.16.0-0.nightly-2024-10-10-193220   True        False         169m    Error while reconciling 4.16.0-0.nightly-2024-10-10-193220: the cluster operator machine-config is degraded

How reproducible:

In the original Prow job it happens consistently. Depending on the run, the failure shows up in one of these steps:

- vsphere-ipi-ovn-dualstack-privmaryv6-f28-cucushift-chainupgrade-toimage
- vsphere-ipi-ovn-dualstack-privmaryv6-f28-openshift-extended-upgrade-pre-custom-cli
- vsphere-ipi-ovn-dualstack-privmaryv6-f28-cucushift-upgrade-prehealthcheck

In a rehearse job the problem does not reproduce on every run, but it eventually does.

Steps to Reproduce:

    1. Run CI job: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/job-history/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.17-amd64-nightly-4.17-upgrade-from-stable-4.15-vsphere-ipi-ovn-dualstack-privmaryv6-f28

Actual results:

    A node cannot join the cluster after a reboot. When we log in to the failed node, systemd-network-generator.service is in a failed state.

Expected results:

    All nodes should be able to join the cluster after they are rebooted.

Additional info:

    The first comment contains links to the must-gather, the journal log, and the sosreport.

Assignee:: Unassigned

Reporter:: Sergio Regidor de la Rosa

QA Contact:: Sergio Regidor de la Rosa

Votes:: 0

Watchers:: 4

Created:: 2024/10/11 2:02 PM

Updated:: 2024/11/07 4:29 PM
