-
Bug
-
Resolution: Done-Errata
-
Major
-
4.13
-
Quality / Stability / Reliability
-
False
-
-
None
-
Important
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
-
None
This is a clone of issue OCPBUGS-37534. The following is the description of the original issue:
—
Description of problem:
Prow jobs upgrading from 4.9 to 4.16 are failing when they upgrade from 4.12 to 4.13.
Nodes become NotReady when MCO tries to apply the new 4.13 configuration to the MCPs.
The failing job is: periodic-ci-openshift-openshift-tests-private-release-4.16-amd64-nightly-4.16-upgrade-from-stable-4.9-azure-ipi-f28
We have reproduced the issue and we found an ordering cycle error in the journal log
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 systemd-journald.service[838]: Runtime Journal (/run/log/journal/960b04f10e4f44d98453ce5faae27e84) is 8.0M, max 641.9M, 633.9M free.
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found ordering cycle on network-online.target/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on node-valid-hostname.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on ovs-configuration.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on firstboot-osupdate.target/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on machine-config-daemon-firstboot.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Found dependency on machine-config-daemon-pull.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: machine-config-daemon-pull.service: Job network-online.target/start deleted to break ordering cycle starting with machine-config-daemon-pull.service/start
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: Queued start job for default target Graphical Interface.
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: systemd-journald.service: unit configures an IP firewall, but the local system does not support BPF/cgroup firewalling.
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: (This warning is only shown for the first unit using IP firewalling.)
Wed 2024-07-24 21:12:17 UTC ci-op-g94jvswm-cc71e-998q8-master-2 init.scope[1]: systemd-journald.service: Deactivated successfully.
Version-Release number of selected component (if applicable):
Using IPI on Azure, these are the version involved in the current issue upgrading from 4.9 to 4.13:
version: 4.13.0-0.nightly-2024-07-23-154444
version: 4.12.0-0.nightly-2024-07-23-230744
version: 4.11.59
version: 4.10.67
version: 4.9.59
How reproducible:
Always
Steps to Reproduce:
1. Upgrade an IPI on Azure cluster from 4.9 to 4.13. Theoretically, upgrading from 4.12 to 4.13 should be enough, but we reproduced it following the whole path.
Actual results:
Nodes become not ready
$ oc get nodes
NAME STATUS ROLES AGE VERSION
ci-op-g94jvswm-cc71e-998q8-master-0 Ready master 6h14m v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-master-1 Ready master 6h13m v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-master-2 NotReady,SchedulingDisabled master 6h13m v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-worker-centralus1-c7ngb NotReady,SchedulingDisabled worker 6h2m v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-worker-centralus2-2ppf6 Ready worker 6h4m v1.25.16+306a47e
ci-op-g94jvswm-cc71e-998q8-worker-centralus3-nqshj Ready worker 6h6m v1.25.16+306a47e
And in the NotReady nodes we can see the ordering cycle error mentioned in the description of this ticket.
Expected results:
No ordering cycle error should happen and the upgrade should be executed without problems.
Additional info:
- clones
-
OCPBUGS-37534 4.12 -> 4.13 upgrade using IPI on Azure does not work
-
- Closed
-
- is blocked by
-
OCPBUGS-37534 4.12 -> 4.13 upgrade using IPI on Azure does not work
-
- Closed
-
- links to
-
RHEA-2024:3718
OpenShift Container Platform 4.17.z bug fix update