-
Bug
-
Resolution: Unresolved
-
Normal
-
None
-
4.18
-
Important
-
No
-
3
-
MCO Sprint 266, MCO Sprint 267, MCO Sprint 268, MCO Sprint 269
-
4
-
Rejected
-
False
-
Description of problem:
In a cluster using a proxy, when we enable OCL in an MCP and then disable it, the nodes cannot join the cluster after the reboot.
Version-Release number of selected component (if applicable):
4.18.0-0.nightly-arm64-2025-02-05-055102
How reproducible:
Always
Steps to Reproduce:
1. Create a MOSC resource to enable OCL in the worker pool
2. Wait for the osImage to be created and applied to all worker nodes.
3. Remove the MOSC resource to disable OCL
Actual results:
Nodes are rebooted, and after the reboot they cannot join the cluster and remain in NotReady status.

$ oc get nodes
NAME                          STATUS                        ROLES                  AGE     VERSION
ip-10-0-50-26.ec2.internal    Ready                         control-plane,master   3h20m   v1.31.4
ip-10-0-54-149.ec2.internal   NotReady,SchedulingDisabled   worker                 3h11m   v1.31.4
ip-10-0-75-192.ec2.internal   Ready                         worker                 3h11m   v1.31.4
ip-10-0-78-138.ec2.internal   Ready                         control-plane,master   3h20m   v1.31.4
ip-10-0-88-130.ec2.internal   Ready                         control-plane,master   3h20m   v1.31.4

If we access the broken nodes via ssh, we can see that the machine-config-daemon-revert.service service is failing.

[core@ip-10-0-49-2 ~]$ journalctl -u machine-config-daemon-revert.service
Feb 05 14:53:42 ip-10-0-49-2 systemd[1]: machine-config-daemon-revert.service: Failed to load environment files: No such file or directory
Feb 05 14:53:42 ip-10-0-49-2 systemd[1]: machine-config-daemon-revert.service: Failed to run 'start-pre' task: No such file or directory
Feb 05 14:53:42 ip-10-0-49-2 systemd[1]: machine-config-daemon-revert.service: Failed with result 'resources'.
Feb 05 14:53:42 ip-10-0-49-2 systemd[1]: Failed to start Machine Config Daemon Revert.

The reason is that the file /etc/mco/proxy.env does not exist on the node.

[core@ip-10-0-49-2 ~]$ ls /etc/mco/
internal-registry-pull-secret.json  machineconfig-revert.json
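
The manual check above (listing /etc/mco/ and noticing that proxy.env is absent) could be scripted when triaging several nodes. The following is only a minimal sketch: check_files_present is a hypothetical helper, not part of the MCO, and the list of file names passed to it is an assumption based on the files mentioned in this report.

```shell
#!/bin/sh
# Hypothetical helper (not part of the MCO): print every expected file that is
# absent from a directory and return non-zero if at least one is missing.
# Usage: check_files_present DIR FILE...
check_files_present() {
  dir="$1"
  shift
  status=0
  for name in "$@"; do
    if [ ! -e "$dir/$name" ]; then
      echo "missing: $dir/$name"
      status=1
    fi
  done
  return "$status"
}

# On an affected node this could be called as (file names taken from this report):
#   check_files_present /etc/mco proxy.env machineconfig-revert.json
```

On a node in the state described here, the call in the trailing comment would be expected to report /etc/mco/proxy.env as missing and return a non-zero status.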
Expected results:
When we remove the MOSC resource, OCL should be disabled and the nodes should stop using the layered image without problems.
Additional info:
We don't see this issue happening in clusters that do not use a proxy configuration.
The issue is causing this job to fail: https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/job-history/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.18-arm64-nightly-aws-ipi-longrun-mco-proxy-fips-p1-f28
- links to
-
RHEA-2024:11038 OpenShift Container Platform 4.19.z bug fix update