Bug
Resolution: Unresolved
Affects Version: 4.20
Quality / Stability / Reliability
MCO Sprint 276
Description of problem:
On an OCL-based (on-cluster layering) cluster, when the MCP is updating it becomes degraded with an error about failing to update the OS image.
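A quick way to observe the symptom (a sketch, not commands from the original report; the resource names assume the GA machineconfiguration.openshift.io/v1 on-cluster layering API):

# Show whether the worker pool is degraded and surface the NodeDegraded message
oc get mcp worker
oc get mcp worker -o jsonpath='{.status.conditions[?(@.type=="NodeDegraded")].message}'
# Check the on-cluster layering objects driving the update
oc get machineosconfigs,machineosbuilds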
Version-Release number of selected component (if applicable):
4.20
How reproducible:
Intermittent; no reliable reproducer (see below).
Steps to Reproduce:
I don't know exactly how to reproduce this error, but it has been seen multiple times in this CI job:
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.20-amd64-nightly-gcp-ipi-longduration-tp-mco-p3-f7/1957093328067497984
While verifying the PR I hit the error with the steps below, but I am not sure this is the exact way to reproduce it.
1. Apply a MachineOSConfig with a broken Containerfile:

oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineOSConfig
metadata:
  name: worker
spec:
  machineConfigPool:
    name: worker
  imageBuilder:
    imageBuilderType: Job
  baseImagePullSecret:
    name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
  renderedImagePushSecret:
    name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
  renderedImagePushSpec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
  containerFile:
  - content: |-
      FROM alpine:3.18
      RUN apt update && apt install -y cowsay
EOF

(If the MachineOSConfig already exists, the command instead returns: Error from server (AlreadyExists): error when creating "STDIN": machineosconfigs.machineconfiguration.openshift.io "worker" already exists)

2. The MOSB fails and the MCP is degraded too, but with a different error, which is expected.
3. Correct the Containerfile in the above MOSC (one hypothetical way to do this is sketched after these steps).
4. The MOSB now builds successfully.
5. The MCP is degraded with the error shown under "Actual results".
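For step 3 the report does not show how the Containerfile was corrected; below is a hypothetical sketch only (the replacement content "RUN echo ok" is an illustration, not the file actually used; the original content fails because it runs apt on an Alpine base image, which only provides apk).

# Hypothetical correction for step 3: replace the broken containerFile content
# with a trivially valid layer so the MachineOSBuild can succeed.
oc patch machineosconfig worker --type merge \
  -p '{"spec":{"containerFile":[{"content":"RUN echo ok"}]}}'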
Actual results:
Error seen:

- lastTransitionTime: "2025-08-20T06:51:17Z"
  message: 'Node ip-10-0-9-181.us-east-2.compute.internal is reporting: "Node ip-10-0-9-181.us-east-2.compute.internal upgrade failure. Failed to update OS to image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image@sha256:8b12f9092364afc2b2116f9dcf7cb3f0cffe8753c13ccb73332ae2b88650fcd1 after retries: timed out waiting for the condition", Node ip-10-0-9-181.us-east-2.compute.internal is reporting: "Failed to update OS to image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image@sha256:8b12f9092364afc2b2116f9dcf7cb3f0cffe8753c13ccb73332ae2b88650fcd1 after retries: timed out waiting for the condition"'
  reason: 1 nodes are reporting degraded status on sync
  status: "True"
  type: NodeDegraded
- lastTransitionTime: "2025-08-20T06:51:17Z"
  message: 'Node ip-10-0-9-181.us-east-2.compute.internal is reporting: "Node ip-10-0-9-181.us-east-2.compute.internal upgrade failure. Failed to update OS to image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image@sha256:8b12f9092364afc2b2116f9dcf7cb3f0cffe8753c13ccb73332ae2b88650fcd1 after retries: timed out waiting for the condition", Node ip-10-0-9-181.us-east-2.compute.internal is reporting: "Failed to update OS to image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image@sha256:8b12f9092364afc2b2116f9dcf7cb3f0cffe8753c13ccb73332ae2b88650fcd1 after retries: timed out waiting for the condition"'
  reason: ""
  status: "True"
  type: Degraded
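Possible follow-up diagnostics on the degraded node (a sketch, not commands taken from the report; the node name comes from the message above):

# The machine-config-daemon logs usually show why the OS image update timed out
oc -n openshift-machine-config-operator logs -l k8s-app=machine-config-daemon -c machine-config-daemon --tail=200
# Inspect the OS image currently deployed on the affected node
oc debug node/ip-10-0-9-181.us-east-2.compute.internal -- chroot /host rpm-ostree status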
Expected results:
Once the corrected image builds successfully, the MCP should finish updating without becoming degraded.
Additional info:
must-gather: https://drive.google.com/drive/folders/1SwyfNWYHZ-PQECU2l5KE-tnCwU9NRhEt?usp=sharing