OpenShift Bugs / OCPBUGS-60671

In an OCL-based cluster, the MCP becomes degraded while updating the OS image


    • Quality / Stability / Reliability
    • MCO Sprint 276

      Description of problem:

        In an OCL-based cluster, the MCP becomes degraded while it is updating, with an error about failing to update the OS image.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

      I don't know exactly how to reproduce this error, but I have seen it multiple times in CI jobs:
      https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.20-amd64-nightly-gcp-ipi-longduration-tp-mco-p3-f7/1957093328067497984

      https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.20-amd64-nightly-aws-ipi-longduration-mco-critical-f7/1955520016115830784

      While verifying the PR, I hit the error with the steps below, but I am not sure this is the exact way to reproduce it.

          1. Apply a MOSC with an invalid Containerfile
      oc create -f - << EOF
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineOSConfig
      metadata:
        name: worker
      spec:
        machineConfigPool:
          name: worker
        imageBuilder:
          imageBuilderType: Job
        baseImagePullSecret:
          name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
        renderedImagePushSecret:
          name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
        renderedImagePushSpec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
        containerFile:
        - content: |-
            FROM alpine:3.18
            RUN apt update && apt install -y cowsay
      EOF
      Error from server (AlreadyExists): error when creating "STDIN": machineosconfigs.machineconfiguration.openshift.io "worker" already exists
          2. The MOSB fails and the MCP degrades too, but with a different error, which is expected
          3. Correct the Containerfile in the MOSC above
          4. The MOSB now builds successfully
          5. The MCP becomes degraded with the error shown under Actual results
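
      The checks in steps 2, 4 and 5 could be scripted roughly as follows. This is only a sketch: the helper function names are hypothetical, and the pool name "worker" is taken from the MOSC above. It assumes a logged-in `oc` client.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of any MCO tooling.

# List MachineOSBuild objects so the failed and the successful build can be compared.
list_builds() {
  oc get machineosbuild
}

# Print one "<type>=<status>" line per condition of the given MachineConfigPool.
pool_conditions() {
  local pool="${1:-worker}"
  oc get machineconfigpool "$pool" \
    -o jsonpath='{range .status.conditions[*]}{.type}{"="}{.status}{"\n"}{end}'
}

# Exit 0 if the pool currently reports Degraded=True, non-zero otherwise.
pool_is_degraded() {
  pool_conditions "${1:-worker}" | grep -qx 'Degraded=True'
}
```

      For example, `pool_is_degraded worker && echo "worker pool is degraded"` after step 4 would show whether step 5 has been reached.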

      Actual results:

          The following error is seen in the MCP conditions:
      
        - lastTransitionTime: "2025-08-20T06:51:17Z"
          message: 'Node ip-10-0-9-181.us-east-2.compute.internal is reporting: "Node ip-10-0-9-181.us-east-2.compute.internal
            upgrade failure. Failed to update OS to image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image@sha256:8b12f9092364afc2b2116f9dcf7cb3f0cffe8753c13ccb73332ae2b88650fcd1
            after retries: timed out waiting for the condition", Node ip-10-0-9-181.us-east-2.compute.internal
            is reporting: "Failed to update OS to image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image@sha256:8b12f9092364afc2b2116f9dcf7cb3f0cffe8753c13ccb73332ae2b88650fcd1
            after retries: timed out waiting for the condition"'
          reason: 1 nodes are reporting degraded status on sync
          status: "True"
          type: NodeDegraded
        - lastTransitionTime: "2025-08-20T06:51:17Z"
          message: 'Node ip-10-0-9-181.us-east-2.compute.internal is reporting: "Node ip-10-0-9-181.us-east-2.compute.internal
            upgrade failure. Failed to update OS to image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image@sha256:8b12f9092364afc2b2116f9dcf7cb3f0cffe8753c13ccb73332ae2b88650fcd1
            after retries: timed out waiting for the condition", Node ip-10-0-9-181.us-east-2.compute.internal
            is reporting: "Failed to update OS to image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image@sha256:8b12f9092364afc2b2116f9dcf7cb3f0cffe8753c13ccb73332ae2b88650fcd1
            after retries: timed out waiting for the condition"'
          reason: ""
          status: "True"
          type: Degraded
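
      To compare the image the node failed to apply with the image the MOSB produced, the image reference can be pulled out of the condition message. A minimal sketch, assuming a logged-in `oc` client; both function names are hypothetical, and the text match relies only on the "Failed to update OS to <image>" wording visible in the message above.

```shell
#!/usr/bin/env bash
# Sketch only: function names are hypothetical.

# Print the NodeDegraded condition message of the given MachineConfigPool.
node_degraded_message() {
  local pool="${1:-worker}"
  oc get machineconfigpool "$pool" \
    -o jsonpath='{.status.conditions[?(@.type=="NodeDegraded")].message}'
}

# Read message text on stdin and print each distinct image reference that
# follows "Failed to update OS to". Line wraps are flattened first, so
# YAML-wrapped messages work as well as single-line ones.
failed_os_images() {
  tr '\n' ' ' \
    | grep -oE 'Failed to update OS to +[^ ]+' \
    | awk '{print $NF}' \
    | sort -u
}
```

      Usage: `node_degraded_message worker | failed_os_images` prints the digest-pinned image reference, which can then be compared with the image pushed by the successful MOSB. The machine-config-daemon logs for the affected node are the likely next place to look for why the update timed out.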

      Expected results:

          

      Additional info:

      must-gather: https://drive.google.com/drive/folders/1SwyfNWYHZ-PQECU2l5KE-tnCwU9NRhEt?usp=sharing

              umohnani Urvashi Mohnani
              rh-ee-ptalgulk Prachiti Talgulkar
              Sergio Regidor de la Rosa