OpenShift Bugs / OCPBUGS-76551

Component Readiness: [Machine Config Operator] [OCPFeatureGate:MachineConfigNodes] test regressed

    • MCO Sprint 284
    • In Progress
    • Release Note Not Required

      Best guess: a test (possibly this one) applies a MachineConfig that leaves a node not ready. There are messages about infra configs not being rendered.

      (Feel free to update this bug's summary to be more specific.)
      Component Readiness has found a potential regression in the following test:

      [sig-mco][OCPFeatureGate:MachineConfigNodes] [Serial]Should have MCN properties matching associated node properties for nodes in custom MCPs [apigroup:machineconfiguration.openshift.io] [Suite:openshift/conformance/serial]

      Test has a 55.56% pass rate, but 95.00% is required.

      Sample (being evaluated) Release: 4.22
        Start Time: 2026-02-04T00:00:00Z
        End Time: 2026-02-11T12:00:00Z
        Success Rate: 55.56%
        Successes: 5
        Failures: 4
        Flakes: 0

      Base (historical) Release: 4.21
        Start Time: 2026-01-04T00:00:00Z
        End Time: 2026-02-03T23:59:59Z
        Success Rate: 0.00%
        Successes: 0
        Failures: 0
        Flakes: 0
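
The figures above can be checked with simple arithmetic. The following is a minimal sketch, not Component Readiness's actual implementation: the function name is hypothetical, and counting flakes as passes is an assumption (there are no flakes in this report, so it does not change the result).

```python
def pass_rate(successes: int, failures: int, flakes: int = 0) -> float:
    """Return the pass rate as a percentage.

    Assumption: flakes count as passes. The report has zero flakes,
    so this choice does not affect the numbers here.
    """
    total = successes + failures + flakes
    if total == 0:
        return 0.0  # no runs at all, as in the 4.21 base window
    return 100.0 * (successes + flakes) / total


REQUIRED = 95.0  # required pass rate quoted in the report

sample = pass_rate(successes=5, failures=4)
base = pass_rate(successes=0, failures=0)

print(f"sample: {sample:.2f}%  below required: {sample < REQUIRED}")
print(f"base:   {base:.2f}%  (no runs in the base window)")
```

Because the 4.21 base window has no runs at all, the 0.00% base rate reflects missing data rather than a failing test, which is why the comparison here is against the fixed 95.00% requirement.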

      View the test details report for additional context.

      Filed by: cmeadors@redhat.com

      AI Analysis

      On RHCOS 10 clusters, when the MachineConfigOperator renders a config for a newly created custom MachineConfigPool (e.g. an "infra" MCP), the rendered config's osImageURL points to the standard el9-based RHCOS image instead of the RHCOS 10 image. This causes the MCD to perform an rpm-ostree rebase that downgrades the node from RHCOS 10 (el10, kernel 6.12) to RHCOS 9 (el9, kernel 5.14). After the downgrade, the node is left in a Degraded state.

      This is RHCOS 10-specific – the same test passes on standard RHCOS 9 clusters.

      Version-Release number of selected component

      4.22 (payload 4.22.0-0.ci-2026-02-11-205844)

      How reproducible

      Always (on RHCOS 10 clusters when a custom MCP is created)

      Steps to Reproduce

      1. Deploy an RHCOS 10 cluster (e.g. using the rhcos10 variant jobs)
      2. Create a custom MachineConfigPool (e.g. "infra")
      3. Label a worker node to move it into the custom MCP
      4. Wait for the MCO to render a config and begin applying it
      5. Observe that the MCD initiates an rpm-ostree rebase with osUpdate:true
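
For steps 2 and 3, the manifest used by the test is not included in this report. As a hedged sketch, the snippet below builds a custom MachineConfigPool in the commonly documented shape (inheriting worker MachineConfigs and selecting nodes by role label); the helper name is hypothetical.

```python
import json


def custom_pool_manifest(name: str) -> dict:
    """Build a MachineConfigPool that inherits worker MachineConfigs."""
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfigPool",
        "metadata": {"name": name},
        "spec": {
            # Select MachineConfigs whose role is worker or the pool name.
            "machineConfigSelector": {
                "matchExpressions": [
                    {
                        "key": "machineconfiguration.openshift.io/role",
                        "operator": "In",
                        "values": ["worker", name],
                    }
                ]
            },
            # Nodes join the pool once they carry this role label.
            "nodeSelector": {
                "matchLabels": {f"node-role.kubernetes.io/{name}": ""}
            },
        },
    }


manifest = custom_pool_manifest("infra")
print(json.dumps(manifest, indent=2))
```

The printed JSON can be piped to `oc apply -f -`, and a worker can then be moved into the pool with `oc label node <node-name> node-role.kubernetes.io/infra=`.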

      Actual results

      The MCD rebases the node to the el9-based RHCOS 9 image instead of the RHCOS 10 image. After reboot, the node is running RHCOS 9.6 (kernel 5.14.0-570.88.1.el9_6.x86_64) instead of RHCOS 10.1 (kernel 6.12.0-124.28.1.el10_1.x86_64). The node is then left in a Degraded state.

      Final node state observed:

      Node                     OS          Kernel                  State
      ip-10-0-28-48 (master)   RHCOS 10.1  6.12.0-124.28.1.el10_1  Done
      ip-10-0-4-154 (master)   RHCOS 10.1  6.12.0-124.28.1.el10_1  Done
      ip-10-0-81-27 (master)   RHCOS 10.1  6.12.0-124.28.1.el10_1  Done
      ip-10-0-57-195 (worker)  RHCOS 10.1  6.12.0-124.28.1.el10_1  Done
      ip-10-0-58-84 (worker)   RHCOS 10.1  6.12.0-124.28.1.el10_1  Done
      ip-10-0-91-126 (worker)  RHCOS 9.6   5.14.0-570.88.1.el9_6   Degraded
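
The downgraded node can be picked out mechanically from the kernel release string. This is an illustrative sketch only: the rows are copied from the node state listing above, and the helper names are hypothetical.

```python
import re

# Rows copied from the node state listing: (node, role, os, kernel, state).
NODES = [
    ("ip-10-0-28-48", "master", "RHCOS 10.1", "6.12.0-124.28.1.el10_1", "Done"),
    ("ip-10-0-4-154", "master", "RHCOS 10.1", "6.12.0-124.28.1.el10_1", "Done"),
    ("ip-10-0-81-27", "master", "RHCOS 10.1", "6.12.0-124.28.1.el10_1", "Done"),
    ("ip-10-0-57-195", "worker", "RHCOS 10.1", "6.12.0-124.28.1.el10_1", "Done"),
    ("ip-10-0-58-84", "worker", "RHCOS 10.1", "6.12.0-124.28.1.el10_1", "Done"),
    ("ip-10-0-91-126", "worker", "RHCOS 9.6", "5.14.0-570.88.1.el9_6", "Degraded"),
]


def el_major(kernel: str) -> int:
    """Extract the EL major version from a kernel release like '...el10_1'."""
    match = re.search(r"\.el(\d+)", kernel)
    if match is None:
        raise ValueError(f"no el tag in kernel release: {kernel}")
    return int(match.group(1))


def downgraded(nodes, expected_major: int = 10) -> list:
    """Return names of nodes whose kernel is older than the expected EL major."""
    return [node[0] for node in nodes if el_major(node[3]) < expected_major]


print(downgraded(NODES))
```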

      Expected results

      The rendered config for the custom MCP should use the RHCOS 10 osImageURL, matching the OS already running on the node. The node should remain on RHCOS 10 after the MCP config is applied. No OS version change should occur.

      Analysis

      This was observed in Prow job periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-rhcos10-techpreview-serial-3of3/2021812972535418880.

      The test [sig-mco][OCPFeatureGate:MachineConfigNodes] [Serial]Should have MCN properties matching associated node properties for nodes in custom MCPs triggered the issue.

      Detailed timeline from interval/event data:

      • 06:57:55 - Test creates an infra MachineConfigPool and labels worker node ip-10-0-91-126 with node-role.kubernetes.io/infra
      • 06:58:06 - MCO renders rendered-infra-a6e36fcbf813ac0f103415668f37f8ad and targets it to the node. The rendered config has osUpdate:true, indicating the OS image in the rendered config differs from what is currently running.
      • 06:58:08 - MCD begins the update: Starting update from rendered-worker-bbd8c99fff912953e6e8ccf86d9f67b5 to rendered-infra-a6e36fcbf813ac0f103415668f37f8ad: &{osUpdate:true ...}
      • 07:01:08 - MCD initiates rpm-ostree rebase to registry.ci.openshift.org/ocp/4.22-2026-02-11-205844@sha256:39f40e0f63cf... – this is the wrong (el9-based) image
      • 07:02:15 - MCD reboots the node
      • 07:02:55 - Node comes back running RHCOS 9.6 (el9, kernel 5.14) instead of RHCOS 10.1 (el10, kernel 6.12)
      • 07:03:08 - Test times out after 5 minutes waiting for infra MCP to reach Updated state (0/1 ready machines)
      • Test cleanup removes the infra label and deletes the infra MCP, leaving the node with a dangling desiredConfig reference: machineconfig.machineconfiguration.openshift.io "rendered-infra-a6e36fcbf813ac0f103415668f37f8ad" not found
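
To see where the time went, the gaps between the timestamped events above can be computed directly. This is a small sketch using only the times listed in the timeline; the event labels are paraphrased. One observation: 06:58:08 to 07:03:08 is exactly 300 seconds, which is consistent with the 5-minute timeout, although the report does not state when the wait began.

```python
from datetime import datetime

# Timestamps copied from the timeline; labels are paraphrased.
EVENTS = [
    ("06:57:55", "infra MCP created, node labeled"),
    ("06:58:06", "rendered-infra config targeted"),
    ("06:58:08", "MCD starts update"),
    ("07:01:08", "rpm-ostree rebase initiated"),
    ("07:02:15", "node reboot"),
    ("07:02:55", "node back on RHCOS 9.6"),
    ("07:03:08", "test times out"),
]


def seconds_between(start: str, end: str) -> int:
    """Elapsed whole seconds between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


deltas = [seconds_between(a[0], b[0]) for a, b in zip(EVENTS, EVENTS[1:])]
for delta, (stamp, label) in zip(deltas, EVENTS[1:]):
    print(f"+{delta:>4}s  {stamp}  {label}")

total = seconds_between(EVENTS[0][0], EVENTS[-1][0])
print(f"total: {total}s")
```

The largest single gap is the 180 seconds between the MCD starting the update and initiating the rebase; the node returned only 13 seconds before the test gave up.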

      Root cause

      When the MCO renders a MachineConfig for a new custom MCP on an RHCOS 10 cluster, the osImageURL in the rendered config points to the standard el9-based RHCOS image rather than the RHCOS 10 image. This triggers osUpdate:true and an rpm-ostree rebase that downgrades the node from RHCOS 10 to RHCOS 9.

      The MCO's rendering logic does not appear to account for RHCOS 10 when creating configs for new/custom MCPs, even though the worker MCP's rendered config correctly uses the RHCOS 10 image.
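
The mechanism described here can be modeled in a few lines. This is a simplified sketch, not the MCO's or MCD's actual code: the class, the function, and both image references are hypothetical placeholders (the real digest is truncated in this report and is not reproduced). It assumes only what the analysis states, namely that a differing osImageURL is what yields osUpdate:true.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RenderedConfig:
    name: str
    os_image_url: str


def os_update_needed(current: RenderedConfig, desired: RenderedConfig) -> bool:
    """Simplified model: an OS update is needed when the image references differ."""
    return current.os_image_url != desired.os_image_url


# Placeholder image references, NOT the real payload images.
RHCOS10_IMAGE = "registry.example/rhcos-10@sha256:placeholder-el10"
RHCOS9_IMAGE = "registry.example/rhcos-9@sha256:placeholder-el9"

worker = RenderedConfig("rendered-worker", RHCOS10_IMAGE)
infra_observed = RenderedConfig("rendered-infra", RHCOS9_IMAGE)
infra_expected = RenderedConfig("rendered-infra", RHCOS10_IMAGE)

observed = os_update_needed(worker, infra_observed)  # reported behaviour
expected = os_update_needed(worker, infra_expected)  # behaviour after a fix

print(f"observed osUpdate: {observed}")
print(f"expected osUpdate: {expected}")
```

Under this model, a fix would make the infra pool's rendered config carry the same image reference as the worker pool's, so moving a node between the two pools would not trigger a rebase.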

      Additional info

      • Only affects RHCOS 10 clusters (standard RHCOS 9 clusters are not impacted)
      • The worker MCP rendered config correctly uses the RHCOS 10 image – the bug is specific to newly created custom MCPs
      • After the downgrade + test cleanup, the node is left Degraded with a missing rendered config reference, requiring manual intervention

              rh-ee-ijanssen Isabella Janssen
              openshift-trt OpenShift Technical Release Team
              Sergio Regidor de la Rosa