Bug
Resolution: Unresolved
Normal
4.22
MCO Sprint 284
In Progress
Release Note Not Required
Best guess: a test (possibly this one) applies a MachineConfig that leaves a node not ready. There are messages about infra configs not being rendered.
(Feel free to update this bug's summary to be more specific.)
Component Readiness has found a potential regression in the following test:
[sig-mco][OCPFeatureGate:MachineConfigNodes] [Serial]Should have MCN properties matching associated node properties for nodes in custom MCPs [apigroup:machineconfiguration.openshift.io] [Suite:openshift/conformance/serial]
Test has a 55.56% pass rate, but 95.00% is required.
Sample (being evaluated) Release: 4.22
Start Time: 2026-02-04T00:00:00Z
End Time: 2026-02-11T12:00:00Z
Success Rate: 55.56%
Successes: 5
Failures: 4
Flakes: 0
Base (historical) Release: 4.21
Start Time: 2026-01-04T00:00:00Z
End Time: 2026-02-03T23:59:59Z
Success Rate: 0.00%
Successes: 0
Failures: 0
Flakes: 0
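The percentages above follow directly from the run counts. The sketch below reproduces the arithmetic; it assumes the pass rate is successes divided by successes plus failures (flakes are zero in both windows, so how they are counted is not exercised here), and the function name is illustrative rather than anything from Component Readiness itself.

```python
from typing import Optional

REQUIRED_PASS_RATE = 95.0  # percent, as stated in the report


def pass_rate(successes: int, failures: int) -> Optional[float]:
    """Return the pass rate in percent, or None when there were no runs."""
    total = successes + failures
    if total == 0:
        return None  # no data; a displayed 0.00% would be misleading
    return 100.0 * successes / total


sample = pass_rate(successes=5, failures=4)  # 4.22 sample window
base = pass_rate(successes=0, failures=0)    # 4.21 base window
```

Note that the 4.21 base window contains no runs at all, so its 0.00% reads as "no data" rather than a measured failure rate.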
View the test details report for additional context.
Filed by: cmeadors@redhat.com
AI Analysis
On RHCOS 10 clusters, when the Machine Config Operator (MCO) renders a config for a newly created custom MachineConfigPool (MCP), e.g. an "infra" MCP, the rendered config's osImageURL points to the standard el9-based RHCOS image instead of the RHCOS 10 image. This causes the Machine Config Daemon (MCD) to perform an rpm-ostree rebase that downgrades the node from RHCOS 10 (el10, kernel 6.12) to RHCOS 9 (el9, kernel 5.14). After the downgrade, the node is left in a Degraded state.
This is RHCOS 10-specific – the same test passes on standard RHCOS 9 clusters.
Version-Release number of selected component
4.22 (payload 4.22.0-0.ci-2026-02-11-205844)
How reproducible
Always (on RHCOS 10 clusters when a custom MCP is created)
Steps to Reproduce
- Deploy an RHCOS 10 cluster (e.g. using the rhcos10 variant jobs)
- Create a custom MachineConfigPool (e.g. "infra")
- Label a worker node to move it into the custom MCP
- Wait for the MCO to render a config and begin applying it
- Observe the MCD initiates an rpm-ostree rebase with osUpdate:true
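For the custom MCP step, a manifest along the following lines is commonly used. This is a sketch, not the manifest the test itself applies: the selector layout follows the usual pattern for custom pools that inherit worker configs, and the helper name `custom_mcp_manifest` is hypothetical. The output is JSON, which the Kubernetes API accepts in place of YAML.

```python
import json


def custom_mcp_manifest(name: str) -> dict:
    """Build a MachineConfigPool manifest for a custom pool (illustrative)."""
    return {
        "apiVersion": "machineconfiguration.openshift.io/v1",
        "kind": "MachineConfigPool",
        "metadata": {"name": name},
        "spec": {
            # Select MachineConfigs for both the worker role and the custom role.
            "machineConfigSelector": {
                "matchExpressions": [
                    {
                        "key": "machineconfiguration.openshift.io/role",
                        "operator": "In",
                        "values": ["worker", name],
                    }
                ]
            },
            # Nodes join the pool once they carry this role label.
            "nodeSelector": {
                "matchLabels": {f"node-role.kubernetes.io/{name}": ""}
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(custom_mcp_manifest("infra"), indent=2))
```

Saving the output to a file and applying it, then labeling a worker with `node-role.kubernetes.io/infra`, corresponds to the second and third steps above.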
Actual results
The MCD rebases the node to the el9-based RHCOS 9 image instead of the RHCOS 10 image. After reboot, the node is running RHCOS 9.6 (kernel 5.14.0-570.88.1.el9_6.x86_64) instead of RHCOS 10.1 (kernel 6.12.0-124.28.1.el10_1.x86_64). The node is then left in a Degraded state.
Final node state observed:
| Node | OS | Kernel | State |
|---|---|---|---|
| ip-10-0-28-48 (master) | RHCOS 10.1 | 6.12.0-124.28.1.el10_1 | Done |
| ip-10-0-4-154 (master) | RHCOS 10.1 | 6.12.0-124.28.1.el10_1 | Done |
| ip-10-0-81-27 (master) | RHCOS 10.1 | 6.12.0-124.28.1.el10_1 | Done |
| ip-10-0-57-195 (worker) | RHCOS 10.1 | 6.12.0-124.28.1.el10_1 | Done |
| ip-10-0-58-84 (worker) | RHCOS 10.1 | 6.12.0-124.28.1.el10_1 | Done |
| ip-10-0-91-126 (worker) | RHCOS 9.6 | 5.14.0-570.88.1.el9_6 | Degraded |
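The downgraded node can be picked out mechanically from the kernel strings in the table. The sketch below assumes the RHEL major version can be read from the `.elN` suffix of the kernel release, and flags any node that differs from the majority; the function names are illustrative.

```python
import re
from collections import Counter
from typing import Dict, List, Optional

# Kernel releases copied from the table above.
NODES: Dict[str, str] = {
    "ip-10-0-28-48": "6.12.0-124.28.1.el10_1",
    "ip-10-0-4-154": "6.12.0-124.28.1.el10_1",
    "ip-10-0-81-27": "6.12.0-124.28.1.el10_1",
    "ip-10-0-57-195": "6.12.0-124.28.1.el10_1",
    "ip-10-0-58-84": "6.12.0-124.28.1.el10_1",
    "ip-10-0-91-126": "5.14.0-570.88.1.el9_6",
}


def rhel_major(kernel: str) -> Optional[int]:
    """Extract the RHEL major version from a kernel release like '...el10_1'."""
    match = re.search(r"\.el(\d+)", kernel)
    return int(match.group(1)) if match else None


def outliers(nodes: Dict[str, str]) -> List[str]:
    """Return nodes whose RHEL major version differs from the most common one."""
    majors = {name: rhel_major(kernel) for name, kernel in nodes.items()}
    expected, _ = Counter(majors.values()).most_common(1)[0]
    return [name for name, major in majors.items() if major != expected]
```

Applied to the table, `outliers(NODES)` returns only `ip-10-0-91-126`, the worker that was moved into the infra MCP.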
Expected results
The rendered config for the custom MCP should use the RHCOS 10 osImageURL, matching the OS already running on the node. The node should remain on RHCOS 10 after the MCP config is applied. No OS version change should occur.
Analysis
This was observed in Prow job periodic-ci-openshift-release-main-ci-4.22-e2e-aws-ovn-rhcos10-techpreview-serial-3of3/2021812972535418880.
The test [sig-mco][OCPFeatureGate:MachineConfigNodes] [Serial]Should have MCN properties matching associated node properties for nodes in custom MCPs triggered the issue.
Detailed timeline from interval/event data:
- 06:57:55 - Test creates an infra MachineConfigPool and labels worker node ip-10-0-91-126 with node-role.kubernetes.io/infra
- 06:58:06 - MCO renders rendered-infra-a6e36fcbf813ac0f103415668f37f8ad and targets it to the node. The rendered config has osUpdate:true, indicating the OS image in the rendered config differs from what is currently running.
- 06:58:08 - MCD begins the update: Starting update from rendered-worker-bbd8c99fff912953e6e8ccf86d9f67b5 to rendered-infra-a6e36fcbf813ac0f103415668f37f8ad: &{osUpdate:true ...}
- 07:01:08 - MCD initiates rpm-ostree rebase to registry.ci.openshift.org/ocp/4.22-2026-02-11-205844@sha256:39f40e0f63cf... – this is the wrong (el9-based) image
- 07:02:15 - MCD reboots the node
- 07:02:55 - Node comes back running RHCOS 9.6 (el9, kernel 5.14) instead of RHCOS 10.1 (el10, kernel 6.12)
- 07:03:08 - Test times out after 5 minutes waiting for infra MCP to reach Updated state (0/1 ready machines)
- Test cleanup removes the infra label and deletes the infra MCP, leaving the node with a dangling desiredConfig reference: machineconfig.machineconfiguration.openshift.io "rendered-infra-a6e36fcbf813ac0f103415668f37f8ad" not found
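To put the timeline in terms of elapsed time, the timestamps can be subtracted directly. This is a small sketch that assumes all events fall on the same day; the event keys are shorthand labels chosen here, not names from the job artifacts.

```python
from datetime import datetime

# Timestamps copied from the timeline above.
EVENTS = {
    "mcp_created": "06:57:55",
    "config_rendered": "06:58:06",
    "update_started": "06:58:08",
    "rebase_started": "07:01:08",
    "reboot": "07:02:15",
    "node_back": "07:02:55",
    "test_timeout": "07:03:08",
}


def seconds_between(start: str, end: str) -> int:
    """Elapsed seconds between two HH:MM:SS timestamps on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


update_to_rebase = seconds_between(EVENTS["update_started"], EVENTS["rebase_started"])
rebase_to_reboot = seconds_between(EVENTS["rebase_started"], EVENTS["reboot"])
render_to_timeout = seconds_between(EVENTS["config_rendered"], EVENTS["test_timeout"])
```

By these figures the node came back up 13 seconds before the test timed out, roughly five minutes after the config was rendered.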
Root cause
When the MCO renders a MachineConfig for a new custom MCP on an RHCOS 10 cluster, the osImageURL in the rendered config points to the standard el9-based RHCOS image rather than the RHCOS 10 image. This triggers osUpdate:true and an rpm-ostree rebase that downgrades the node from RHCOS 10 to RHCOS 9.
The MCO's rendering logic does not appear to account for RHCOS 10 when creating configs for new/custom MCPs, even though the worker MCP's rendered config correctly uses the RHCOS 10 image.
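The condition described above amounts to the two rendered configs disagreeing on `spec.osImageURL`. The sketch below illustrates that comparison only; it is not the MCD's actual decision logic, and the image references are placeholders rather than values from the job (the real digests are not reproduced here).

```python
from typing import Any, Dict

# Hypothetical rendered configs; only spec.osImageURL is modeled.
rendered_worker: Dict[str, Any] = {
    "metadata": {"name": "rendered-worker-bbd8c99fff912953e6e8ccf86d9f67b5"},
    "spec": {"osImageURL": "registry.example/rhcos-10@sha256:PLACEHOLDER_EL10"},
}
rendered_infra: Dict[str, Any] = {
    "metadata": {"name": "rendered-infra-a6e36fcbf813ac0f103415668f37f8ad"},
    "spec": {"osImageURL": "registry.example/rhcos-9@sha256:PLACEHOLDER_EL9"},
}


def os_image_differs(current: Dict[str, Any], desired: Dict[str, Any]) -> bool:
    """True when moving from `current` to `desired` would change the OS image."""
    return current["spec"]["osImageURL"] != desired["spec"]["osImageURL"]


would_rebase = os_image_differs(rendered_worker, rendered_infra)
```

On a healthy cluster the infra pool would be expected to carry the same osImageURL as the worker pool, so this check would be false and no rebase would occur.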
Additional info
- Only affects RHCOS 10 clusters (standard RHCOS 9 clusters are not impacted)
- The worker MCP rendered config correctly uses the RHCOS 10 image – the bug is specific to newly created custom MCPs
- After the downgrade and test cleanup, the node is left Degraded with a missing rendered config reference, requiring manual intervention
- causes: OCPBUGS-77002 Add MCO tests creating custom MCPs back to RHEL10 test suite (New)
- is related to: OCPBUGS-76948 Component Readiness: [Node / CRI-O] [OCPFeatureGate:SigstoreImageVerificationPKI] test regressed (Closed)