-
Bug
-
Resolution: Unresolved
-
Major
-
4.20.0
-
Quality / Stability / Reliability
-
False
-
-
3
-
Important
-
None
-
None
-
Proposed
-
MCO Sprint 273, MCO Sprint 274
-
2
-
In Progress
-
Release Note Not Required
-
None
-
None
-
None
-
None
-
None
Description of problem:
When we create a MC configuring kernel arguments or new extensions, the osbuilder pod creates a new MOSB when it is drained. The previous MOSB is then removed, its image is deleted, and the MCP becomes degraded because it can no longer find the original osImage manifest.
Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-0.nightly-2025-06-29-224400   True        False         5h14m   Error while reconciling 4.20.0-0.nightly-2025-06-29-224400: an unknown error has occurred: MultipleErrors
How reproducible:
Always
Steps to Reproduce:
There are two ways of reproducing it.

First way:
1. Enable Image Mode in the worker pool
2. Wait until the MOSB is created and the builder pod is running
3. Create a MC configuring kernel arguments:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: test-kernel-arguments-32
spec:
  config:
    ignition:
      version: 3.2.0
  kernelArguments:
  - test

Second way:
1. Enable Image Mode in the worker pool
2. Wait until the osImage is built and applied to all workers
3. Create a MC deploying a new extension:

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  labels:
    machineconfiguration.openshift.io/role: worker
  name: tc-56131-all-extensions
spec:
  config:
    ignition:
      version: 3.1.0
  extensions:
  - usbguard
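For context, step 1 in both procedures ("Enable Image Mode in the worker pool") is typically done by creating a MachineOSConfig that targets the pool. The manifest below is only an illustrative sketch: the push spec, the secret name, and the exact spec fields are assumptions and may differ between OpenShift releases.

```yaml
# Illustrative MachineOSConfig enabling image mode (on-cluster layering)
# for the worker pool. The registry push spec and the secret name are
# placeholders; the exact schema may vary by release.
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineOSConfig
metadata:
  name: worker
spec:
  machineConfigPool:
    name: worker                                   # pool to opt into image mode
  renderedImagePushSecret:
    name: push-secret                              # placeholder: registry push credentials
  renderedImagePushSpec: registry.example.com/mco/os-image:latest  # placeholder
```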
Actual results:
The osBuilder pod generates a MOSB. Once this MOSB finishes building the osImage and the osBuilder pod is drained, it generates a second MOSB and removes the first one. Because the first MOSB is removed, its image is garbage collected and deleted from the registry, so the MCP fails to apply the configuration to the nodes and reports a degraded status.
Expected results:
No degradation should happen in those scenarios.
Additional info:
We can see this happening in the regression tests too:
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.20-amd64-nightly-vsphere-ipi-longduration-mco-fips-proxy-g2-f14/1939185613580275712
https://qe-private-deck-ci.apps.ci.l2s4.p1.openshiftapps.com/view/gs/qe-private-deck/logs/periodic-ci-openshift-openshift-tests-private-release-4.20-amd64-nightly-baremetalds-ipi-ovn-f14-longrun-mco-p3/1938010861956239360