OpenShift Bugs · OCPBUGS-57796

Cluster manages bootimages despite explicit bootimages in installconfig


    • Quality / Stability / Reliability
    • Moderate
    • Done
    • Bug Fix
      Previously, if a user specified a custom boot image for Amazon Web Services (AWS) or Google Cloud Platform (GCP), the Machine Config Operator (MCO) would overwrite it with the default managed image during installation. With this release, manifest generation was added for the MCO configuration that disables default boot image management during installation if a custom image is specified. (link:https://issues.redhat.com/browse/OCPBUGS-57796[OCPBUGS-57796])

      This is a clone of issue OCPBUGS-57348. The following is the description of the original issue:

      Description of problem

      OpenShift 4.19 introduced default boot-image management for AWS and GCP. The problem is that if users set a custom boot image, such as a marketplace image or an otherwise customized image, the MCO will overwrite the user-specified image with the managed one at some point during the install phase (see the Additional info section below for more about timing).

      Losing the image would be a problem for marketplace images, and custom images may carry required assets, such as CAs, whose loss could result in failures.

      Version-Release number of selected component (if applicable):

      4.19

      How reproducible

      The MCO's boot image updates to MachineSets are always reproducible.

      Whether initial compute Machines are impacted depends on a race between the MCO updating the MachineSet's boot images and the Machine API using the MachineSet to create any initial compute Machines, as described in the Additional info section below.

      Steps to reproduce

      1. Set a custom boot image in either the default or compute machine pool (control-plane boot image customization is safe until ControlPlaneMachineSet boot image management is delivered via MCO-1007).
        1. aws: platform.aws.amiID
        2. gcp: platform.gcp.osImage
      2. Perform the installation.
      3. Check the MachineSet for an updated boot image.
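      For reference, a custom AWS boot image in a compute machine pool might be declared as follows. This is a hypothetical install-config.yaml fragment, and the AMI ID is a placeholder, not a real image:

```yaml
# Hypothetical install-config.yaml fragment. Only platform.aws.amiID is
# relevant here; the AMI ID is a placeholder.
compute:
- name: worker
  replicas: 3
  platform:
    aws:
      amiID: ami-0123456789abcdef0
```

      For GCP, the analogous machine-pool field is platform.gcp.osImage, per the steps above.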

      On GCP clusters, checking MachineSet boot images looks like:

      $ oc -n openshift-machine-api get -o jsonpath='{range .items[*]}{range .spec.template.spec.providerSpec.value.disks[*]}{.image}{"\n"}{end}{end}' machinesets.machine.openshift.io | sort | uniq -c
      

      On AWS clusters, checking MachineSet boot images looks like:

      $ oc -n openshift-machine-api get -o jsonpath='{range .items[*]}{.spec.template.spec.providerSpec.value.ami}{"\n"}{end}' machinesets.machine.openshift.io | sort | uniq -c
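      When only saved artifacts are available rather than a live cluster, the same extraction can be sketched with jq against a dumped machinesets.json. The inlined sample data and AMI ID below are hypothetical; the field path matches the AWS providerSpec layout queried above.

```shell
# Offline sketch of the MachineSet boot-image check, using jq on a saved
# machinesets.json. The inlined sample and its AMI ID are hypothetical.
cat > /tmp/machinesets.json <<'EOF'
{"items":[
  {"spec":{"template":{"spec":{"providerSpec":{"value":{"ami":{"id":"ami-0123456789abcdef0"}}}}}}},
  {"spec":{"template":{"spec":{"providerSpec":{"value":{"ami":{"id":"ami-0123456789abcdef0"}}}}}}}
]}
EOF
# Count distinct boot images across all MachineSets; an unexpected value
# here indicates the MCO rewrote a boot image.
jq -r '.items[].spec.template.spec.providerSpec.value.ami.id' /tmp/machinesets.json | sort | uniq -c
```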
      

      Actual results

      custom boot image is overwritten in machinesets

      Expected results

      custom boot image is maintained in machineset

      Additional info

      rhn-support-sdodson pointed out that a minimum boot image will be enforced in the cluster (RFE-6216), so if we decide that users are on the hook for managing boot images when they specify custom ones, we will need to document that they must keep those images updated. But this issue will need to be sorted out regardless of installer behavior.

      As an example of the MCO-boot-image-update vs. Machine API Machine-creation race, https://amd64.ocp.releases.ci.openshift.org/ > 4.19.0-0.nightly-2025-06-06-163527 > rosa-classic-sts-conformance > Artifacts > ... > gather-extra artifacts and must-gather artifacts:

      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn/1931028069485645824/artifacts/e2e-rosa-sts-ovn/gather-extra/artifacts/configmaps.json | jq -r '.items[] | select(.metadata | .namespace == "kube-system" and .name == "cluster-config-v1").data["install-config"]' | yaml2json | jq -r '.compute[].platform.aws.amiID'
      ami-0e97cca5690da89da
      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn/1931028069485645824/artifacts/e2e-rosa-sts-ovn/gather-extra/artifacts/machinesets.json | jq -r '.items[] | select(.metadata.generation > 1) | .metadata.name + " " + (.metadata.generation | tostring) + " " + .spec.template.spec.providerSpec.value.ami.id'
      ci-rosa-s-vpvk-pwmbq-worker-us-west-2a 2 ami-0b29d41f2ed6b8c94
      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn/1931028069485645824/artifacts/e2e-rosa-sts-ovn/gather-must-gather/artifacts/must-gather.tar | tar -xOz quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-a1ad408471607f7b402d6e1be8b4606a4605f408ae9ecb763e7d8adc776e46d1/namespaces/openshift-machine-api/machine.openshift.io/machinesets/ci-rosa-s-vpvk-pwmbq-worker-us-west-2a.yaml | yaml2json | jq -r '[.metadata.managedFields[] | select(.subresource == null) | .time + " " + .operation + " " + .manager + " " + (.fieldsV1 | tostring[:100])] | sort[]'
      2025-06-06T17:36:28Z Update cluster-bootstrap {"f:metadata":{"f:labels":{".":{},"f:hive.openshift.io/machine-pool":{},"f:hive.openshift.io/managed
      2025-06-06T17:41:46Z Update machine-config-controller {"f:spec":{"f:template":{"f:spec":{"f:providerSpec":{"f:value":{"f:ami":{"f:id":{}}}}}}}}
      2025-06-06T17:41:59Z Update machine-controller-manager {"f:metadata":{"f:annotations":{".":{},"f:capacity.cluster-autoscaler.kubernetes.io/labels":{},"f:ma
      $ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn/1931028069485645824/artifacts/e2e-rosa-sts-ovn/gather-extra/artifacts/machines.json | jq -r '.items[] | (.metadata | .creationTimestamp + " " + .name) + " " + .spec.providerSpec.value.ami.id' | sort | grep worker
      2025-06-06T17:41:50Z ci-rosa-s-vpvk-pwmbq-worker-us-west-2a-95rxf ami-0b29d41f2ed6b8c94
      2025-06-06T17:41:50Z ci-rosa-s-vpvk-pwmbq-worker-us-west-2a-gb78t ami-0b29d41f2ed6b8c94
      

      So:

      • The install-config requested ami-0e97cca5690da89da.
      • At 17:36:28, cluster-bootstrap pushes the installer-created MachineSet into the cluster.
      • At 17:41:46, the MCO updates the MachineSet's boot image to the stock OCP AMI for that region: ami-0b29d41f2ed6b8c94.
      • At 17:41:50, the Machine API creates the first compute Machines using the stock ami-0b29d41f2ed6b8c94.

      It seems like this race could easily go the other way, and the initial compute could have come up under the install-config-preferred boot image. But the check for "have we fixed the bug?" shouldn't hinge on the boot image used for the compute Machines, it should look at the MachineSet to see "did the MCO clobber the MachineSet boot image at all?", as described in the Expected results section.
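      That MachineSet-level check can be sketched as a tiny script: compare the AMI requested in the install-config with the AMI currently in the MachineSet, and flag any clobbering. The two values are inlined here from the example above; on a live cluster they would come from the cluster-config-v1 ConfigMap and the MachineSet, as in the earlier commands.

```shell
# Sketch of the "did the MCO clobber the MachineSet boot image?" check.
# Values are inlined from the example above; on a real cluster, extract
# them with the oc/jq commands shown earlier.
requested='ami-0e97cca5690da89da'   # install-config compute[].platform.aws.amiID
actual='ami-0b29d41f2ed6b8c94'      # MachineSet providerSpec.value.ami.id
if [ "$requested" = "$actual" ]; then
  echo "preserved: $actual"
else
  echo "clobbered: requested $requested but MachineSet has $actual"
fi
```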

              Patrick Dillon (padillon)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Gaoyun Pei