Bug
Resolution: Done-Errata
Critical
None
4.19.0
This is a clone of issue OCPBUGS-57348. The following is the description of the original issue:
—
Description of problem
OpenShift 4.19 introduced default boot-image management for AWS and GCP. If users set a custom boot image, such as a marketplace image or another custom image, the MCO overwrites the user-specified image with the managed one at some point during the install phase (see the Additional info section below for details on timing).
Losing the user-specified image is a problem for marketplace images, and custom images may carry required assets, such as CAs, whose loss could cause failures.
Version-Release number of selected component (if applicable):
4.19
How reproducible
The MCO updates to MachineSets reproduce every time.
Whether initial compute Machines are impacted depends on a race between the MCO updating the MachineSet's boot images and the Machine API using the MachineSet to create any initial compute Machines, as described in the Additional info section below.
Steps to reproduce
- Set a custom boot image in either the default or the compute machine pool (control-plane boot image customization is safe until ControlPlaneMachineSet boot image management is delivered via MCO-1007).
  - AWS: platform.aws.amiID
  - GCP: platform.gcp.osImage
- Perform the installation.
- Check the MachineSets for an updated boot image.
On GCP clusters, checking MachineSet boot images looks like:
$ oc -n openshift-machine-api get -o jsonpath='{range .items[*]}{range .spec.template.spec.providerSpec.value.disks[*]}{.image}{"\n"}{end}{end}' machinesets.machine.openshift.io | sort | uniq -c
On AWS clusters, checking MachineSet boot images looks like:
$ oc -n openshift-machine-api get -o jsonpath='{range .items[*]}{.spec.template.spec.providerSpec.value.ami.id}{"\n"}{end}' machinesets.machine.openshift.io | sort | uniq -c
Actual results
The custom boot image is overwritten in the MachineSets.
Expected results
The custom boot image is preserved in the MachineSets.
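On AWS, the expected-results check can be scripted. The sketch below assumes jq is installed and that you pass in the AMI you set in platform.aws.amiID yourself; check_machineset_amis is a hypothetical helper name, not part of oc or the MCO. It reads the MachineSet list JSON on stdin and reports every MachineSet whose providerSpec.value.ami.id differs from the expected AMI. GCP clusters would need the disks[].image path instead.

```shell
# check_machineset_amis: hypothetical helper (not an oc or MCO command).
# Reads a MachineSet list as JSON on stdin and prints "<name> <ami>" for every
# MachineSet whose AWS AMI differs from the expected AMI passed as $1.
# Exit status: 0 if all match, 1 if any differ, 2 if jq could not parse the input.
check_machineset_amis() {
  local expected="$1"
  local mismatches
  mismatches=$(jq -r --arg want "$expected" '
    .items[]
    | {name: .metadata.name,
       ami: (.spec.template.spec.providerSpec.value.ami.id // "<none>")}
    | select(.ami != $want)
    | .name + " " + .ami') || return 2
  if [ -n "$mismatches" ]; then
    printf '%s\n' "$mismatches"
    return 1
  fi
  return 0
}
```

For example, `oc -n openshift-machine-api get machinesets.machine.openshift.io -o json | check_machineset_amis ami-0e97cca5690da89da` exits non-zero and lists the affected MachineSets if the MCO replaced the requested AMI.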
Additional info
rhn-support-sdodson pointed out that a minimum boot image will be enforced in the cluster (RFE-6216), so if we decide that users are responsible for managing boot images when they specify custom images, we will need to document that they must keep those images updated. This issue needs to be resolved regardless of installer behavior.
As an example of the MCO-boot-image-update vs. Machine API Machine-creation race, https://amd64.ocp.releases.ci.openshift.org/ > 4.19.0-0.nightly-2025-06-06-163527 > rosa-classic-sts-conformance > Artifacts > ... > gather-extra artifacts and must-gather artifacts:
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn/1931028069485645824/artifacts/e2e-rosa-sts-ovn/gather-extra/artifacts/configmaps.json | jq -r '.items[] | select(.metadata | .namespace == "kube-system" and .name == "cluster-config-v1").data["install-config"]' | yaml2json | jq -r '.compute[].platform.aws.amiID'
ami-0e97cca5690da89da
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn/1931028069485645824/artifacts/e2e-rosa-sts-ovn/gather-extra/artifacts/machinesets.json | jq -r '.items[] | select(.metadata.generation > 1) | .metadata.name + " " + (.metadata.generation | tostring) + " " + .spec.template.spec.providerSpec.value.ami.id'
ci-rosa-s-vpvk-pwmbq-worker-us-west-2a 2 ami-0b29d41f2ed6b8c94
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn/1931028069485645824/artifacts/e2e-rosa-sts-ovn/gather-must-gather/artifacts/must-gather.tar | tar -xOz quay-io-openshift-release-dev-ocp-v4-0-art-dev-sha256-a1ad408471607f7b402d6e1be8b4606a4605f408ae9ecb763e7d8adc776e46d1/namespaces/openshift-machine-api/machine.openshift.io/machinesets/ci-rosa-s-vpvk-pwmbq-worker-us-west-2a.yaml | yaml2json | jq -r '[.metadata.managedFields[] | select(.subresource == null) | .time + " " + .operation + " " + .manager + " " + (.fieldsV1 | tostring[:100])] | sort[]'
2025-06-06T17:36:28Z Update cluster-bootstrap {"f:metadata":{"f:labels":{".":{},"f:hive.openshift.io/machine-pool":{},"f:hive.openshift.io/managed
2025-06-06T17:41:46Z Update machine-config-controller {"f:spec":{"f:template":{"f:spec":{"f:providerSpec":{"f:value":{"f:ami":{"f:id":{}}}}}}}}
2025-06-06T17:41:59Z Update machine-controller-manager {"f:metadata":{"f:annotations":{".":{},"f:capacity.cluster-autoscaler.kubernetes.io/labels":{},"f:ma
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.19-e2e-rosa-sts-ovn/1931028069485645824/artifacts/e2e-rosa-sts-ovn/gather-extra/artifacts/machines.json | jq -r '.items[] | (.metadata | .creationTimestamp + " " + .name) + " " + .spec.providerSpec.value.ami.id' | sort | grep worker
2025-06-06T17:41:50Z ci-rosa-s-vpvk-pwmbq-worker-us-west-2a-95rxf ami-0b29d41f2ed6b8c94
2025-06-06T17:41:50Z ci-rosa-s-vpvk-pwmbq-worker-us-west-2a-gb78t ami-0b29d41f2ed6b8c94
So:
- The install-config requested ami-0e97cca5690da89da.
- 17:36:28, cluster-bootstrap pushes the installer-created MachineSet into the cluster.
- 17:41:46, the MCO updates the MachineSet's boot image to the stock OCP AMI for that region: ami-0b29d41f2ed6b8c94.
- 17:41:50, the Machine API creates the initial compute Machines using the stock ami-0b29d41f2ed6b8c94.
It seems like this race could easily go the other way, with the initial compute Machines coming up on the boot image requested in the install-config. But the check for "have we fixed the bug?" shouldn't hinge on the boot image used by the compute Machines; it should look at the MachineSet and ask "did the MCO clobber the MachineSet boot image at all?", as described in the Expected results section.
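The managedFields analysis above can be narrowed to the question the fix should answer: did machine-config-controller write to a MachineSet's providerSpec at all? A minimal sketch, assuming jq is installed and that the MachineSet JSON includes managedFields (recent oc releases omit them from -o json output unless --show-managed-fields is passed). mco_providerspec_writes is a hypothetical helper name. It matches the substring f:providerSpec in each entry's fieldsV1, a coarse heuristic rather than an exact field-path comparison.

```shell
# mco_providerspec_writes: hypothetical helper, not part of oc or the MCO.
# Reads one MachineSet as JSON on stdin and prints "<time> <operation> <manager>"
# for each main-resource managedFields entry owned by machine-config-controller
# whose fieldsV1 mentions f:providerSpec (a coarse substring heuristic).
# Exit status: 0 if no such write was found, 1 if at least one was, 2 on jq errors.
mco_providerspec_writes() {
  local writes
  writes=$(jq -r '
    .metadata.managedFields[]?
    | select(.manager == "machine-config-controller" and .subresource == null)
    | select(.fieldsV1 | tostring | contains("f:providerSpec"))
    | .time + " " + .operation + " " + .manager') || return 2
  if [ -n "$writes" ]; then
    printf '%s\n' "$writes"
    return 1
  fi
  return 0
}
```

For example, with MACHINESET set to a MachineSet name, `oc -n openshift-machine-api get machinesets.machine.openshift.io "$MACHINESET" -o json --show-managed-fields | mco_providerspec_writes` would have flagged the 17:41:46 machine-config-controller update in the run above, independent of which boot image the first Machines happened to use.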
- clones: OCPBUGS-57348 Cluster manages bootimages despite explicit bootimages in installconfig (Verified)
- is blocked by: OCPBUGS-57348 Cluster manages bootimages despite explicit bootimages in installconfig (Verified)
- links to: RHBA-2025:9750 OpenShift Container Platform 4.19.2 bug fix update