-
Bug
-
Resolution: Done
-
Critical
-
None
-
4.12.0
-
Critical
-
None
-
Approved
-
False
-
-
NA
-
Rejected
Description of problem:
When we try to upgrade a cluster that uses a custom osImage, the machine-config CO becomes degraded and the upgrade fails.

The error reported is:

    $ oc get co machine-config
    NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    machine-config   4.12.0-0.nightly-2022-09-08-002336   True        True          True       173m    Failed to resync 4.12.0-0.nightly-2022-09-08-002336 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, pool master has not progressed to latest configuration: osImageURL mismatch for master in rendered-master-53b475e3fbcf4f5f99ebd2fe1a3a4151 expected: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:34239e0d6c0a2090ca449be8253ec07cb77e4ffe6d43372a9624a42845026a81 got: quay.io/sregidor/sregidor-os:mco_layering, retrying]

Version-Release number of selected component (if applicable):
    $ oc get clusterversion
    NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.12.0-0.nightly-2022-09-08-002336   True        True          115m    Unable to apply 4.12.0-0.nightly-2022-09-12-152748: wait has exceeded 40 minutes for these operators: machine-config

How reproducible:
Always
Steps to Reproduce:
1. Get the base image needed to build the new layered osImage:

    $ oc adm release info --pullspecs 2>/dev/null | grep rhel
    rhel-coreos-8   quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:edd34e4e74a75099acffe5a15f34b8a7b89fe92810817fdd633b1582f44af4d3

2. Create a new osImage using this Dockerfile (the base image is the one found in step 1):

    FROM quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:edd34e4e74a75099acffe5a15f34b8a7b89fe92810817fdd633b1582f44af4d3
    RUN printf '[baseos]\nname=CentOS-$releasever - Base\nbaseurl=http://mirror.centos.org/centos/$releasever-stream/BaseOS/$basearch/os/\ngpgcheck=0\nenabled=1\n\n[appstream]\nname=CentOS-$releasever - AppStream\nbaseurl=http://mirror.centos.org/centos/$releasever-stream/AppStream/$basearch/os/\ngpgcheck=0\nenabled=1\n\n' > /etc/yum.repos.d/centos.repo && \
        rpm-ostree install zsh && \
        rpm-ostree cleanup -m && \
        ostree container commit

3. Create a machine config to apply the new osImage to the worker pool:

    cat <<EOF | oc create -f -
    kind: MachineConfig
    apiVersion: machineconfiguration.openshift.io/v1
    metadata:
      labels:
        machineconfiguration.openshift.io/role: "worker"
      name: "tc-54183-new-os-image-upgrade-worker"
    spec:
      osImageURL: "quay.io/sregidor/sregidor-os:mco_layering"
    EOF

4. Create a machine config to apply the new osImage to the master pool:

    cat <<EOF | oc create -f -
    kind: MachineConfig
    apiVersion: machineconfiguration.openshift.io/v1
    metadata:
      labels:
        machineconfiguration.openshift.io/role: "master"
      name: "tc-54183-new-os-image-upgrade-master"
    spec:
      osImageURL: "quay.io/sregidor/sregidor-os:mco_layering"
    EOF

5. Wait until all MCPs are updated:

    $ oc get mcp
    NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
    master   rendered-master-53b475e3fbcf4f5f99ebd2fe1a3a4151   True      False      False      3              3                   3                     0                      144m
    worker   rendered-worker-c317487e0f68b1f90235dee2f6dd3538   True      False      False      2              2                   2                     0                      144m

6. Upgrade the cluster (use the right release image in the following command):

    $ oc adm upgrade --to-image registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-09-12-152748 --allow-explicit-upgrade --force

Actual results:
The machine-config CO becomes degraded, reporting this error:

    $ oc get co machine-config
    NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
    machine-config   4.12.0-0.nightly-2022-09-08-002336   True        True          True       173m    Failed to resync 4.12.0-0.nightly-2022-09-08-002336 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, pool master has not progressed to latest configuration: osImageURL mismatch for master in rendered-master-53b475e3fbcf4f5f99ebd2fe1a3a4151 expected: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:34239e0d6c0a2090ca449be8253ec07cb77e4ffe6d43372a9624a42845026a81 got: quay.io/sregidor/sregidor-os:mco_layering, retrying]

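When triaging this, it can help to pull the two image references out of the degraded message so they can be compared side by side. The following is only a rough sketch: `parse_mismatch` is a hypothetical helper name, and it assumes the message keeps the `expected: <image> got: <image>,` wording shown in the error above and arrives on a single line.

```shell
# Hypothetical helper: read an MCO "osImageURL mismatch" message on stdin
# and print the expected and actual image references, one per line.
# Assumes the "expected: X got: Y," wording seen in this report.
parse_mismatch() {
  msg=$(cat)
  # Image the operator expects (the release payload's OS image).
  expected=$(printf '%s\n' "$msg" | sed -n 's/.*expected: \([^ ]*\) got: .*/\1/p')
  # Image actually found in the rendered config (the custom one here).
  got=$(printf '%s\n' "$msg" | sed -n 's/.* got: \([^ ,]*\).*/\1/p')
  printf 'expected=%s\ngot=%s\n' "$expected" "$got"
}

# Possible usage against a live cluster (not run here):
#   oc get co machine-config \
#     -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}' \
#     | parse_mismatch
```

In this report, `expected` would be the release payload image and `got` the custom layered image, which suggests the operator is comparing against the payload rather than honoring the osImageURL override.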
Expected results:
The upgrade should finish without errors, and the upgraded cluster should use the custom osImage that was originally present before the upgrade.
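One way to check the expected result after the upgrade is to confirm that each pool's rendered config still carries the custom osImageURL. This is a minimal sketch, not part of the original test case: `pool_os_image` and `verify_pools` are hypothetical helper names, and the `OC` variable exists only so the client binary can be swapped out.

```shell
# Client binary; overridable for testing (assumption, not from the report).
OC=${OC:-oc}

# Print the osImageURL of the rendered MachineConfig a pool points at.
pool_os_image() {
  rendered=$("$OC" get mcp "$1" -o jsonpath='{.spec.configuration.name}') || return 1
  "$OC" get mc "$rendered" -o jsonpath='{.spec.osImageURL}'
}

# verify_pools WANT POOL...
# Succeeds only if every listed pool still uses the image WANT.
verify_pools() {
  want=$1
  shift
  rc=0
  for pool in "$@"; do
    if ! got=$(pool_os_image "$pool"); then
      echo "$pool: lookup failed"
      rc=1
    elif [ "$got" = "$want" ]; then
      echo "$pool: ok"
    else
      echo "$pool: mismatch ($got)"
      rc=1
    fi
  done
  return "$rc"
}

# Possible usage after the upgrade completes (not run here):
#   verify_pools quay.io/sregidor/sregidor-os:mco_layering master worker
```

A non-zero exit status would indicate that at least one pool no longer uses the custom image, or that its rendered config could not be read.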
Additional info: