OpenShift Bugs / OCPBUGS-1324

Clusters with a custom osImage cannot be upgraded

Details

    • Critical
    • Approved
    • False
    • NA
    • Rejected

    Description

      Description of problem:

      When we try to upgrade a cluster that uses a custom osImage, the machine-config cluster operator (CO) becomes degraded and the upgrade fails.
      
      The reported error is:
      
      $ oc get co machine-config
      NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      machine-config   4.12.0-0.nightly-2022-09-08-002336   True        True          True       173m    Failed to resync 4.12.0-0.nightly-2022-09-08-002336 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, pool master has not progressed to latest configuration: osImageURL mismatch for master in rendered-master-53b475e3fbcf4f5f99ebd2fe1a3a4151 expected: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:34239e0d6c0a2090ca449be8253ec07cb77e4ffe6d43372a9624a42845026a81 got: quay.io/sregidor/sregidor-os:mco_layering, retrying]

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.12.0-0.nightly-2022-09-08-002336   True        True          115m    Unable to apply 4.12.0-0.nightly-2022-09-12-152748: wait has exceeded 40 minutes for these operators: machine-config
      

      How reproducible:

      Always

      Steps to Reproduce:

      1. Get the base image that we need to use to create a new layered osImage
      
      $ oc adm release info --pullspecs 2>/dev/null | grep rhel
        rhel-coreos-8                                  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:edd34e4e74a75099acffe5a15f34b8a7b89fe92810817fdd633b1582f44af4d3
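
Step 1 can be scripted so that the base image is captured without copy and paste. The following is a minimal sketch, not part of the original report: `release_pullspec` is a hypothetical helper name, and it assumes the two-column "name pullspec" layout shown in the output above.

```shell
# Hypothetical helper (not part of oc): read `oc adm release info --pullspecs`
# output on stdin and print the pullspec of the component named in $1.
# Assumes each component line has the form "<name> <pullspec>".
release_pullspec() {
  awk -v name="$1" '$1 == name { print $2; exit }'
}
```

Assumed usage: `oc adm release info --pullspecs 2>/dev/null | release_pullspec rhel-coreos-8`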
      
      
      2. Create a new osImage using this Dockerfile (the base image is the one found in step 1)
      
      FROM quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:edd34e4e74a75099acffe5a15f34b8a7b89fe92810817fdd633b1582f44af4d3
      
      RUN printf '[baseos]\nname=CentOS-$releasever - Base\nbaseurl=http://mirror.centos.org/centos/$releasever-stream/BaseOS/$basearch/os/\ngpgcheck=0\nenabled=1\n\n[appstream]\nname=CentOS-$releasever - AppStream\nbaseurl=http://mirror.centos.org/centos/$releasever-stream/AppStream/$basearch/os/\ngpgcheck=0\nenabled=1\n\n' > /etc/yum.repos.d/centos.repo && \
          rpm-ostree install zsh && \
          rpm-ostree cleanup -m && \
          ostree container commit
      
      
      3. Create a machine config to apply the new osImage to worker pool
      
      cat <<EOF | oc create -f -
      kind: MachineConfig
      apiVersion: machineconfiguration.openshift.io/v1
      metadata:
        labels:
          machineconfiguration.openshift.io/role: "worker"
        name: "tc-54183-new-os-image-upgrade-worker"
      spec:
        osImageURL: "quay.io/sregidor/sregidor-os:mco_layering"
      EOF
      
      
      4. Create a machine config to apply the new osImage to master pool
      
      cat <<EOF | oc create -f -
      kind: MachineConfig
      apiVersion: machineconfiguration.openshift.io/v1
      metadata:
        labels:
          machineconfiguration.openshift.io/role: "master"
        name: "tc-54183-new-os-image-upgrade-master"
      spec:
        osImageURL: "quay.io/sregidor/sregidor-os:mco_layering"
      EOF
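
Steps 3 and 4 differ only in the pool role, so the manifest can be generated from one template. This is a sketch under that assumption; `render_osimage_mc` is a hypothetical helper name, and the manifest fields are copied from the two steps above.

```shell
# Hypothetical helper: print a MachineConfig that sets osImageURL for one pool.
#   $1 = pool role (for example "worker" or "master")
#   $2 = custom osImage pullspec
render_osimage_mc() {
  role="$1"
  image="$2"
  cat <<EOF
kind: MachineConfig
apiVersion: machineconfiguration.openshift.io/v1
metadata:
  labels:
    machineconfiguration.openshift.io/role: "${role}"
  name: "tc-54183-new-os-image-upgrade-${role}"
spec:
  osImageURL: "${image}"
EOF
}
```

Assumed usage: `render_osimage_mc worker quay.io/sregidor/sregidor-os:mco_layering | oc create -f -`, then the same command with `master`.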
      
      
      5. Wait until all MCPs are updated
      
      $ oc get mcp
      NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
      master   rendered-master-53b475e3fbcf4f5f99ebd2fe1a3a4151   True      False      False      3              3                   3                     0                      144m
      worker   rendered-worker-c317487e0f68b1f90235dee2f6dd3538   True      False      False      2              2                   2                     0                      144m
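
The wait in step 5 can be checked mechanically instead of by eye. The sketch below is an illustration, not part of the original report: `mcps_all_updated` is a hypothetical helper, and it assumes the column order shown above (NAME, CONFIG, UPDATED, UPDATING, DEGRADED) with a non-empty CONFIG column for every pool.

```shell
# Hypothetical helper: read `oc get mcp` table output on stdin and succeed
# only if every pool reports UPDATED=True, UPDATING=False and DEGRADED=False.
# Fails when no pool rows are present at all.
mcps_all_updated() {
  awk '
    NR == 1 { next }        # skip the header row
    NF == 0 { next }        # skip blank lines
    { seen = 1 }
    $3 != "True" || $4 != "False" || $5 != "False" { bad = 1 }
    END { if (seen && !bad) exit 0; exit 1 }
  '
}
```

Assumed usage as a polling loop: `until oc get mcp | mcps_all_updated; do sleep 30; done`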
      
      
      6. Upgrade the cluster (use the right release image in the following command)
      
      $ oc adm upgrade --to-image registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-09-12-152748 --allow-explicit-upgrade --force
      
      
      

      Actual results:

      The machine-config CO becomes degraded reporting this error:
      
      
      $ oc get co machine-config
      NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      machine-config   4.12.0-0.nightly-2022-09-08-002336   True        True          True       173m    Failed to resync 4.12.0-0.nightly-2022-09-08-002336 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, pool master has not progressed to latest configuration: osImageURL mismatch for master in rendered-master-53b475e3fbcf4f5f99ebd2fe1a3a4151 expected: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:34239e0d6c0a2090ca449be8253ec07cb77e4ffe6d43372a9624a42845026a81 got: quay.io/sregidor/sregidor-os:mco_layering, retrying]
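
The relevant part of this message is the expected/got pair, which is easy to miss in the long line. The following sketch pulls it out; `osimage_mismatch` is a hypothetical helper, and the pattern assumes the exact message wording shown above, which may differ in other releases.

```shell
# Hypothetical helper: read operator status text on stdin and print the pool,
# rendered config, expected osImageURL and actual osImageURL from an
# "osImageURL mismatch" message. Prints nothing if no such message is found.
osimage_mismatch() {
  sed -n 's/.*osImageURL mismatch for \([^ ]*\) in \([^ ]*\) expected: \([^ ]*\) got: \([^ ,]*\).*/pool=\1 rendered=\2 expected=\3 got=\4/p'
}
```

Assumed usage: `oc get co machine-config | osimage_mismatch`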

      Expected results:

      The upgrade should finish without errors, and the upgraded cluster should use the custom osImage that was originally present before the upgrade.

      Additional info:

       

       

          People

            jkyros@redhat.com John Kyros
            sregidor@redhat.com Sergio Regidor de la Rosa
            Sergio Regidor de la Rosa