Type: Bug
Resolution: Done
Priority: Critical
Fix Version/s: None
Affects Version/s: 4.12.0
Component/s: Machine Config Operator
Labels:
- layering
- mco_qe_os_layering

Severity:
Critical
Regression:
None
Release Blocker:
Approved
Blocked:
False
Blocked Reason:

None
Release Note Text:
NA
Release Note Status:
Rejected
Target Version:

4.12.0

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

When we upgrade a cluster that uses a custom osImage, the machine-config cluster operator (CO) becomes degraded and the upgrade fails.

The error reported is:

$ oc get co machine-config
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.12.0-0.nightly-2022-09-08-002336   True        True          True       173m    Failed to resync 4.12.0-0.nightly-2022-09-08-002336 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, pool master has not progressed to latest configuration: osImageURL mismatch for master in rendered-master-53b475e3fbcf4f5f99ebd2fe1a3a4151 expected: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:34239e0d6c0a2090ca449be8253ec07cb77e4ffe6d43372a9624a42845026a81 got: quay.io/sregidor/sregidor-os:mco_layering, retrying]
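
The message above packs the pool, the rendered config, and both image references into one long line. A minimal sketch for pulling those fields apart is shown below; the function name `os_image_mismatch` is hypothetical, and the pattern assumes the message keeps the exact "osImageURL mismatch for ... in ... expected: ... got: ..." wording seen in this report.

```shell
# Hypothetical helper (not part of oc or the MCO): read an
# "osImageURL mismatch" message on stdin and print its fields.
# Assumes the wording shown in this report; other versions may differ.
os_image_mismatch() {
  sed -n 's/.*osImageURL mismatch for \([^ ]*\) in \([^ ]*\) expected: \([^ ]*\) got: \([^ ,]*\).*/pool=\1 config=\2 expected=\3 got=\4/p'
}

# Possible usage against a live cluster (not executed here):
#   oc get co machine-config \
#     -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}' \
#     | os_image_mismatch
```

If the message does not match the assumed wording, the function prints nothing, so an empty result means "not this error" rather than "no mismatch".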

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.12.0-0.nightly-2022-09-08-002336   True        True          115m    Unable to apply 4.12.0-0.nightly-2022-09-12-152748: wait has exceeded 40 minutes for these operators: machine-config

How reproducible:

Always

Steps to Reproduce:

1. Get the base image that we need to use to create a new layered osImage

$ oc adm release info --pullspecs 2> /dev/null | grep rhel
  rhel-coreos-8                                  quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:edd34e4e74a75099acffe5a15f34b8a7b89fe92810817fdd633b1582f44af4d3
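
Since `grep rhel` can match any component whose name contains "rhel", an exact match on the component name may be safer when scripting this step. The sketch below assumes the two-column "name pullspec" layout shown above; the function name `coreos_pullspec` is hypothetical.

```shell
# Hypothetical helper: read `oc adm release info --pullspecs` output on
# stdin and print the pullspec of one component (default: rhel-coreos-8).
# Assumes each component line is "<name> <pullspec>", as in the output above.
coreos_pullspec() {
  component="${1:-rhel-coreos-8}"
  awk -v name="$component" '$1 == name { print $2; exit }'
}

# Possible usage against a live cluster (not executed here):
#   oc adm release info --pullspecs 2>/dev/null | coreos_pullspec
```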


2. Create a new osImage using this Dockerfile (the base image is the one found in step 1)

FROM quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:edd34e4e74a75099acffe5a15f34b8a7b89fe92810817fdd633b1582f44af4d3

RUN printf '[baseos]\nname=CentOS-$releasever - Base\nbaseurl=http://mirror.centos.org/centos/$releasever-stream/BaseOS/$basearch/os/\ngpgcheck=0\nenabled=1\n\n[appstream]\nname=CentOS-$releasever - AppStream\nbaseurl=http://mirror.centos.org/centos/$releasever-stream/AppStream/$basearch/os/\ngpgcheck=0\nenabled=1\n\n' > /etc/yum.repos.d/centos.repo && \
    rpm-ostree install zsh && \
    rpm-ostree cleanup -m && \
    ostree container commit


3. Create a machine config to apply the new osImage to the worker pool

cat <<EOF | oc create -f -
kind: MachineConfig
apiVersion: machineconfiguration.openshift.io/v1
metadata:
  labels:
    machineconfiguration.openshift.io/role: "worker"
  name: "tc-54183-new-os-image-upgrade-worker"
spec:
  osImageURL: "quay.io/sregidor/sregidor-os:mco_layering"
EOF


4. Create a machine config to apply the new osImage to the master pool

cat <<EOF | oc create -f -
kind: MachineConfig
apiVersion: machineconfiguration.openshift.io/v1
metadata:
  labels:
    machineconfiguration.openshift.io/role: "master"
  name: "tc-54183-new-os-image-upgrade-master"
spec:
  osImageURL: "quay.io/sregidor/sregidor-os:mco_layering"
EOF
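
Steps 3 and 4 differ only in the pool role, so the two manifests could be generated from one template. This is a sketch under that assumption; the function name `render_os_image_mc` is hypothetical, and the manifest fields are copied from the two steps above.

```shell
# Hypothetical helper: print a MachineConfig manifest that sets osImageURL
# for the given pool role. Fields and naming mirror steps 3 and 4.
render_os_image_mc() {
  role="$1"
  image="$2"
  cat <<EOF
kind: MachineConfig
apiVersion: machineconfiguration.openshift.io/v1
metadata:
  labels:
    machineconfiguration.openshift.io/role: "${role}"
  name: "tc-54183-new-os-image-upgrade-${role}"
spec:
  osImageURL: "${image}"
EOF
}

# Possible usage against a live cluster (not executed here):
#   for role in worker master; do
#     render_os_image_mc "$role" "quay.io/sregidor/sregidor-os:mco_layering" \
#       | oc create -f -
#   done
```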


5. Wait until all MCPs are updated

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-53b475e3fbcf4f5f99ebd2fe1a3a4151   True      False      False      3              3                   3                     0                      144m
worker   rendered-worker-c317487e0f68b1f90235dee2f6dd3538   True      False      False      2              2                   2                     0                      144m
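
The "all MCPs are updated" condition in step 5 can be checked mechanically from the table above. The sketch below assumes the default `oc get mcp` column order shown here (UPDATED, UPDATING, DEGRADED in columns 3 to 5); the function name `all_pools_updated` is hypothetical.

```shell
# Hypothetical helper: read `oc get mcp` table output on stdin and succeed
# only if at least one pool is listed and every pool reports
# UPDATED=True, UPDATING=False, DEGRADED=False.
# Assumes the default column order shown in step 5.
all_pools_updated() {
  awk 'NR > 1 && NF > 0 {
         seen = 1
         if ($3 != "True" || $4 != "False" || $5 != "False") bad = 1
       }
       END {
         if (seen && !bad) exit 0
         exit 1
       }'
}

# Possible usage against a live cluster (not executed here):
#   until oc get mcp | all_pools_updated; do sleep 30; done
```

Treating an empty table as failure is a deliberate choice here, so that a failed `oc` call is not mistaken for "everything is updated".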


6. Upgrade the cluster (use the right release image in the following command)

$ oc adm upgrade --to-image registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-09-12-152748 --allow-explicit-upgrade --force

Actual results:

The machine-config CO becomes degraded, reporting this error:

$ oc get co machine-config
NAME             VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
machine-config   4.12.0-0.nightly-2022-09-08-002336   True        True          True       173m    Failed to resync 4.12.0-0.nightly-2022-09-08-002336 because: error during syncRequiredMachineConfigPools: [timed out waiting for the condition, pool master has not progressed to latest configuration: osImageURL mismatch for master in rendered-master-53b475e3fbcf4f5f99ebd2fe1a3a4151 expected: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:34239e0d6c0a2090ca449be8253ec07cb77e4ffe6d43372a9624a42845026a81 got: quay.io/sregidor/sregidor-os:mco_layering, retrying]

Expected results:

The upgrade should finish without errors, and the upgraded cluster should use the custom osImage that was originally present before the upgrade.

Additional info:

Assignee: John Kyros

Reporter: Sergio Regidor de la Rosa

QA Contact: Sergio Regidor de la Rosa

Votes: 0

Watchers: 5

Created: 2022/09/14 1:44 PM

Updated: 2023/01/17 7:40 PM

Resolved: 2023/01/17 7:40 PM
