- Bug
- Resolution: Unresolved
- Normal
- None
- 4.17
- Moderate
- None
- False
Description of problem:
When we enable OCL in an MCP and a MachineOSBuild fails, the new images are not correctly deployed to the nodes, even after we fix the problem that caused the failure and a new build succeeds.
Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.17.1    True        False         135m    Cluster version is 4.17.1
How reproducible:
Always
Steps to Reproduce:
1. Create a MOSC resource for the worker pool.

   To reproduce this issue we used the MCO QE quay repository, after first adding its credentials to the pull-secret. Any other repository can be used.

    oc create -f - << EOF
    apiVersion: machineconfiguration.openshift.io/v1alpha1
    kind: MachineOSConfig
    metadata:
      name: worker
    spec:
      machineConfigPool:
        name: worker
      buildOutputs:
        currentImagePullSecret:
          name: $(oc get -n openshift-machine-config-operator sa default -ojsonpath='{.secrets[0].name}')
      buildInputs:
        imageBuilder:
          imageBuilderType: PodImageBuilder
        baseImagePullSecret:
          name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
        renderedImagePushSecret:
          name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
        renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
    EOF

2. Wait until the image is built and deployed on the worker nodes.

3. Once the image is correctly applied, edit the MOSC resource to add a custom container file that uses RHEL enablement. Since no RHEL enablement secret was configured, OCL should fail when it builds the image.

   This is the section that we added to configure the wrong container file:

    containerFile:
    - content: |-
        FROM configs AS final
        RUN rm -rf /etc/rhsm-host && \
          rpm-ostree install buildah && \
          ln -s /run/secrets/rhsm /etc/rhsm-host && \
          ostree container commit

4. Create a new MC (any MC) to trigger a build and wait until the build fails.

5. Edit the MOSC resource to use a custom container file that does not need RHEL enablement.

   This is the section that we used to fix the custom container file:

    containerFile:
    - containerfileArch: noarch
      content: |-
        # Pull the centos base image and enable the EPEL repository.
        FROM quay.io/centos/centos:stream9 AS centos
        RUN dnf install -y epel-release
        # Build the final OS image for this MachineConfigPool.
        FROM configs AS final
        # Copy the EPEL configs into the final image.
        COPY --from=centos /etc/yum.repos.d /etc/yum.repos.d
        COPY --from=centos /etc/pki/rpm-gpg/RPM-GPG-KEY-* /etc/pki/rpm-gpg/
        # Install cowsay and ripgrep from the EPEL repository into the final image,
        # along with a custom cow file.
        RUN sed -i 's/\$stream/9-stream/g' /etc/yum.repos.d/centos*.repo && \
          rpm-ostree install cowsay ripgrep

6. Create another MC (any MC) to trigger a new build.

7. The new build succeeds and the image is pushed.

8. The MCP is never updated with the new image.
Actual results:
Once we fix the failed build, OCL does not apply the new build to the MCP. The worker MCP is stuck in this status:

$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-4c3a27c89989374052f5f68e8cc2ce3e   True      False      False      3              3                   3                     0                      171m
worker   rendered-worker-c5ca92b560e6ffec2918864d09c9385e   False     True       False      2              0                   0                     0                      171m
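For anyone scripting a check for this state, the following is a minimal sketch that filters the `oc get mcp` table shown above for pools reporting UPDATED=False and UPDATING=True with UPDATEDMACHINECOUNT still at 0. The function name `pools_not_progressing` is hypothetical and not part of `oc` or the MCO. A single snapshot cannot distinguish a stuck pool from one that has only just started updating, so the check would need to be repeated over time.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of oc or the MCO): reads the table printed by
# `oc get mcp` on stdin and prints the names of pools that report
# UPDATED=False and UPDATING=True while UPDATEDMACHINECOUNT is still 0.
pools_not_progressing() {
  awk '
    NR == 1 {
      # Map each header name to its column index, so the filter does not
      # depend on a fixed column order.
      for (i = 1; i <= NF; i++) col[$i] = i
      next
    }
    $col["UPDATED"] == "False" && $col["UPDATING"] == "True" &&
      $col["UPDATEDMACHINECOUNT"] == 0 { print $col["NAME"] }
  '
}

# Usage against a live cluster (not executed here):
#   oc get mcp | pools_not_progressing
```

Fed the output from this report, the filter selects only the worker pool. Running it periodically and seeing the same pool reported long after the new build succeeded would match the behaviour described here.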
Expected results:
If a build fails, and we fix the problem so that a new build is created and succeeds, the new image should be applied to the nodes without problems.
Additional info: