Type: Bug
Resolution: Unresolved
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.17
Component/s: Machine Config Operator
Labels:
- qe-ocb-test

Severity:
Moderate
Regression:
None
Blocked:
False
Blocked Reason:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

When on-cluster layering (OCL) is enabled in a MachineConfigPool (MCP) and a MachineOSBuild fails, fixing the problem that caused the failure is not enough: the new image builds successfully but is not deployed to the nodes.

Version-Release number of selected component (if applicable):

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.17.1    True        False         135m    Cluster version is 4.17.1

How reproducible:

Always

Steps to Reproduce:

1. Create a MOSC (MachineOSConfig) resource for the worker pool

To reproduce this issue we used the MCO QE quay repository, after first adding its credentials to the pull-secret. Any other repository can be used instead.

oc create -f - << EOF
apiVersion: machineconfiguration.openshift.io/v1alpha1
kind: MachineOSConfig
metadata:
  name: worker
spec:
  machineConfigPool:
    name: worker
  buildOutputs:
    currentImagePullSecret:
      name: $(oc get -n openshift-machine-config-operator sa default -ojsonpath='{.secrets[0].name}')
  buildInputs:
    imageBuilder:
      imageBuilderType: PodImageBuilder
    baseImagePullSecret:
      name: $(oc get secret -n openshift-config pull-secret -o json | jq "del(.metadata.namespace, .metadata.creationTimestamp, .metadata.resourceVersion, .metadata.uid, .metadata.name)" | jq '.metadata.name="pull-copy"' | oc -n openshift-machine-config-operator create -f - &> /dev/null; echo -n "pull-copy")
    renderedImagePushSecret:
      name: $(oc get -n openshift-machine-config-operator sa builder -ojsonpath='{.secrets[0].name}')
    renderedImagePushspec: "image-registry.openshift-image-registry.svc:5000/openshift-machine-config-operator/ocb-image:latest"
EOF

2. Wait until the image is built and deployed on the worker nodes
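
The wait in this step could be scripted with `oc wait` on the pool's `Updated` condition. This is only a sketch: the function name, the 30-minute default timeout, and the `OC` override variable are assumptions made here for illustration, and it assumes the pool's `Updated` condition reflects the OCL image rollout.

```shell
# Wait until a MachineConfigPool reports the Updated condition.
# OC is a hypothetical override hook (e.g. OC=echo for a dry run);
# it defaults to the real `oc` binary.
wait_for_pool_updated() (
  pool="$1"
  timeout="${2:-30m}"   # assumed default; tune for your cluster
  "${OC:-oc}" wait "machineconfigpool/${pool}" \
    --for=condition=Updated --timeout="${timeout}"
)
```

For example, `wait_for_pool_updated worker` blocks until the worker pool is updated or the timeout expires.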

3. Once the image is correctly applied, edit the MOSC resource to add a custom container file that uses RHEL enablement. Since no RHEL enablement secret was configured, OCL should fail when it builds the image.

This is the section that we added to configure the failing container file:

containerFile:
- content: |-
    FROM configs AS final

    RUN rm -rf /etc/rhsm-host && \
        rpm-ostree install buildah && \
        ln -s /run/secrets/rhsm /etc/rhsm-host && \
        ostree container commit

4. Create a new MC (any MC) to trigger a build and wait until the image build fails.
5. Edit the MOSC resource to use a custom container file that does not need RHEL enablement.

This is the section that we used to fix the custom container file:

containerFile:
- containerfileArch: noarch
  content: |-
    # Pull the centos base image and enable the EPEL repository.
    FROM quay.io/centos/centos:stream9 AS centos
    RUN dnf install -y epel-release

    # Build the final OS image for this MachineConfigPool.
    FROM configs AS final

    # Copy the EPEL configs into the final image.
    COPY --from=centos /etc/yum.repos.d /etc/yum.repos.d
    COPY --from=centos /etc/pki/rpm-gpg/RPM-GPG-KEY-* /etc/pki/rpm-gpg/

    # Install cowsay and ripgrep from the EPEL repository into the final image,
    # along with a custom cow file.
    RUN sed -i 's/\$stream/9-stream/g' /etc/yum.repos.d/centos*.repo && \
        rpm-ostree install cowsay ripgrep

6. Create another MC (any MC) to trigger a new build.
7. The new build succeeds and the image is pushed.
8. The MCP is never updated with the new image.
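
The steps that create "any MC" only need some new MachineConfig that targets the worker pool. A minimal sketch follows; the function name, the MC name `99-worker-ocl-trigger`, and the file path `/etc/ocl-trigger-test` are hypothetical and chosen only for illustration, not taken from the original reproduction.

```shell
# Print a minimal MachineConfig that writes one small file on worker nodes.
# $1 = MachineConfig name, $2 = file path (both illustrative placeholders).
render_trigger_mc() {
  cat <<EOF
apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: $1
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  config:
    ignition:
      version: 3.2.0
    storage:
      files:
      - path: $2
        mode: 420
        contents:
          source: data:,ocl-test
EOF
}
```

It could be applied with `render_trigger_mc 99-worker-ocl-trigger /etc/ocl-trigger-test | oc create -f -`. Using a different name and path for each trigger keeps the MachineConfigs distinct.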

Actual results:


Once we fix the failed build, OCL does not apply the new build to the MCP.

The worker MCP is stuck in this status:
$ oc get mcp
NAME     CONFIG                                             UPDATED   UPDATING   DEGRADED   MACHINECOUNT   READYMACHINECOUNT   UPDATEDMACHINECOUNT   DEGRADEDMACHINECOUNT   AGE
master   rendered-master-4c3a27c89989374052f5f68e8cc2ce3e   True      False      False      3              3                   3                     0                      171m
worker   rendered-worker-c5ca92b560e6ffec2918864d09c9385e   False     True       False      2              0                   0                     0                      171m
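
This state can also be picked out of the `oc get mcp` table mechanically. The sketch below is a hypothetical helper (not part of `oc`) that prints pools whose UPDATED column is False while UPDATING is True and UPDATEDMACHINECOUNT is 0. It assumes the default column order shown above, and it only reports a snapshot: a pool counts as stuck only if it stays in this state over time.

```shell
# Read `oc get mcp` output on stdin and print the names of pools that are
# updating but have no updated machines yet.
# Assumed column order: NAME CONFIG UPDATED UPDATING DEGRADED MACHINECOUNT
#   READYMACHINECOUNT UPDATEDMACHINECOUNT DEGRADEDMACHINECOUNT AGE
list_pending_pools() {
  awk 'NR > 1 && $3 == "False" && $4 == "True" && $8 == 0 { print $1 }'
}
```

Typical use would be `oc get mcp | list_pending_pools`, which prints `worker` for the table above.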

Expected results:

If a build fails and we fix the problem, the new build that is created should be applied to the nodes without problems.

Additional info:

Assignee: Team MCO

Reporter: Sergio Regidor de la Rosa

QA Contact: Sergio Regidor de la Rosa

Votes: 0

Watchers: 1

Created: 2024/10/14 4:03 PM

Updated: 2024/11/18 8:59 PM
