OpenShift Bugs / OCPBUGS-57473

Pool degraded when a node uses a local osImage and this image is garbage collected


      Description of problem:

      
      This happens in techpreview only.
      
      When a node uses the rhel-coreos image stored locally (the "containers-storage" transport) and this image is garbage collected, the node becomes degraded if we try to apply an extension.
      
      

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.20.0-0.nightly-2025-06-16-011145   True        False         160m    Cluster version is 4.20.0-0.nightly-2025-06-16-011145
      
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      
      1. Enable OCL (on-cluster layering); see the MachineOSConfig sketch after the rpm-ostree output below.
      
      $ oc debug node/ip-10-0-21-6.us-east-2.compute.internal -- chroot /host rpm-ostree status
      Starting pod/ip-10-0-21-6us-east-2computeinternal-debug-v5mvt ...
      To use host binaries, run `chroot /host`
      State: idle
      Deployments:
      * ostree-unverified-registry:quay.io/mcoqe/layering@sha256:16a1327d039f5b3f7123d5bc4510588138e545ecede3a83a9bc8d8ff288a6ec4
                         Digest: sha256:16a1327d039f5b3f7123d5bc4510588138e545ecede3a83a9bc8d8ff288a6ec4
                        Version: 9.6.20250611-0 (2025-06-16T08:42:27Z)
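      
      For reference, "enabling OCL" here means creating a MachineOSConfig for the worker pool. The following is only a minimal sketch: the resource name, push spec and push secret are placeholders, and other fields (containerFile, imageBuilder, baseImagePullSecret, ...) are omitted; the exact field names depend on the MachineOSConfig API version shipped with the release.
      
      $ cat <<'EOF' | oc apply -f -
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineOSConfig
      metadata:
        name: worker                 # arbitrary placeholder name
      spec:
        machineConfigPool:
          name: worker               # pool that receives the layered image
        # placeholder push spec/secret; replace with real values
        renderedImagePushSpec: quay.io/mcoqe/layering:latest
        renderedImagePushSecret:
          name: push-secret
      EOF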
      
      
      2. Disable OCL
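      
      Disabling OCL here just means deleting the MachineOSConfig created in step 1 and waiting for the pool to settle, e.g. (assuming the MachineOSConfig was named "worker"):
      
      $ oc delete machineosconfig worker
      $ oc wait mcp/worker --for=condition=Updated --timeout=30m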
      
      3. Check that one of the nodes is now using a "containers-storage" image. (Not fully confirmed, but it seems to be the node that was running the os-builder pod.)
      
      
      Note the "containers-storage" in the deployment, it means that we are taking the image from the local storage and we are not pulling it from the registry because the image is already present in the node.
      
      $ oc debug -q node/ip-10-0-21-39.us-east-2.compute.internal -- chroot /host rpm-ostree status
      State: idle
      Deployments:
      * ostree-unverified-image:containers-storage:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
                         Digest: sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
                        Version: 9.6.20250611-0 (2025-06-13T04:45:42Z)
      
      Note that the pulled image is visible on the node:
      sh-5.1# crictl images quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
      IMAGE                                            TAG                 IMAGE ID            SIZE
      quay.io/openshift-release-dev/ocp-v4.0-art-dev   <none>              c6c98cd692bd5       2.53GB
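      
      The same local copy can also be inspected directly from the host's container storage, for example with skopeo from the debug shell (an additional check only, not required for the reproducer):
      
      sh-5.1# skopeo inspect containers-storage:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846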
      
      
      4. Manually remove this image from the node.
      
      For the sake of simplicity we remove it manually, but in a real cluster this will happen when Kubernetes garbage collects images, since this image (rhel-coreos) is never used by a pod. (A KubeletConfig sketch to force this through the kubelet image GC is included after the rpm-ostree output below.)
      
      
      sh-5.1# crictl images quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
      IMAGE                                            TAG                 IMAGE ID            SIZE
      quay.io/openshift-release-dev/ocp-v4.0-art-dev   <none>              c6c98cd692bd5       2.53GB
      sh-5.1# crictl rmi c6c98cd692bd5
      E0616 10:45:14.401000    8214 log.go:32] "RemoveImage from image service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" image="c6c98cd692bd5"
      error of removing image "c6c98cd692bd5": rpc error: code = DeadlineExceeded desc = context deadline exceeded
      sh-5.1# crictl images quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
      IMAGE               TAG                 IMAGE ID            SIZE
      sh-5.1# rpm-ostree status
      State: idle
      Deployments:
      * ostree-unverified-image:containers-storage:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
                         Digest: sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
                        Version: 9.6.20250611-0 (2025-06-13T04:45:42Z)
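      
      For completeness: instead of crictl rmi, the removal can also be forced through the kubelet image garbage collector by lowering its thresholds with a KubeletConfig. This is only a sketch with intentionally aggressive example values; the thresholds that actually trigger GC depend on the node's disk usage.
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: worker-aggressive-image-gc
      spec:
        machineConfigPoolSelector:
          matchLabels:
            pools.operator.machineconfiguration.openshift.io/worker: ""
        kubeletConfig:
          imageMinimumGCAge: 5m
          imageGCHighThresholdPercent: 5
          imageGCLowThresholdPercent: 2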
      
      
      5. Create a MachineConfig (MC) to deploy the usbguard extension.
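      
      A minimal MachineConfig for this step could look like this (the name is arbitrary; the worker role is assumed to match the affected node's pool):
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: 99-worker-extensions-usbguard
      spec:
        extensions:
          - usbguard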
      
      

      Actual results:

      
      The pool is degraded with this message:
      
        - lastTransitionTime: "2025-06-16T10:50:58Z"
          message: 'Node ip-10-0-21-39.us-east-2.compute.internal is reporting: "error running
            rpm-ostree update --install usbguard: error: Creating importer: failed to invoke
            method OpenImage: failed to invoke method OpenImage: reference \"[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.skip_mount_home=true]quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846\"
            does not resolve to an image ID\n: exit status 1"'
          reason: ""
          status: "True"
          type: Degraded
      
      
      
      

      Expected results:

      
      The pool should not be degraded
      
      

      Additional info:

      
      Note that even though we remove the image manually here to get a minimal reproducer, in a real cluster this image will always end up removed by the Kubernetes garbage collector, since it is never used to run a pod and is not pinned.
      
      This only happens in techpreview.
      
      It seems to be related to the MCO supporting the local rhel-coreos image to make disconnected upgrades with pinned images possible:
      https://github.com/openshift/machine-config-operator/commit/9fcd4b0fb9932902eced19cdfaf2c5c88a4100c9
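      
      Since the image is garbage collected precisely because it is not pinned, pinning the local osImage (along the lines of a PinnedImageSet) may be a relevant angle. This is only an untested sketch: the pool association is omitted and the field names come from the tech-preview PinnedImageSet API, so they may differ by release.
      
      apiVersion: machineconfiguration.openshift.io/v1alpha1
      kind: PinnedImageSet
      metadata:
        name: worker-os-image
      spec:
        pinnedImages:
          - name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846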
      
      
      
