Type: Bug
Resolution: Unresolved
Priority: Normal
Affects Versions: 4.18, 4.19, 4.20
Impact: Quality / Stability / Reliability
Severity: Moderate
Description of problem:
In TechPreview only. When a node uses the rhel-coreos image stored locally ("containers-storage") and this image is garbage collected, the node becomes degraded if we try to apply an extension.
Version-Release number of selected component (if applicable):
$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.20.0-0.nightly-2025-06-16-011145   True        False         160m    Cluster version is 4.20.0-0.nightly-2025-06-16-011145
How reproducible:
Always
Steps to Reproduce:
1. Enable OCL and wait for the node to boot from the layered image:

   $ oc debug node/ip-10-0-21-6.us-east-2.compute.internal -- chroot /host rpm-ostree status
   Starting pod/ip-10-0-21-6us-east-2computeinternal-debug-v5mvt ...
   To use host binaries, run `chroot /host`
   State: idle
   Deployments:
   * ostree-unverified-registry:quay.io/mcoqe/layering@sha256:16a1327d039f5b3f7123d5bc4510588138e545ecede3a83a9bc8d8ff288a6ec4
         Digest: sha256:16a1327d039f5b3f7123d5bc4510588138e545ecede3a83a9bc8d8ff288a6ec4
         Version: 9.6.20250611-0 (2025-06-16T08:42:27Z)

2. Disable OCL.

3. Check that one of the nodes now uses a "containers-storage" image (I'm not sure, but it seems to be the node that was running the os-builder pod). Note the "containers-storage" prefix in the deployment: it means the image is taken from local storage and not pulled from the registry, because it is already present on the node.

   $ oc debug -q node/ip-10-0-21-39.us-east-2.compute.internal -- chroot /host rpm-ostree status
   State: idle
   Deployments:
   * ostree-unverified-image:containers-storage:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
         Digest: sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
         Version: 9.6.20250611-0 (2025-06-13T04:45:42Z)

   The image can also be seen pulled on the node:

   sh-5.1# crictl images quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
   IMAGE                                            TAG      IMAGE ID        SIZE
   quay.io/openshift-release-dev/ocp-v4.0-art-dev   <none>   c6c98cd692bd5   2.53GB

4. Manually remove this image from the node. For the sake of simplicity we remove it manually, but in a real cluster this happens when kubernetes garbage collects images, since this image (rhel-coreos) is never used by a pod.

   sh-5.1# crictl images quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
   IMAGE                                            TAG      IMAGE ID        SIZE
   quay.io/openshift-release-dev/ocp-v4.0-art-dev   <none>   c6c98cd692bd5   2.53GB
   sh-5.1# crictl rmi c6c98cd692bd5
   E0616 10:45:14.401000 8214 log.go:32] "RemoveImage from image service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" image="c6c98cd692bd5"
   error of removing image "c6c98cd692bd5": rpc error: code = DeadlineExceeded desc = context deadline exceeded
   sh-5.1# crictl images quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
   IMAGE   TAG   IMAGE ID   SIZE
   sh-5.1# rpm-ostree status
   State: idle
   Deployments:
   * ostree-unverified-image:containers-storage:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
         Digest: sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
         Version: 9.6.20250611-0 (2025-06-13T04:45:42Z)

5. Create a MC to deploy the usbguard extension (see the sketch after these steps).
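A minimal sketch of the MachineConfig used in step 5, assuming the affected nodes are in the worker pool (the object name below is arbitrary; any MachineConfig requesting an extension for the pool should reproduce the issue):

   apiVersion: machineconfiguration.openshift.io/v1
   kind: MachineConfig
   metadata:
     labels:
       machineconfiguration.openshift.io/role: worker
     # hypothetical name; only the extensions list matters for the reproducer
     name: 99-worker-extensions-usbguard
   spec:
     extensions:
       - usbguard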
Actual results:
The pool is degraded with this message:

- lastTransitionTime: "2025-06-16T10:50:58Z"
  message: 'Node ip-10-0-21-39.us-east-2.compute.internal is reporting: "error running rpm-ostree update --install usbguard: error: Creating importer: failed to invoke method OpenImage: failed to invoke method OpenImage: reference \"[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.skip_mount_home=true]quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846\" does not resolve to an image ID\n: exit status 1"'
  reason: ""
  status: "True"
  type: Degraded
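One way to pull just the Degraded condition message, assuming the affected nodes belong to the worker pool:

   $ oc get mcp worker -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}'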
Expected results:
The pool should not be degraded
Additional info:
Note that even though we remove the image manually here to get a minimal reproducer, in a real cluster this image is eventually removed by the kubernetes garbage collector, since it is never used to run a pod and is not pinned.

This only happens in TechPreview. It seems to be related to MCO supporting the local rhel-coreos image to make disconnected upgrades with pinned images possible: https://github.com/openshift/machine-config-operator/commit/9fcd4b0fb9932902eced19cdfaf2c5c88a4100c9
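As a point of comparison only (not verified against this bug), a sketch of pinning the rhel-coreos image so that kubelet garbage collection leaves it in place, assuming the TechPreview v1alpha1 PinnedImageSet API and a hypothetical object name:

   apiVersion: machineconfiguration.openshift.io/v1alpha1
   kind: PinnedImageSet
   metadata:
     # hypothetical name; the role label is assumed to target the worker pool
     name: worker-pinned-rhel-coreos
     labels:
       machineconfiguration.openshift.io/role: worker
   spec:
     pinnedImages:
       - name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846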