OpenShift Bugs / OCPBUGS-57473

Pool degraded when a node uses a local osImage and this image is garbage collected


      Description of problem:

      
      This happens in techpreview only.
      
      When a node uses the rhel-coreos image stored locally (the "containers-storage" transport) and this image is garbage collected, the node becomes degraded if we try to apply an extension.
      
      

      Version-Release number of selected component (if applicable):

      $ oc get clusterversion
      NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
      version   4.20.0-0.nightly-2025-06-16-011145   True        False         160m    Cluster version is 4.20.0-0.nightly-2025-06-16-011145
      
      

      How reproducible:

      Always
      

      Steps to Reproduce:

      
      1. Enable OCL (on-cluster layering); see the MachineOSConfig sketch after the rpm-ostree output below.
      
      $ oc debug node/ip-10-0-21-6.us-east-2.compute.internal -- chroot /host rpm-ostree status
      Starting pod/ip-10-0-21-6us-east-2computeinternal-debug-v5mvt ...
      To use host binaries, run `chroot /host`
      State: idle
      Deployments:
      * ostree-unverified-registry:quay.io/mcoqe/layering@sha256:16a1327d039f5b3f7123d5bc4510588138e545ecede3a83a9bc8d8ff288a6ec4
                         Digest: sha256:16a1327d039f5b3f7123d5bc4510588138e545ecede3a83a9bc8d8ff288a6ec4
                        Version: 9.6.20250611-0 (2025-06-16T08:42:27Z)
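      
      For reference, "enabling OCL" here means creating a MachineOSConfig for the worker pool. The following is only a minimal sketch: the resource name, push spec and push secret are placeholders, and other fields (containerFile, imageBuilder, baseImagePullSecret, ...) are omitted; the exact field names depend on the MachineOSConfig API version shipped with the release.
      
      $ cat <<'EOF' | oc apply -f -
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineOSConfig
      metadata:
        name: worker                 # arbitrary placeholder name
      spec:
        machineConfigPool:
          name: worker               # pool that receives the layered image
        # placeholder push spec/secret; replace with real values
        renderedImagePushSpec: quay.io/mcoqe/layering:latest
        renderedImagePushSecret:
          name: push-secret
      EOF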
      
      
      2. Disable OCL
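      
      Disabling OCL here just means deleting the MachineOSConfig created in step 1 and waiting for the pool to settle, e.g. (assuming the MachineOSConfig was named "worker"):
      
      $ oc delete machineosconfig worker
      $ oc wait mcp/worker --for=condition=Updated --timeout=30m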
      
      3. Check that one of the nodes is now using a "containers-storage" image. (Not fully confirmed, but it seems to be the node that was running the os-builder pod.)
      
      
      Note the "containers-storage" in the deployment, it means that we are taking the image from the local storage and we are not pulling it from the registry because the image is already present in the node.
      
      $ oc debug -q node/ip-10-0-21-39.us-east-2.compute.internal -- chroot /host rpm-ostree status
      State: idle
      Deployments:
      * ostree-unverified-image:containers-storage:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
                         Digest: sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
                        Version: 9.6.20250611-0 (2025-06-13T04:45:42Z)
      
      Note that the pulled image is visible on the node:
      sh-5.1# crictl images quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
      IMAGE                                            TAG                 IMAGE ID            SIZE
      quay.io/openshift-release-dev/ocp-v4.0-art-dev   <none>              c6c98cd692bd5       2.53GB
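      
      The same local copy can also be inspected directly from the host's container storage, for example with skopeo from the debug shell (an additional check only, not required for the reproducer):
      
      sh-5.1# skopeo inspect containers-storage:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846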
      
      
      4. Manually remove this image from the node.
      
      For the sake of simplicity we remove it manually, but in a real cluster this will happen when Kubernetes garbage collects images, since this image (rhel-coreos) is never used by a pod. (A KubeletConfig sketch to force this through the kubelet image GC is included after the rpm-ostree output below.)
      
      
      sh-5.1# crictl images quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
      IMAGE                                            TAG                 IMAGE ID            SIZE
      quay.io/openshift-release-dev/ocp-v4.0-art-dev   <none>              c6c98cd692bd5       2.53GB
      sh-5.1# crictl rmi c6c98cd692bd5
      E0616 10:45:14.401000    8214 log.go:32] "RemoveImage from image service failed" err="rpc error: code = DeadlineExceeded desc = context deadline exceeded" image="c6c98cd692bd5"
      error of removing image "c6c98cd692bd5": rpc error: code = DeadlineExceeded desc = context deadline exceeded
      sh-5.1# crictl images quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
      IMAGE               TAG                 IMAGE ID            SIZE
      sh-5.1# rpm-ostree status
      State: idle
      Deployments:
      * ostree-unverified-image:containers-storage:quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
                         Digest: sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846
                        Version: 9.6.20250611-0 (2025-06-13T04:45:42Z)
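      
      For completeness: instead of crictl rmi, the removal can also be forced through the kubelet image garbage collector by lowering its thresholds with a KubeletConfig. This is only a sketch with intentionally aggressive example values; the thresholds that actually trigger GC depend on the node's disk usage.
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: KubeletConfig
      metadata:
        name: worker-aggressive-image-gc
      spec:
        machineConfigPoolSelector:
          matchLabels:
            pools.operator.machineconfiguration.openshift.io/worker: ""
        kubeletConfig:
          imageMinimumGCAge: 5m
          imageGCHighThresholdPercent: 5
          imageGCLowThresholdPercent: 2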
      
      
      5. Create a MachineConfig (MC) to deploy the usbguard extension.
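      
      A minimal MachineConfig for this step could look like this (the name is arbitrary; the worker role is assumed to match the affected node's pool):
      
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: worker
        name: 99-worker-extensions-usbguard
      spec:
        extensions:
          - usbguard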
      
      

      Actual results:

      
      The pool is degraded with this message:
      
        - lastTransitionTime: "2025-06-16T10:50:58Z"
          message: 'Node ip-10-0-21-39.us-east-2.compute.internal is reporting: "error running
            rpm-ostree update --install usbguard: error: Creating importer: failed to invoke
            method OpenImage: failed to invoke method OpenImage: reference \"[overlay@/var/lib/containers/storage+/run/containers/storage:overlay.skip_mount_home=true]quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846\"
            does not resolve to an image ID\n: exit status 1"'
          reason: ""
          status: "True"
          type: Degraded
      
      
      
      

      Expected results:

      
      The pool should not be degraded
      
      

      Additional info:

      
      Note that even though we remove the image manually here to get a minimal reproducer, in a real cluster this image will always end up removed by the Kubernetes garbage collector, since it is never used to run a pod and is not pinned.
      
      This only happens in techpreview.
      
      It seems to be related to the MCO supporting the local rhel-coreos image to make disconnected upgrades with pinned images possible:
      https://github.com/openshift/machine-config-operator/commit/9fcd4b0fb9932902eced19cdfaf2c5c88a4100c9
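      
      Since the image is garbage collected precisely because it is not pinned, pinning the local osImage (along the lines of a PinnedImageSet) may be a relevant angle. This is only an untested sketch: the pool association is omitted and the field names come from the tech-preview PinnedImageSet API, so they may differ by release.
      
      apiVersion: machineconfiguration.openshift.io/v1alpha1
      kind: PinnedImageSet
      metadata:
        name: worker-os-image
      spec:
        pinnedImages:
          - name: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:592466f9241980ded5d6436448f71932e1a2c300d6fe7157408b8034ce63a846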
      
      
      
