  1. Red Hat Advanced Cluster Management
  2. ACM-18936

open-cluster-management-agent-addon pods fail on bad images

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • HyperShift

      Description of problem:

      When deploying HyperShift with the KubeVirt provider, the HyperShift cluster has 3 failed pods in ImagePullBackOff status.
      All 3 pods in the open-cluster-management-agent-addon namespace are failing:
      [kni@ocp-edge77 ~]$ oc get pods -n open-cluster-management-agent-addon
      NAME                                                  READY   STATUS             RESTARTS         AGE
      cluster-proxy-proxy-agent-6c9db9f9c6-lpv8f            0/3     ImagePullBackOff   0                13h
      klusterlet-addon-workmgr-57f549df44-2gktc             0/1     ImagePullBackOff   0                13h
      managed-serviceaccount-addon-agent-6df66775df-w2vdk   0/1     ImagePullBackOff   0                13h
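A minimal sketch for picking the not-ready pods out of a listing like the one above. It only parses the text printed by `oc get pods`; the function name `not_ready_pods` and the `SAMPLE` constant are illustrative assumptions, not part of any ACM or OpenShift tooling. Note that a pod in `Completed` state would also be flagged, since its READY count is below its total.

```python
def not_ready_pods(oc_output: str) -> list[tuple[str, str]]:
    """Return (name, status) for pods whose READY count is below the total.

    Hypothetical helper: it parses the plain-text table from `oc get pods`.
    """
    failing = []
    for line in oc_output.strip().splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore blank or malformed rows
        name, ready, status = fields[0], fields[1], fields[2]
        ready_count, _, total = ready.partition("/")
        if ready_count != total:
            failing.append((name, status))
    return failing


# Listing copied from this report.
SAMPLE = """\
NAME                                                  READY   STATUS             RESTARTS         AGE
cluster-proxy-proxy-agent-6c9db9f9c6-lpv8f            0/3     ImagePullBackOff   0                13h
klusterlet-addon-workmgr-57f549df44-2gktc             0/1     ImagePullBackOff   0                13h
managed-serviceaccount-addon-agent-6df66775df-w2vdk   0/1     ImagePullBackOff   0                13h
"""

for pod_name, pod_status in not_ready_pods(SAMPLE):
    print(pod_name, pod_status)
```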
       

      They fail with the following container state:

          state:
            waiting:
              message: Back-off pulling image "registry.redhat.io/multicluster-engine/cluster-proxy-addon-rhel9@sha256:d3e58077b20e1fe18920b837189acf8d164a66fb3ed4a2a498cfa4c0f78f333c"
              reason: ImagePullBackOff
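To collect the affected image references across many pods, the image can be extracted from the waiting message shown above. This is a sketch that assumes the message always has the form `Back-off pulling image "<ref>"`, as in this report; `image_from_backoff` is a hypothetical helper name.

```python
import re

# Assumed message shape, taken from the container state in this report.
BACKOFF_RE = re.compile(r'Back-off pulling image "([^"]+)"')


def image_from_backoff(message: str):
    """Return the image reference quoted in a back-off message, or None."""
    match = BACKOFF_RE.search(message)
    return match.group(1) if match else None


MESSAGE = (
    'Back-off pulling image "registry.redhat.io/multicluster-engine/'
    'cluster-proxy-addon-rhel9@sha256:'
    'd3e58077b20e1fe18920b837189acf8d164a66fb3ed4a2a498cfa4c0f78f333c"'
)

print(image_from_backoff(MESSAGE))
```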

       
      Trying to pull any of these images manually fails with "manifest unknown":

      [kni@ocp-edge77 ~]$ podman pull registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9@sha256:6dbf797be0bc2da59de123ce98bdd7f66bf0b3c606a1eda97feadca50ef26a21 --authfile ~kni/combined-secret.json
      Trying to pull registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9@sha256:6dbf797be0bc2da59de123ce98bdd7f66bf0b3c606a1eda97feadca50ef26a21...
      Error: initializing source docker://registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9@sha256:6dbf797be0bc2da59de123ce98bdd7f66bf0b3c606a1eda97feadca50ef26a21: reading manifest sha256:6dbf797be0bc2da59de123ce98bdd7f66bf0b3c606a1eda97feadca50ef26a21 in registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9: manifest unknown: manifest unknown
      [kni@ocp-edge77 ~]$

      The full list of the bad, unreachable images:

      "registry.redhat.io/multicluster-engine/multicloud-manager-rhel9@sha256:193dabca919341a8155107cda24761e5d3805a9086578500a3a63ed8d45244b5"
      "registry.redhat.io/multicluster-engine/cluster-proxy-rhel9@sha256:3cef69e4b4435981a5ad327e5327f452cf08ba1fd4806db9bb55c8cb4b54d3ba"
      "registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9@sha256:6dbf797be0bc2da59de123ce98bdd7f66bf0b3c606a1eda97feadca50ef26a21"
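When comparing the failing references against what the catalog ships, it can help to split each one into registry, repository, and digest. The following is a sketch only: `split_image_ref` is a hypothetical helper, and it assumes every reference is digest-pinned (`name@sha256:...`), as all three in this report are.

```python
def split_image_ref(ref: str) -> dict:
    """Split a digest-pinned image reference into its parts.

    Hypothetical helper; raises ValueError if the reference has no digest.
    """
    ref = ref.strip().strip('"')
    name, sep, digest = ref.partition("@")
    if not sep or not digest:
        raise ValueError(f"not a digest-pinned reference: {ref!r}")
    registry, _, repository = name.partition("/")
    return {"registry": registry, "repository": repository, "digest": digest}


# The three unreachable images listed in this report.
BAD_IMAGES = [
    "registry.redhat.io/multicluster-engine/multicloud-manager-rhel9@sha256:193dabca919341a8155107cda24761e5d3805a9086578500a3a63ed8d45244b5",
    "registry.redhat.io/multicluster-engine/cluster-proxy-rhel9@sha256:3cef69e4b4435981a5ad327e5327f452cf08ba1fd4806db9bb55c8cb4b54d3ba",
    "registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9@sha256:6dbf797be0bc2da59de123ce98bdd7f66bf0b3c606a1eda97feadca50ef26a21",
]

for image in BAD_IMAGES:
    parts = split_image_ref(image)
    print(parts["repository"], parts["digest"])
```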

      Editing the deployment to use the latest image version from the catalog GUI worked fine, and that image can also be pulled manually:

      [kni@ocp-edge77 ~]$ podman pull registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9@sha256:7cc762157a6a852763f8f1b2f6849e4d28ca9a6ee40b3e9388dfcdfb45ae3fb6 --authfile ~kni/combined-secret.json
      Trying to pull registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9@sha256:7cc762157a6a852763f8f1b2f6849e4d28ca9a6ee40b3e9388dfcdfb45ae3fb6...
      Getting image source signatures
      Checking if image destination supports signatures
      Copying blob 122322a06587 skipped: already exists  
      Copying blob e1b29663e83e skipped: already exists  
      Copying config 1305506ea7 done  
      Writing manifest to image destination
      Storing signatures
      1305506ea79cda2833f1005c18b934f4dad6e45a3b3aab966e9567ec7d850b85

      Version-Release number of selected component (if applicable):

      MCE 2.6.7-DOWNANDBACK-2025-03-10-22-49-35
      [kni@ocp-edge77 ~]$ oc version
      Client Version: 4.17.0
      Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
      Server Version: 4.16.36
      Kubernetes Version: v1.29.11+148a389
      [kni@ocp-edge77 ~]$ 
      

      How reproducible:

          Happens every time

      Steps to Reproduce:

          1. Deploy a hub cluster with 3 master nodes and no workers.
          2. On it, deploy a hosted cluster with 2 worker nodes (no masters) using the KubeVirt provider.
          

      Actual results:

          The test fails because of non-ready pods: https://jenkins-csb-kniqe-auto.dno.corp.redhat.com/job/ocp-edge-auto-tests/14296/testReport/deployment.installer/test_basic_sanity/test_pods_status/

      Expected results:

          All HyperShift pods should be in a healthy state.

      Additional info:

      Verification notes:
      
      I used this job to deploy: https://jenkins-csb-kniqe-auto.dno.corp.redhat.com/job/CI/job/job-runner/4753/
      
      CI breaks here: https://jenkins-csb-kniqe-ci.dno.corp.redhat.com/job/ocp-edge-auto-tests/49522/testReport/deployment.installer/test_basic_sanity/test_pods_status/#:~:text=Failed%20pods%3A%20%7B%27open%2Dcluster,Pending%27%7D%0A%20%20%09The%20pods%20list%3A

       

       

              Unassigned Unassigned
              rhn-support-gamado Gal Amado
              David Huynh David Huynh