[ACM-18936] open-cluster-management-agent-addon pods fail on bad images - Red Hat Issue Tracker

Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: HyperShift
Labels:
None

Blocked:
False
Blocked Reason:
None
Ready:
False
Regression:
None

Description of problem:

When deploying HyperShift with the KubeVirt provider, the HyperShift cluster has 3 failed pods in ImagePullBackOff status.

All 3 pods in the open-cluster-management-agent-addon namespace are failing:
[kni@ocp-edge77 ~]$ oc get pods -n open-cluster-management-agent-addon
NAME                                                  READY   STATUS             RESTARTS         AGE
cluster-proxy-proxy-agent-6c9db9f9c6-lpv8f            0/3     ImagePullBackOff   0                13h
klusterlet-addon-workmgr-57f549df44-2gktc             0/1     ImagePullBackOff   0                13h
managed-serviceaccount-addon-agent-6df66775df-w2vdk   0/1     ImagePullBackOff   0                13h
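
A minimal sketch for narrowing a listing like the one above to the pods stuck on image pulls. It assumes the default `oc get pods` table layout (NAME, READY, STATUS, ...) on stdin; `image_pull_failures` is a hypothetical helper name, not an oc subcommand.

```shell
# Hypothetical helper: print "<pod> <status>" for pods stuck on image pulls.
# Assumes the default `oc get pods` table on stdin, with STATUS as column 3.
image_pull_failures() {
  # NR > 1 skips the header row.
  awk 'NR > 1 && ($3 == "ImagePullBackOff" || $3 == "ErrImagePull") { print $1, $3 }'
}
```

Possible usage: oc get pods -n open-cluster-management-agent-addon | image_pull_failures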

They fail with:

    state:
      waiting:
        message: Back-off pulling image "registry.redhat.io/multicluster-engine/cluster-proxy-addon-rhel9@sha256:d3e58077b20e1fe18920b837189acf8d164a66fb3ed4a2a498cfa4c0f78f333c"
        reason: ImagePullBackOff
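
To collect every image reference named in such back-off messages, the pod status text can be filtered. This is a sketch that assumes the message wording shown above (`Back-off pulling image "<ref>"`); `backoff_images` is a hypothetical helper name.

```shell
# Hypothetical helper: extract image references from
# 'Back-off pulling image "<ref>"' messages read on stdin.
backoff_images() {
  # Print only the quoted reference from each matching line.
  sed -n 's/.*Back-off pulling image "\([^"]*\)".*/\1/p'
}
```

Possible usage: oc get pods -n open-cluster-management-agent-addon -o yaml | backoff_images | sort -u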

Trying to pull any of them fails with "manifest unknown":

[kni@ocp-edge77 ~]$ podman pull registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9@sha256:6dbf797be0bc2da59de123ce98bdd7f66bf0b3c606a1eda97feadca50ef26a21 --authfile ~kni/combined-secret.json
Trying to pull registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9@sha256:6dbf797be0bc2da59de123ce98bdd7f66bf0b3c606a1eda97feadca50ef26a21...
Error: initializing source docker://registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9@sha256:6dbf797be0bc2da59de123ce98bdd7f66bf0b3c606a1eda97feadca50ef26a21: reading manifest sha256:6dbf797be0bc2da59de123ce98bdd7f66bf0b3c606a1eda97feadca50ef26a21 in registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9: manifest unknown: manifest unknown
[kni@ocp-edge77 ~]$
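
To repeat this manual pull check for several image references, a small loop can run the same command against each one. This is only a sketch: `probe_images` is a hypothetical helper, it reads one reference per line on stdin, and the command it runs is whatever is passed as arguments, so nothing here is specific to podman.

```shell
# Hypothetical helper: run the given command once per image reference read
# from stdin (one per line) and report whether the command succeeded.
probe_images() {
  while read -r ref; do
    [ -n "$ref" ] || continue            # skip blank lines
    # </dev/null keeps the command from consuming the list on stdin.
    if "$@" "$ref" </dev/null >/dev/null 2>&1; then
      echo "OK $ref"
    else
      echo "FAILED $ref"
    fi
  done
}
```

Possible usage, assuming a file images.txt (an assumed name) holding one reference per line: probe_images podman pull --authfile ~kni/combined-secret.json < images.txt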

The full list of bad, unreachable images:

"registry.redhat.io/multicluster-engine/multicloud-manager-rhel9@sha256:193dabca919341a8155107cda24761e5d3805a9086578500a3a63ed8d45244b5"
"registry.redhat.io/multicluster-engine/cluster-proxy-rhel9@sha256:3cef69e4b4435981a5ad327e5327f452cf08ba1fd4806db9bb55c8cb4b54d3ba"
"registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9@sha256:6dbf797be0bc2da59de123ce98bdd7f66bf0b3c606a1eda97feadca50ef26a21"

Editing the deployment to use the latest version from the catalog GUI worked fine, and that image also pulls successfully by hand:

[kni@ocp-edge77 ~]$ podman pull registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9@sha256:7cc762157a6a852763f8f1b2f6849e4d28ca9a6ee40b3e9388dfcdfb45ae3fb6 --authfile ~kni/combined-secret.json
Trying to pull registry.redhat.io/multicluster-engine/managed-serviceaccount-rhel9@sha256:7cc762157a6a852763f8f1b2f6849e4d28ca9a6ee40b3e9388dfcdfb45ae3fb6...
Getting image source signatures
Checking if image destination supports signatures
Copying blob 122322a06587 skipped: already exists  
Copying blob e1b29663e83e skipped: already exists  
Copying config 1305506ea7 done  
Writing manifest to image destination
Storing signatures
1305506ea79cda2833f1005c18b934f4dad6e45a3b3aab966e9567ec7d850b85

Version-Release number of selected component (if applicable):

MCE 2.6.7-DOWNANDBACK-2025-03-10-22-49-35

[kni@ocp-edge77 ~]$ oc version
Client Version: 4.17.0
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.16.36
Kubernetes Version: v1.29.11+148a389
[kni@ocp-edge77 ~]$

How reproducible:

    Happens every time

Steps to Reproduce:

    1. Deploy a hub cluster with 3 master nodes and no workers.
    2. On it, deploy a hosted cluster with 2 worker nodes (no masters) using the KubeVirt provider.

Actual results:

    The test fails because of non-ready pods: https://jenkins-csb-kniqe-auto.dno.corp.redhat.com/job/ocp-edge-auto-tests/14296/testReport/deployment.installer/test_basic_sanity/test_pods_status/

Expected results:

    All HyperShift pods should be in a healthy state.

Additional info:

Verifying notes:

I used this job to deploy: https://jenkins-csb-kniqe-auto.dno.corp.redhat.com/job/CI/job/job-runner/4753/

CI breaks here: https://jenkins-csb-kniqe-ci.dno.corp.redhat.com/job/ocp-edge-auto-tests/49522/testReport/deployment.installer/test_basic_sanity/test_pods_status/#:~:text=Failed%20pods%3A%20%7B%27open%2Dcluster,Pending%27%7D%0A%20%20%09The%20pods%20list%3A

Assignee: Unassigned

Reporter: Gal Amado

QA Contact: David Huynh

Votes: 0

Watchers: 3

Created: 2025/03/12 11:31 AM

Updated: 2025/03/27 10:32 AM
