Type: Bug
Resolution: Cannot Reproduce
Priority: Major
Fix Version/s: ACM 2.8.0, MCE 2.3.0
Affects Version/s: None
Component/s: HyperShift
Labels:

Blocked:
False
Blocked Reason:
None
Ready:
False
Regression:
No
Severity:
Important

Description of problem:

When deploying HyperShift on KubeVirt, the pods in the open-cluster-management-agent-addon namespace of the hosted cluster are stuck in "ImagePullBackOff" status:

[root@ocp-edge44 ~]# oc get pods -n open-cluster-management-agent-addon
NAME READY STATUS RESTARTS AGE
cluster-proxy-proxy-agent-84f747dcd4-pg6jh 0/2 ImagePullBackOff 0 34h
cluster-proxy-service-proxy-567f74db7d-2b456 0/1 ImagePullBackOff 0 34h
klusterlet-addon-workmgr-5775649c9c-jvgm9 0/1 ImagePullBackOff 0 34h

According to the events in the description of klusterlet-addon-workmgr-5775649c9c-jvgm9, the image pull fails because the registry reports the requested manifest digest as not found or unsupported: V2 schema 1 manifest digests are no longer supported for image pulls, and the equivalent schema 2 manifest digest should be used instead.

For more information, see https://access.redhat.com/articles/6138332
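
To tell which manifest schema a digest refers to, the raw manifest can be inspected for its schemaVersion field (1 for V2 schema 1, 2 for V2 schema 2). The helper below is a minimal sketch, not part of the product; the function name manifest_schema is made up for illustration, and it only looks at the first schemaVersion value it finds on stdin.

```shell
# Hypothetical helper: classify an image manifest (JSON on stdin) by schemaVersion.
manifest_schema() {
  # Pull out the first numeric schemaVersion value; no jq required.
  version=$(sed -n 's/.*"schemaVersion"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1)
  case "$version" in
    1) echo "schema1" ;;
    2) echo "schema2" ;;
    *) echo "unknown" ;;
  esac
}
```

Assuming skopeo is installed and logged in to registry.redhat.io, the manifest could be fed in with `skopeo inspect --raw docker://<image>@<digest> | manifest_schema`. If the registry no longer serves the digest at all, that command fails in the same way the kubelet pull does.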

Version-Release number of selected component (if applicable):

[kni@ocp-edge44 ~]$ oc version
Client Version: 4.12.0-0.nightly-2023-04-21-075837
Kustomize Version: v4.5.7
Server Version: 4.12.14
Kubernetes Version: v1.25.8+27e744f

How reproducible:

Happens all the time.

Steps to Reproduce:

I deployed a hub cluster with 3 masters and 3 workers, with a 150G disk each, and on top of it a hosted cluster with 2 workers, each with 10G memory and a 64G disk.

Actual results:

[kni@ocp-edge44 ~]$ export KUBECONFIG=clusterconfigs/hyper-1/auth/kubeconfig

[root@ocp-edge44 ~]# oc get pods -n open-cluster-management-agent-addon
NAME READY STATUS RESTARTS AGE
cluster-proxy-proxy-agent-84f747dcd4-pg6jh 0/2 ImagePullBackOff 0 34h
cluster-proxy-service-proxy-567f74db7d-2b456 0/1 ImagePullBackOff 0 34h
klusterlet-addon-workmgr-5775649c9c-jvgm9 0/1 ImagePullBackOff 0 34h

[kni@ocp-edge44 ~]$ oc describe pod -n open-cluster-management-agent-addon
The Events section of one of the three pods contains:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 46m default-scheduler Successfully assigned open-cluster-management-agent-addon/klusterlet-addon-workmgr-5775649c9c-hjs4j to hyper-1-5jk2g
Normal AddedInterface 46m multus Add eth0 [10.133.0.19/23] from ovn-kubernetes
Normal Pulling 44m (x4 over 46m) kubelet Pulling image "registry.redhat.io/multicluster-engine/multicloud-manager-rhel8@sha256:1feec5641907779870542ef40788a225367bae6683d7a4072d0d667d2a61673a"
Warning Failed 44m (x4 over 46m) kubelet Failed to pull image "registry.redhat.io/multicluster-engine/multicloud-manager-rhel8@sha256:1feec5641907779870542ef40788a225367bae6683d7a4072d0d667d2a61673a": rpc error: code = Unknown desc = reading manifest sha256:1feec5641907779870542ef40788a225367bae6683d7a4072d0d667d2a61673a in registry.redhat.io/multicluster-engine/multicloud-manager-rhel8: unsupported: Not Found, or unsupported. V2 schema 1 manifest digest are no longer supported for image pulls. Use the equivalent schema 2 manifest digest instead. For more information see https://access.redhat.com/articles/6138332
Warning Failed 44m (x4 over 46m) kubelet Error: ErrImagePull
Warning Failed 44m (x6 over 46m) kubelet Error: ImagePullBackOff
Normal BackOff 61s (x196 over 46m) kubelet Back-off pulling image "registry.redhat.io/multicluster-engine/multicloud-manager-rhel8@sha256:1feec5641907779870542ef40788a225367bae6683d7a4072d0d667d2a61673a"
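
To collect the affected image references across all three pods, the "Failed to pull image" events can be filtered out of the describe output. This is a small sketch under the assumption that the event message format matches the one above; failed_images is a hypothetical name.

```shell
# Hypothetical helper: print each distinct image that the kubelet failed to pull.
# Reads `oc describe pod` output on stdin.
failed_images() {
  # Keep only the quoted image reference that follows 'Failed to pull image'.
  sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}
```

Usage: `oc describe pod -n open-cluster-management-agent-addon | failed_images`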

Expected results:

[kni@ocp-edge44 ~]$ export KUBECONFIG=clusterconfigs/hyper-1/auth/kubeconfig

[root@ocp-edge44 ~]# oc get pods -n open-cluster-management-agent-addon

All three pods listed above are in "Running" status.
[kni@ocp-edge44 ~]$ oc describe pod -n open-cluster-management-agent-addon
The Events section of each of the three pods does not contain the above error.
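
The expected state could be checked with a sketch like the one below. It assumes the default table layout of `oc get pods` (STATUS in the third column), and the function name not_running_pods is hypothetical. It is not the test_pods_status implementation.

```shell
# Hypothetical check: list pods whose STATUS is not "Running".
# Reads `oc get pods` output (with header row) on stdin; returns 1 if any are found.
not_running_pods() {
  awk 'NR > 1 && $3 != "Running" { print $1 " " $3; bad = 1 } END { exit bad + 0 }'
}
```

Usage: `oc get pods -n open-cluster-management-agent-addon | not_running_pods`. Any pod in another status, including Completed, would be reported by this simple filter.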

Additional info:

For reference, on the hub cluster all 3 pods are in "Running" status.
(For the deployment, I used this [job|https://auto-jenkins-csb-kniqe.apps.ocp-c1.prod.psi.redhat.com/job/CI/job/vm-disconnected-ipv4_ctlplane-ipv6_provisioning-snr_nhc/168/parameters/].)
Covered by test_pods_status.
