-
Bug
-
Resolution: Done-Errata
-
Normal
-
4.14
-
No
-
Sprint 243
-
1
-
False
-
-
-
Bug Fix
-
Done
-
9/19: telco prioritization pending triage
-
Description of problem:
While installing many SNOs via ZTP using ACM, two SNOs failed to complete install because the image-registry was degraded during the install process.

# cat clusters | xargs -I % sh -c "echo '%'; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get clusterversion"
vm01831
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       False         18h     Error while reconciling 4.14.0-rc.0: the cluster operator image-registry is degraded
vm02740
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version             False       False         18h     Error while reconciling 4.14.0-rc.0: the cluster operator image-registry is degraded

# cat clusters | xargs -I % sh -c "echo '%'; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get co image-registry"
vm01831
NAME             VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
image-registry   4.14.0-rc.0   True        False         True       18h     Degraded: The registry is removed...
vm02740
NAME             VERSION       AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
image-registry   4.14.0-rc.0   True        False         True       18h     Degraded: The registry is removed...

Both showed the image-pruner job pod in an error state:

# cat clusters | xargs -I % sh -c "echo '%'; oc --kubeconfig /root/hv-vm/kc/%/kubeconfig get po -n openshift-image-registry"
vm01831
NAME                                                READY   STATUS    RESTARTS   AGE
cluster-image-registry-operator-5d497944d4-czn64    1/1     Running   0          18h
image-pruner-28242720-w6jmv                         0/1     Error     0          18h
node-ca-vtfj8                                       1/1     Running   0          18h
vm02740
NAME                                                READY   STATUS    RESTARTS      AGE
cluster-image-registry-operator-5d497944d4-lbtqw    1/1     Running   1 (18h ago)   18h
image-pruner-28242720-ltqzk                         0/1     Error     0             18h
node-ca-4fntj                                       1/1     Running   0             18h
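For reference, a minimal triage sketch (assuming cluster-admin access via the same per-cluster kubeconfigs; vm01831 below is just one of the two affected clusters) that ties the degraded operator condition back to the failed pruner job could look like:

# oc --kubeconfig /root/hv-vm/kc/vm01831/kubeconfig get co image-registry -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}'
# oc --kubeconfig /root/hv-vm/kc/vm01831/kubeconfig get cronjob,job -n openshift-image-registry
# oc --kubeconfig /root/hv-vm/kc/vm01831/kubeconfig describe po -n openshift-image-registry image-pruner-28242720-w6jmv

The first command prints only the Degraded condition message from the clusteroperator status, the second lists the image-pruner CronJob and its failed Job, and the third shows the events for the errored pruner pod.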
Version-Release number of selected component (if applicable):
Deployed SNO OCP - 4.14.0-rc.0
Hub - 4.13.11
ACM - 2.9.0-DOWNSTREAM-2023-09-07-04-47-52
How reproducible:
Rare; only 2 of the deployed SNO clusters were found in this state after the test.
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info:
Seems like some permissions might have been lacking:

# oc --kubeconfig /root/hv-vm/kc/vm01831/kubeconfig logs -n openshift-image-registry image-pruner-28242720-w6jmv
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found
attempt #1 has failed (exit code 1), going to make another attempt...
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found
attempt #2 has failed (exit code 1), going to make another attempt...
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found
attempt #3 has failed (exit code 1), going to make another attempt...
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found
attempt #4 has failed (exit code 1), going to make another attempt...
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found
attempt #5 has failed (exit code 1), going to make another attempt...
Error from server (Forbidden): pods is forbidden: User "system:serviceaccount:openshift-image-registry:pruner" cannot list resource "pods" in API group "" at the cluster scope: RBAC: clusterrole.rbac.authorization.k8s.io "system:image-pruner" not found
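The Forbidden errors point at the clusterrole system:image-pruner, which is normally present as a default clusterrole on OpenShift but appears to be missing on these clusters, so the binding used by the pruner serviceaccount resolves to nothing. A quick verification sketch (assuming the same cluster-admin kubeconfig; --as only impersonates for the RBAC check and changes nothing) might be:

# oc --kubeconfig /root/hv-vm/kc/vm01831/kubeconfig get clusterrole system:image-pruner
# oc --kubeconfig /root/hv-vm/kc/vm01831/kubeconfig get clusterrolebinding -o wide | grep -i image-pruner
# oc --kubeconfig /root/hv-vm/kc/vm01831/kubeconfig auth can-i list pods --as=system:serviceaccount:openshift-image-registry:pruner

The first confirms whether the clusterrole referenced in the error exists at all, the second shows which binding is supposed to grant it to the pruner serviceaccount, and the third re-checks the exact permission the pruner pod failed on.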