Bug
Resolution: Can't Do
4.14.z
Description of problem:
While upgrading 3486 SNOs via image-based upgrade (IBU), one cluster failed to start the upgrade because the monitoring cluster operator was degraded. The IBU stayed in the pre-pivot health-check wait, and the monitoring operator reported that the rollout of the prometheus-adapter Deployment had timed out (0/1 ready).

# oc get ibu
NAME      AGE     DESIRED STAGE   STATE        DETAILS
upgrade   5h54m   Upgrade         InProgress   Waiting for system to stabilize before Upgrade (pre-pivot) stage can continue: one or more health checks failed...

# oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.14.31   True        False         False      4h23m
cloud-controller-manager                   4.14.31   True        False         False      2d21h
cloud-credential                           4.14.31   True        False         False      2d21h
config-operator                            4.14.31   True        False         False      2d21h
dns                                        4.14.31   True        False         False      2d21h
etcd                                       4.14.31   True        False         False      2d21h
ingress                                    4.14.31   True        False         False      2d21h
kube-apiserver                             4.14.31   True        False         False      2d21h
kube-controller-manager                    4.14.31   True        False         False      2d21h
kube-scheduler                             4.14.31   True        False         False      2d21h
kube-storage-version-migrator              4.14.31   True        False         False      2d21h
machine-approver                           4.14.31   True        False         False      2d21h
machine-config                             4.14.31   True        False         False      2d21h
marketplace                                4.14.31   True        False         False      2d21h
monitoring                                 4.14.31   False       True          True       4h7m    reconciling PrometheusAdapter Deployment failed: updating Deployment object failed: waiting for DeploymentRollout of openshift-monitoring/prometheus-adapter: context deadline exceeded
network                                    4.14.31   True        False         False      2d21h
node-tuning                                4.14.31   True        False         False      2d21h
openshift-apiserver                        4.14.31   True        False         False      45h
openshift-controller-manager               4.14.31   True        False         False      2d21h
operator-lifecycle-manager                 4.14.31   True        False         False      2d21h
operator-lifecycle-manager-catalog         4.14.31   True        False         False      2d21h
operator-lifecycle-manager-packageserver   4.14.31   True        False         False      2d21h
service-ca                                 4.14.31   True        False         False      2d21h

# oc get deploy,rs,po -n openshift-monitoring
NAME                                                    READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/cluster-monitoring-operator             1/1     1            1           2d21h
deployment.apps/kube-state-metrics                      1/1     1            1           2d21h
deployment.apps/openshift-state-metrics                 1/1     1            1           2d21h
deployment.apps/prometheus-adapter                      0/1     1            0           2d21h
deployment.apps/prometheus-operator                     1/1     1            1           2d21h
deployment.apps/prometheus-operator-admission-webhook   1/1     1            1           2d21h
deployment.apps/thanos-querier                          1/1     1            1           2d21h

NAME                                                               DESIRED   CURRENT   READY   AGE
replicaset.apps/cluster-monitoring-operator-6f68bbf775             1         1         1       2d21h
replicaset.apps/kube-state-metrics-5d5f7556bc                      1         1         1       2d21h
replicaset.apps/openshift-state-metrics-7b754dcf96                 1         1         1       2d21h
replicaset.apps/prometheus-adapter-5b44cc7c5c                      0         0         0       2d2h
replicaset.apps/prometheus-adapter-5cd4796fb5                      0         0         0       2d21h
replicaset.apps/prometheus-adapter-69897cb996                      0         0         0       2d21h
replicaset.apps/prometheus-adapter-77cdcbb8d6                      1         1         0       45h
replicaset.apps/prometheus-adapter-7f845b55b9                      0         0         0       2d2h
replicaset.apps/prometheus-adapter-865b5c7469                      0         0         0       2d2h
replicaset.apps/prometheus-operator-64c4867485                     1         1         1       2d21h
replicaset.apps/prometheus-operator-admission-webhook-7cd56c97cd   1         1         1       2d21h
replicaset.apps/thanos-querier-6dd7744cf4                          1         1         1       2d21h

NAME                                                         READY   STATUS    RESTARTS        AGE
pod/cluster-monitoring-operator-6f68bbf775-b5bbh             1/1     Running   1               2d21h
pod/kube-state-metrics-5d5f7556bc-x2cx4                      3/3     Running   3               2d21h
pod/node-exporter-nkp9z                                      2/2     Running   2               2d21h
pod/openshift-state-metrics-7b754dcf96-lgq95                 3/3     Running   3               2d21h
pod/prometheus-adapter-77cdcbb8d6-ml2r9                      0/1     Running   2 (4h24m ago)   45h
pod/prometheus-k8s-0                                         6/6     Running   6               2d20h
pod/prometheus-operator-64c4867485-kb72d                     2/2     Running   2               2d21h
pod/prometheus-operator-admission-webhook-7cd56c97cd-475xj   1/1     Running   3 (4h23m ago)   2d21h
pod/thanos-querier-6dd7744cf4-r9r74                          6/6     Running   6               2d21h
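When triaging a stuck IBU across many SNOs, it can help to pick out the same symptoms shown above from JSON rather than eyeballing tables. The sketch below is a hypothetical triage helper, not the Lifecycle Agent's actual health-check logic (which this report does not show). It assumes input saved from `oc get co -o json` and `oc get deploy -n openshift-monitoring -o json`; the function names and the trimmed sample data are illustrative, with sample states copied from the output above. It flags any ClusterOperator not reporting Available=True, Progressing=False and Degraded=False, and any Deployment whose ready replicas fall short of `spec.replicas`.

```python
import json

# Hypothetical helper: conditions a fully settled ClusterOperator is expected to report.
EXPECTED_CONDITIONS = {"Available": "True", "Progressing": "False", "Degraded": "False"}


def load_json(path):
    """Load a file saved from `oc get ... -o json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def unhealthy_operators(co_list):
    """Map each ClusterOperator name to the condition types that deviate from
    EXPECTED_CONDITIONS (a missing condition also counts as a deviation)."""
    bad = {}
    for item in co_list.get("items", []):
        statuses = {
            c["type"]: c.get("status")
            for c in item.get("status", {}).get("conditions", [])
        }
        failed = [t for t, want in EXPECTED_CONDITIONS.items() if statuses.get(t) != want]
        if failed:
            bad[item["metadata"]["name"]] = failed
    return bad


def stalled_deployments(deploy_list):
    """Return (name, ready, desired) for Deployments whose ready replica count is
    below spec.replicas. status.readyReplicas is omitted when zero, so default to 0."""
    stalled = []
    for item in deploy_list.get("items", []):
        desired = item.get("spec", {}).get("replicas", 1)
        ready = item.get("status", {}).get("readyReplicas", 0)
        if ready < desired:
            stalled.append((item["metadata"]["name"], ready, desired))
    return stalled


# Hypothetical, trimmed-down samples shaped like the `oc ... -o json` output,
# using the states reported for this cluster.
SAMPLE_CO = {"items": [
    {"metadata": {"name": "dns"}, "status": {"conditions": [
        {"type": "Available", "status": "True"},
        {"type": "Progressing", "status": "False"},
        {"type": "Degraded", "status": "False"}]}},
    {"metadata": {"name": "monitoring"}, "status": {"conditions": [
        {"type": "Available", "status": "False"},
        {"type": "Progressing", "status": "True"},
        {"type": "Degraded", "status": "True"}]}},
]}
SAMPLE_DEPLOY = {"items": [
    {"metadata": {"name": "prometheus-adapter"}, "spec": {"replicas": 1}, "status": {"replicas": 1}},
    {"metadata": {"name": "thanos-querier"}, "spec": {"replicas": 1}, "status": {"readyReplicas": 1}},
]}

if __name__ == "__main__":
    print(unhealthy_operators(SAMPLE_CO))      # {'monitoring': ['Available', 'Progressing', 'Degraded']}
    print(stalled_deployments(SAMPLE_DEPLOY))  # [('prometheus-adapter', 0, 1)]
```

Against a live cluster, save `oc get co -o json > co.json` and pass `load_json("co.json")` to `unhealthy_operators`. Note that requiring Progressing=False is deliberately strict: it also flags operators that are merely mid-rollout, which may or may not match what the real IBU health checks require.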
Version-Release number of selected component (if applicable):
Hub - 4.16.3
Deployed SNOs - 4.14.31
ACM - 2.11.0-DOWNSTREAM-2024-07-10-21-49-48
TALM - 4.16.0
LCA - 4.16.0
How reproducible:
Rare - 1 out of 3486 SNOs produced this issue
Steps to Reproduce:
1.
2.
3.
Actual results:
Expected results:
Additional info: