Details
Bug
Resolution: Not a Bug
Undefined
None
4.10.z
False
Description
Description of problem:
An upgrade of 3423 SNO clusters from 4.10.32 to 4.11.5 was attempted in a large-scale ACM/ZTP environment. Nine of those clusters would not upgrade because their ClusterVersion objects were stuck reporting "Error while reconciling 4.10.32: the cluster operator monitoring has not yet successfully rolled out".
Version-Release number of selected component (if applicable):
SNO: OCP 4.10.32 (the affected clusters), being upgraded to 4.11.5
Hub: OCP 4.11.19
ACM version: 2.7.0-DOWNSTREAM-2023-01-12-20-55-01
How reproducible:
9 of the 84 upgrade failures (~11% of failures) were caused by this issue; 9 of the 3423 clusters that were upgraded were affected.
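As a quick check on the percentages above, a minimal Python sketch of the arithmetic; the counts are the ones in this report, and the variable names are illustrative only:

```python
# Counts taken from this report.
total_clusters = 3423     # clusters the upgrade was attempted on
upgrade_failures = 84     # clusters whose upgrade failed for any reason
monitoring_stuck = 9      # failures caused by the stuck monitoring operator

overall_failure_rate = upgrade_failures / total_clusters
share_of_failures = monitoring_stuck / upgrade_failures
share_of_clusters = monitoring_stuck / total_clusters

print(f"overall failure rate: {overall_failure_rate:.2%}")   # 2.45%
print(f"share of failures:    {share_of_failures:.1%}")      # 10.7%
print(f"share of clusters:    {share_of_clusters:.2%}")      # 0.26%
```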
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info:
# cat platform_nonattempt_monitoring | xargs -I % sh -c "echo -n '% '; oc --kubeconfig=/root/hv-vm/sno/manifests/%/kubeconfig get clusterversion --no-headers"
sno00269 version 4.10.32 True False 3d14h Error while reconciling 4.10.32: the cluster operator monitoring has not yet successfully rolled out
sno00339 version 4.10.32 True False 3d15h Error while reconciling 4.10.32: the cluster operator monitoring has not yet successfully rolled out
sno00585 version 4.10.32 True False 3d13h Error while reconciling 4.10.32: the cluster operator monitoring has not yet successfully rolled out
sno00740 version 4.10.32 True False 3d13h Error while reconciling 4.10.32: the cluster operator monitoring has not yet successfully rolled out
sno01839 version 4.10.32 True False 3d12h Error while reconciling 4.10.32: the cluster operator monitoring has not yet successfully rolled out
sno02881 version 4.10.32 True False 3d9h Error while reconciling 4.10.32: the cluster operator monitoring has not yet successfully rolled out
sno02986 version 4.10.32 True False 3d9h Error while reconciling 4.10.32: the cluster operator monitoring has not yet successfully rolled out
sno03030 version 4.10.32 True False 3d8h Error while reconciling 4.10.32: the cluster operator monitoring has not yet successfully rolled out
sno03053 version 4.10.32 True False 3d8h Error while reconciling 4.10.32: the cluster operator monitoring has not yet successfully rolled out
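When triaging thousands of clusters, the stuck ones can be picked out of the loop's output programmatically. A minimal Python sketch, assuming the output above was saved as plain text with one cluster per line; the function name stuck_clusters is hypothetical and not part of any OpenShift tooling:

```python
import re


def stuck_clusters(output: str, operator: str = "monitoring") -> list:
    """Return the cluster names whose ClusterVersion status says the given
    cluster operator has not yet successfully rolled out.

    Assumes each line has the shape produced by the loop above:
    <cluster> <name> <version> <available> <progressing> <since> <status...>
    """
    pattern = re.compile(
        r"the cluster operator " + re.escape(operator)
        + r" has not yet successfully rolled out"
    )
    stuck = []
    for line in output.splitlines():
        # maxsplit=6 keeps the free-text status column in one piece.
        fields = line.split(maxsplit=6)
        if len(fields) < 7:
            continue  # blank or truncated line
        if pattern.search(fields[6]):
            stuck.append(fields[0])
    return stuck


if __name__ == "__main__":
    sample = (
        "sno00269 version 4.10.32 True False 3d14h Error while reconciling 4.10.32: "
        "the cluster operator monitoring has not yet successfully rolled out\n"
        "sno00339 version 4.10.32 True False 3d15h Error while reconciling 4.10.32: "
        "the cluster operator monitoring has not yet successfully rolled out\n"
    )
    print(stuck_clusters(sample))  # ['sno00269', 'sno00339']
```

Splitting with a fixed maxsplit keeps the parser simple because only the last column contains spaces.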
Describe output for the monitoring cluster operator on sno00269:
# oc --kubeconfig=/root/hv-vm/sno/manifests/sno00269/kubeconfig describe co monitoring
Name: monitoring
Namespace:
Labels: <none>
Annotations: include.release.openshift.io/ibm-cloud-managed: true
             include.release.openshift.io/self-managed-high-availability: true
             include.release.openshift.io/single-node-developer: true
API Version: config.openshift.io/v1
Kind: ClusterOperator
Metadata:
  Creation Timestamp: 2023-01-14T03:38:25Z
  Generation: 1
  Managed Fields:
    API Version: config.openshift.io/v1
    Fields Type: FieldsV1
    fieldsV1:
      f:metadata:
        f:annotations:
          .:
          f:include.release.openshift.io/ibm-cloud-managed:
          f:include.release.openshift.io/self-managed-high-availability:
          f:include.release.openshift.io/single-node-developer:
        f:ownerReferences:
          .:
          k:{"uid":"5a53fb27-4659-406c-b8ea-ce6b4ba103cf"}:
      f:spec:
    Manager: Go-http-client
    Operation: Update
    Time: 2023-01-14T03:38:25Z
    API Version: config.openshift.io/v1
    Fields Type: FieldsV1
    fieldsV1:
      f:status:
        .:
        f:conditions:
        f:extension:
        f:relatedObjects:
        f:versions:
    Manager: Go-http-client
    Operation: Update
    Subresource: status
    Time: 2023-01-14T04:12:26Z
  Owner References:
    API Version: config.openshift.io/v1
    Kind: ClusterVersion
    Name: version
    UID: 5a53fb27-4659-406c-b8ea-ce6b4ba103cf
  Resource Version: 1376758
  UID: 4049a1c8-0c22-4401-8a59-2a6b49ebcc89
Spec:
Status:
  Conditions:
    Last Transition Time: 2023-01-17T19:13:08Z
    Message: Rolling out the stack.
    Reason: RollOutInProgress
    Status: True
    Type: Progressing
    Last Transition Time: 2023-01-14T04:41:39Z
    Message: Failed to rollout the stack. Error: updating prometheus-k8s: waiting for Prometheus object changes failed: waiting for Prometheus openshift-monitoring/k8s: expected 1 replicas, got 0 updated replicas
    Reason: UpdatingPrometheusK8SFailed
    Status: True
    Type: Degraded
    Last Transition Time: 2023-01-14T04:12:26Z
    Status: True
    Type: Upgradeable
    Last Transition Time: 2023-01-14T04:41:39Z
    Message: Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
    Reason: UpdatingPrometheusK8SFailed
    Status: False
    Type: Available
  Extension: <nil>
  Related Objects:
    Group:
    Name: openshift-monitoring
    Resource: namespaces
    Group:
    Name: openshift-user-workload-monitoring
    Resource: namespaces
    Group: monitoring.coreos.com
    Name:
    Resource: servicemonitors
    Group: monitoring.coreos.com
    Name:
    Resource: podmonitors
    Group: monitoring.coreos.com
    Name:
    Resource: prometheusrules
    Group: monitoring.coreos.com
    Name:
    Resource: alertmanagers
    Group: monitoring.coreos.com
    Name:
    Resource: prometheuses
    Group: monitoring.coreos.com
    Name:
    Resource: thanosrulers
    Group: monitoring.coreos.com
    Name:
    Resource: alertmanagerconfigs
  Versions:
    Name: operator
    Version: 4.10.32
Events: <none>
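The describe output above shows why the operator counts as not rolled out: Available is False, and Degraded is True with reason UpdatingPrometheusK8SFailed. A minimal Python sketch for summarizing those conditions from the JSON form of the same object (for example, the output of oc get co monitoring -o json). The helper names are hypothetical, and the "rolled out" rule below is a simplifying assumption for triage, not the cluster-version operator's exact logic:

```python
import json
from typing import Optional


def condition_status(co: dict) -> dict:
    """Map each ClusterOperator condition type to its status string
    ("True", "False" or "Unknown")."""
    conditions = co.get("status", {}).get("conditions", [])
    return {c["type"]: c["status"] for c in conditions}


def is_rolled_out(co: dict) -> bool:
    # Simplifying assumption for triage only: treat the operator as rolled
    # out when it is Available=True and not Degraded=True.
    status = condition_status(co)
    return status.get("Available") == "True" and status.get("Degraded") != "True"


def degraded_reason(co: dict) -> Optional[str]:
    """Return the reason of a Degraded=True condition, or None."""
    for c in co.get("status", {}).get("conditions", []):
        if c["type"] == "Degraded" and c["status"] == "True":
            return c.get("reason")
    return None


if __name__ == "__main__":
    # Conditions copied from the sno00269 describe output above; real input
    # would come from: oc get co monitoring -o json
    sample = json.loads("""
    {"status": {"conditions": [
      {"type": "Progressing", "status": "True", "reason": "RollOutInProgress"},
      {"type": "Degraded", "status": "True", "reason": "UpdatingPrometheusK8SFailed"},
      {"type": "Upgradeable", "status": "True"},
      {"type": "Available", "status": "False", "reason": "UpdatingPrometheusK8SFailed"}
    ]}}
    """)
    print("rolled out:", is_rolled_out(sample))         # rolled out: False
    print("degraded reason:", degraded_reason(sample))  # degraded reason: UpdatingPrometheusK8SFailed
```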
Pods, Deployments, and StatefulSets in the openshift-monitoring namespace on each affected cluster:
# cat platform_nonattempt_monitoring | xargs -I % sh -c "echo '% '; oc --kubeconfig=/root/hv-vm/sno/manifests/%/kubeconfig get po,deploy,sts -n openshift-monitoring"
sno00269
NAME READY STATUS RESTARTS AGE
pod/cluster-monitoring-operator-556f6847dd-lsb57 2/2 Running 0 3d15h
pod/kube-state-metrics-65f656cd75-h2sff 3/3 Running 0 3d15h
pod/node-exporter-dbn7n 2/2 Running 0 3d15h
pod/openshift-state-metrics-7bc54ff57d-mwn9t 3/3 Running 0 3d15h
pod/prometheus-adapter-56885c749b-cv7cn 0/1 Terminating 0 3d15h
pod/prometheus-adapter-5b8f744487-f96tk 1/1 Running 0 2d15h
pod/prometheus-k8s-0 5/6 Running 0 3d14h
pod/prometheus-operator-5bcc58f4c6-p9gjl 2/2 Running 0 3d15h
pod/thanos-querier-654c96c58c-jdp2f 6/6 Running 0 3d14h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cluster-monitoring-operator 1/1 1 1 3d15h
deployment.apps/kube-state-metrics 1/1 1 1 3d15h
deployment.apps/openshift-state-metrics 1/1 1 1 3d15h
deployment.apps/prometheus-adapter 1/1 1 1 3d15h
deployment.apps/prometheus-operator 1/1 1 1 3d15h
deployment.apps/thanos-querier 1/1 1 1 3d15h

NAME READY AGE
statefulset.apps/prometheus-k8s 0/1 3d15h

sno00339
NAME READY STATUS RESTARTS AGE
pod/cluster-monitoring-operator-556f6847dd-96vfp 2/2 Running 0 3d15h
pod/kube-state-metrics-65f656cd75-xtb25 3/3 Running 0 3d15h
pod/node-exporter-6qqsc 2/2 Running 0 3d15h
pod/openshift-state-metrics-7bc54ff57d-vdgn8 3/3 Running 0 3d15h
pod/prometheus-adapter-7fbcfd64cb-k9w9r 1/1 Running 0 2d15h
pod/prometheus-k8s-0 5/6 Running 0 3d14h
pod/prometheus-operator-5bcc58f4c6-m65d4 2/2 Running 0 3d15h
pod/thanos-querier-574d6b9d65-qhq2s 6/6 Running 0 3d14h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cluster-monitoring-operator 1/1 1 1 3d15h
deployment.apps/kube-state-metrics 1/1 1 1 3d15h
deployment.apps/openshift-state-metrics 1/1 1 1 3d15h
deployment.apps/prometheus-adapter 1/1 1 1 3d15h
deployment.apps/prometheus-operator 1/1 1 1 3d15h
deployment.apps/thanos-querier 1/1 1 1 3d15h

NAME READY AGE
statefulset.apps/prometheus-k8s 0/1 3d15h

sno00585
NAME READY STATUS RESTARTS AGE
pod/cluster-monitoring-operator-556f6847dd-4xdkf 2/2 Running 0 3d14h
pod/kube-state-metrics-65f656cd75-cnzg5 3/3 Running 0 3d14h
pod/node-exporter-nkjr4 2/2 Running 0 3d14h
pod/openshift-state-metrics-7bc54ff57d-4c5mb 3/3 Running 0 3d14h
pod/prometheus-adapter-5cbf8c999c-phts8 1/1 Running 0 2d14h
pod/prometheus-k8s-0 5/6 Running 0 3d13h
pod/prometheus-operator-5bcc58f4c6-rsdm8 2/2 Running 0 3d14h
pod/thanos-querier-86bd4c9689-xpwbm 6/6 Running 0 3d13h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cluster-monitoring-operator 1/1 1 1 3d14h
deployment.apps/kube-state-metrics 1/1 1 1 3d14h
deployment.apps/openshift-state-metrics 1/1 1 1 3d14h
deployment.apps/prometheus-adapter 1/1 1 1 3d14h
deployment.apps/prometheus-operator 1/1 1 1 3d14h
deployment.apps/thanos-querier 1/1 1 1 3d14h

NAME READY AGE
statefulset.apps/prometheus-k8s 0/1 3d14h

sno00740
NAME READY STATUS RESTARTS AGE
pod/cluster-monitoring-operator-556f6847dd-d4d2l 2/2 Running 0 3d14h
pod/kube-state-metrics-65f656cd75-n5d4d 3/3 Running 0 3d13h
pod/node-exporter-gld9s 2/2 Running 0 3d13h
pod/openshift-state-metrics-7bc54ff57d-zq84w 3/3 Running 0 3d13h
pod/prometheus-adapter-76685f6975-xrxj4 1/1 Running 0 2d14h
pod/prometheus-k8s-0 5/6 Running 0 3d13h
pod/prometheus-operator-5bcc58f4c6-svqvl 2/2 Running 0 3d13h
pod/thanos-querier-6df9c5d9d8-q9mcx 6/6 Running 0 3d13h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cluster-monitoring-operator 1/1 1 1 3d14h
deployment.apps/kube-state-metrics 1/1 1 1 3d13h
deployment.apps/openshift-state-metrics 1/1 1 1 3d13h
deployment.apps/prometheus-adapter 1/1 1 1 3d13h
deployment.apps/prometheus-operator 1/1 1 1 3d13h
deployment.apps/thanos-querier 1/1 1 1 3d13h

NAME READY AGE
statefulset.apps/prometheus-k8s 0/1 3d13h

sno01839
NAME READY STATUS RESTARTS AGE
pod/cluster-monitoring-operator-556f6847dd-fg4jn 2/2 Running 0 3d12h
pod/kube-state-metrics-65f656cd75-f7ssb 3/3 Running 0 3d12h
pod/node-exporter-rqjxb 2/2 Running 0 3d12h
pod/openshift-state-metrics-7bc54ff57d-rc4l8 3/3 Running 0 3d12h
pod/prometheus-adapter-d69ddd58f-czns5 1/1 Running 0 2d12h
pod/prometheus-k8s-0 5/6 Running 0 3d11h
pod/prometheus-operator-5bcc58f4c6-vrlgn 2/2 Running 0 3d12h
pod/thanos-querier-59d66f4fbc-p2j88 6/6 Running 0 3d11h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cluster-monitoring-operator 1/1 1 1 3d12h
deployment.apps/kube-state-metrics 1/1 1 1 3d12h
deployment.apps/openshift-state-metrics 1/1 1 1 3d12h
deployment.apps/prometheus-adapter 1/1 1 1 3d12h
deployment.apps/prometheus-operator 1/1 1 1 3d12h
deployment.apps/thanos-querier 1/1 1 1 3d12h

NAME READY AGE
statefulset.apps/prometheus-k8s 0/1 3d12h

sno02881
NAME READY STATUS RESTARTS AGE
pod/cluster-monitoring-operator-556f6847dd-ct7wn 2/2 Running 0 3d10h
pod/kube-state-metrics-65f656cd75-v6fwk 3/3 Running 0 3d9h
pod/node-exporter-94dfj 2/2 Running 0 3d9h
pod/openshift-state-metrics-7bc54ff57d-vflml 3/3 Running 0 3d9h
pod/prometheus-adapter-86d8779cf5-2nnjk 1/1 Running 0 2d10h
pod/prometheus-k8s-0 5/6 Running 0 3d9h
pod/prometheus-operator-5bcc58f4c6-hlxvl 2/2 Running 0 3d9h
pod/thanos-querier-5868669ccc-xtgxl 6/6 Running 0 3d9h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cluster-monitoring-operator 1/1 1 1 3d10h
deployment.apps/kube-state-metrics 1/1 1 1 3d9h
deployment.apps/openshift-state-metrics 1/1 1 1 3d9h
deployment.apps/prometheus-adapter 1/1 1 1 3d9h
deployment.apps/prometheus-operator 1/1 1 1 3d9h
deployment.apps/thanos-querier 1/1 1 1 3d9h

NAME READY AGE
statefulset.apps/prometheus-k8s 0/1 3d9h

sno02986
NAME READY STATUS RESTARTS AGE
pod/cluster-monitoring-operator-556f6847dd-w45s9 2/2 Running 0 3d9h
pod/kube-state-metrics-65f656cd75-mc4hf 3/3 Running 0 3d9h
pod/node-exporter-trpm9 2/2 Running 0 3d9h
pod/openshift-state-metrics-7bc54ff57d-mj2dq 3/3 Running 0 3d9h
pod/prometheus-adapter-7d49897cbc-kp5gq 1/1 Running 0 2d10h
pod/prometheus-k8s-0 5/6 Running 0 3d8h
pod/prometheus-operator-5bcc58f4c6-6qh77 2/2 Running 0 3d9h
pod/thanos-querier-975b69457-55rkg 6/6 Running 0 3d8h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cluster-monitoring-operator 1/1 1 1 3d10h
deployment.apps/kube-state-metrics 1/1 1 1 3d9h
deployment.apps/openshift-state-metrics 1/1 1 1 3d9h
deployment.apps/prometheus-adapter 1/1 1 1 3d9h
deployment.apps/prometheus-operator 1/1 1 1 3d9h
deployment.apps/thanos-querier 1/1 1 1 3d9h

NAME READY AGE
statefulset.apps/prometheus-k8s 0/1 3d9h

sno03030
NAME READY STATUS RESTARTS AGE
pod/cluster-monitoring-operator-556f6847dd-bn94z 2/2 Running 0 3d9h
pod/kube-state-metrics-65f656cd75-42lkj 3/3 Running 0 3d8h
pod/node-exporter-48xz2 2/2 Running 0 3d8h
pod/openshift-state-metrics-7bc54ff57d-q28cm 3/3 Running 0 3d8h
pod/prometheus-adapter-5bcbfdc959-wrhgv 1/1 Running 0 2d9h
pod/prometheus-k8s-0 5/6 Running 0 3d7h
pod/prometheus-operator-5bcc58f4c6-ql2hw 2/2 Running 0 3d8h
pod/thanos-querier-69dbf8c49b-mkcf5 6/6 Running 0 3d7h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cluster-monitoring-operator 1/1 1 1 3d9h
deployment.apps/kube-state-metrics 1/1 1 1 3d8h
deployment.apps/openshift-state-metrics 1/1 1 1 3d8h
deployment.apps/prometheus-adapter 1/1 1 1 3d8h
deployment.apps/prometheus-operator 1/1 1 1 3d8h
deployment.apps/thanos-querier 1/1 1 1 3d8h

NAME READY AGE
statefulset.apps/prometheus-k8s 0/1 3d8h

sno03053
NAME READY STATUS RESTARTS AGE
pod/cluster-monitoring-operator-556f6847dd-gbgxv 2/2 Running 0 3d9h
pod/kube-state-metrics-65f656cd75-8k4hc 3/3 Running 0 3d8h
pod/node-exporter-wgntq 2/2 Running 0 3d8h
pod/openshift-state-metrics-7bc54ff57d-mhfrd 3/3 Running 0 3d8h
pod/prometheus-adapter-ffb468559-9kxgh 1/1 Running 0 2d9h
pod/prometheus-k8s-0 5/6 Running 0 3d7h
pod/prometheus-operator-5bcc58f4c6-b265g 2/2 Running 0 3d8h
pod/thanos-querier-84bf9c8645-gv6bz 6/6 Running 0 3d7h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/cluster-monitoring-operator 1/1 1 1 3d9h
deployment.apps/kube-state-metrics 1/1 1 1 3d8h
deployment.apps/openshift-state-metrics 1/1 1 1 3d8h
deployment.apps/prometheus-adapter 1/1 1 1 3d8h
deployment.apps/prometheus-operator 1/1 1 1 3d8h
deployment.apps/thanos-querier 1/1 1 1 3d8h

NAME READY AGE
statefulset.apps/prometheus-k8s 0/1 3d8h
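Across all nine clusters the pattern is the same: prometheus-k8s-0 shows 5/6 containers ready and the prometheus-k8s StatefulSet shows 0/1, which is consistent with the "expected 1 replicas, got 0 updated replicas" error in the Degraded condition. A minimal Python sketch that flags such rows by parsing the READY column of the oc get po,deploy,sts output; the function name not_ready is hypothetical, and the parser assumes the plain-text layout shown above:

```python
def not_ready(output: str) -> list:
    """Return (name, ready) pairs for rows whose READY column "x/y" has x < y.

    Assumes the plain-text layout of `oc get po,deploy,sts` shown above:
    resource rows start with "<kind>/<name>", followed by the READY column.
    """
    flagged = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, NAME header lines and the bare cluster-name lines.
        if len(fields) < 2 or "/" not in fields[0]:
            continue
        name, ready = fields[0], fields[1]
        current, _, desired = ready.partition("/")
        if not (current.isdigit() and desired.isdigit()):
            continue
        if int(current) < int(desired):
            flagged.append((name, ready))
    return flagged


if __name__ == "__main__":
    # Subset of the sno00269 output above.
    sample = """sno00269
NAME READY STATUS RESTARTS AGE
pod/prometheus-adapter-56885c749b-cv7cn 0/1 Terminating 0 3d15h
pod/prometheus-adapter-5b8f744487-f96tk 1/1 Running 0 2d15h
pod/prometheus-k8s-0 5/6 Running 0 3d14h

NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/prometheus-adapter 1/1 1 1 3d15h

NAME READY AGE
statefulset.apps/prometheus-k8s 0/1 3d15h
"""
    for name, ready in not_ready(sample):
        print(name, ready)
    # pod/prometheus-adapter-56885c749b-cv7cn 0/1
    # pod/prometheus-k8s-0 5/6
    # statefulset.apps/prometheus-k8s 0/1
```

Note that the Terminating prometheus-adapter pod on sno00269 is also flagged, since its READY column is 0/1.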