Bug
Resolution: Done
Critical
4.11.z
None
3
OCP VE Sprint 228, OCP VE Sprint 229, OCP VE Sprint 230
3
Rejected
False
Description of problem:
When installing 1000+ SNOs via ACM/MCE using ZTP with GitOps, a small percentage of clusters never complete installation because the monitoring ClusterOperator does not reconcile to Available.
# oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version False True 16h Unable to apply 4.11.0: the cluster operator monitoring has not yet successfully rolled out
# oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig get co monitoring
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE MESSAGE
monitoring False True True 15h Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error.
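At this scale it helps to enumerate which SNOs are stuck on the monitoring operator. A minimal sketch, assuming the per-cluster kubeconfig layout used in the commands above (/root/hv-vm/sno/manifests/<cluster>/kubeconfig):

# List SNOs whose monitoring ClusterOperator is not yet Available (sketch, assumes the kubeconfig path layout above)
for kc in /root/hv-vm/sno/manifests/sno*/kubeconfig; do
  cluster=$(basename "$(dirname "$kc")")
  avail=$(oc --kubeconfig="$kc" get co monitoring \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null)
  [ "$avail" != "True" ] && echo "$cluster: monitoring Available=$avail"
done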
Version-Release number of selected component (if applicable):
- Hub OCP and SNO OCP - 4.11.0
- ACM - 2.6.0-DOWNSTREAM-2022-08-11-23-41-09 (FC5)
How reproducible:
- 2 out of 1728 installs hit this issue (out of 23 total install failures)
- ~8% of the failures were caused by this issue
- ~0.1% failure rate across the total installs
Additional info:
# oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig get po -n openshift-monitoring NAME READY STATUS RESTARTS AGE alertmanager-main-0 0/6 ContainerCreating 0 15h cluster-monitoring-operator-54dd78cc74-l5w24 2/2 Running 0 15h kube-state-metrics-b6455c4dc-8hcfn 3/3 Running 0 15h node-exporter-k7899 2/2 Running 0 15h openshift-state-metrics-7984888fbd-cl67v 3/3 Running 0 15h prometheus-adapter-785bf4f975-wgmnh 1/1 Running 0 15h prometheus-k8s-0 0/6 Init:0/1 0 15h prometheus-operator-74d8754ff7-9zrgw 2/2 Running 0 15h prometheus-operator-admission-webhook-6665fb687d-c5jgv 1/1 Running 0 15h thanos-querier-575496c665-jcc8l 6/6 Running 0 15h # oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig describe po -n openshift-monitoring alertmanager-main-0 Name: alertmanager-main-0 Namespace: openshift-monitoring Priority: 2000000000 Priority Class Name: system-cluster-critical Node: sno01219/fc00:1001::8aa Start Time: Mon, 15 Aug 2022 23:53:39 +0000 Labels: alertmanager=main app.kubernetes.io/component=alert-router app.kubernetes.io/instance=main app.kubernetes.io/managed-by=prometheus-operator app.kubernetes.io/name=alertmanager app.kubernetes.io/part-of=openshift-monitoring app.kubernetes.io/version=0.24.0 controller-revision-hash=alertmanager-main-fcf8dd5fb statefulset.kubernetes.io/pod-name=alertmanager-main-0 Annotations: kubectl.kubernetes.io/default-container: alertmanager openshift.io/scc: nonroot Status: Pending IP: IPs: <none> Controlled By: StatefulSet/alertmanager-main Containers: alertmanager: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:91308d35c1e56463f55c1aaa519ff4de7335d43b254c21abdb845fc8c72821a1 Image ID: Ports: 9094/TCP, 9094/UDP Host Ports: 0/TCP, 0/UDP Args: --config.file=/etc/alertmanager/config/alertmanager.yaml --storage.path=/alertmanager --data.retention=120h --cluster.listen-address= --web.listen-address=127.0.0.1:9093 --web.external-url=https:/console-openshift-console.apps.sno01219.rdu2.scalelab.redhat.com/monitoring --web.route-prefix=/ --cluster.peer=alertmanager-main-0.alertmanager-operated:9094 --cluster.reconnect-timeout=5m State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Requests: cpu: 4m memory: 40Mi Environment: POD_IP: (v1:status.podIP) Mounts: /alertmanager from alertmanager-main-db (rw) /etc/alertmanager/certs from tls-assets (ro) /etc/alertmanager/config from config-volume (rw) /etc/alertmanager/secrets/alertmanager-kube-rbac-proxy from secret-alertmanager-kube-rbac-proxy (ro) /etc/alertmanager/secrets/alertmanager-kube-rbac-proxy-metric from secret-alertmanager-kube-rbac-proxy-metric (ro) /etc/alertmanager/secrets/alertmanager-main-proxy from secret-alertmanager-main-proxy (ro) /etc/alertmanager/secrets/alertmanager-main-tls from secret-alertmanager-main-tls (ro) /etc/pki/ca-trust/extracted/pem/ from alertmanager-trusted-ca-bundle (ro) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hl77l (ro) config-reloader: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:209e20410ec2d3d7a502f568d2b7fe1cd1beadcb36fff2d1e6f59d77be3200e3 Image ID: Port: <none> Host Port: <none> Command: /bin/prometheus-config-reloader Args: --listen-address=localhost:8080 --reload-url=http://localhost:9093/-/reload --watched-dir=/etc/alertmanager/config --watched-dir=/etc/alertmanager/secrets/alertmanager-main-tls --watched-dir=/etc/alertmanager/secrets/alertmanager-main-proxy --watched-dir=/etc/alertmanager/secrets/alertmanager-kube-rbac-proxy 
--watched-dir=/etc/alertmanager/secrets/alertmanager-kube-rbac-proxy-metric State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Requests: cpu: 1m memory: 10Mi Environment: POD_NAME: alertmanager-main-0 (v1:metadata.name) SHARD: -1 Mounts: /etc/alertmanager/config from config-volume (ro) /etc/alertmanager/secrets/alertmanager-kube-rbac-proxy from secret-alertmanager-kube-rbac-proxy (ro) /etc/alertmanager/secrets/alertmanager-kube-rbac-proxy-metric from secret-alertmanager-kube-rbac-proxy-metric (ro) /etc/alertmanager/secrets/alertmanager-main-proxy from secret-alertmanager-main-proxy (ro) /etc/alertmanager/secrets/alertmanager-main-tls from secret-alertmanager-main-tls (ro) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hl77l (ro) alertmanager-proxy: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:140f8947593d92e1517e50a201e83bdef8eb965b552a21d3caf346a250d0cf6e Image ID: Port: 9095/TCP Host Port: 0/TCP Args: -provider=openshift -https-address=:9095 -http-address= -email-domain=* -upstream=http://localhost:9093 -openshift-sar=[{"resource": "namespaces", "verb": "get"}, {"resource": "alertmanagers", "resourceAPIGroup": "monitoring.coreos.com", "namespace": "openshift-monitoring", "verb": "patch", "resourceName": "non-existant"}] -openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}, "/": {"resource":"alertmanagers", "group": "monitoring.coreos.com", "namespace": "openshift-monitoring", "verb": "patch", "name": "non-existant"}} -tls-cert=/etc/tls/private/tls.crt -tls-key=/etc/tls/private/tls.key -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token -cookie-secret-file=/etc/proxy/secrets/session_secret -openshift-service-account=alertmanager-main -openshift-ca=/etc/pki/tls/cert.pem -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Requests: cpu: 1m memory: 20Mi Environment: HTTP_PROXY: HTTPS_PROXY: NO_PROXY: Mounts: /etc/pki/ca-trust/extracted/pem/ from alertmanager-trusted-ca-bundle (ro) /etc/proxy/secrets from secret-alertmanager-main-proxy (rw) /etc/tls/private from secret-alertmanager-main-tls (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hl77l (ro) kube-rbac-proxy: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b5e1c69d005727e3245604cfca7a63e4f9bc6e15128c7489e41d5e967305089e Image ID: Port: 9092/TCP Host Port: 0/TCP Args: --secure-listen-address=0.0.0.0:9092 --upstream=http://127.0.0.1:9096 --config-file=/etc/kube-rbac-proxy/config.yaml --tls-cert-file=/etc/tls/private/tls.crt --tls-private-key-file=/etc/tls/private/tls.key --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 --logtostderr=true --tls-min-version=VersionTLS12 State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Requests: cpu: 1m memory: 15Mi Environment: <none> Mounts: /etc/kube-rbac-proxy from secret-alertmanager-kube-rbac-proxy (rw) /etc/tls/private from secret-alertmanager-main-tls (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hl77l (ro) kube-rbac-proxy-metric: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b5e1c69d005727e3245604cfca7a63e4f9bc6e15128c7489e41d5e967305089e Image ID: Port: 9097/TCP Host 
Port: 0/TCP Args: --secure-listen-address=0.0.0.0:9097 --upstream=http://127.0.0.1:9093 --config-file=/etc/kube-rbac-proxy/config.yaml --tls-cert-file=/etc/tls/private/tls.crt --tls-private-key-file=/etc/tls/private/tls.key --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 --client-ca-file=/etc/tls/client/client-ca.crt --logtostderr=true --allow-paths=/metrics --tls-min-version=VersionTLS12 State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Requests: cpu: 1m memory: 15Mi Environment: <none> Mounts: /etc/kube-rbac-proxy from secret-alertmanager-kube-rbac-proxy-metric (ro) /etc/tls/client from metrics-client-ca (ro) /etc/tls/private from secret-alertmanager-main-tls (ro) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hl77l (ro) prom-label-proxy: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2550b2cbdf864515b1edacf43c25eb6b6f179713c1df34e51f6e9bba48d6430a Image ID: Port: <none> Host Port: <none> Args: --insecure-listen-address=127.0.0.1:9096 --upstream=http://127.0.0.1:9093 --label=namespace --error-on-replace State: Waiting Reason: ContainerCreating Ready: False Restart Count: 0 Requests: cpu: 1m memory: 20Mi Environment: <none> Mounts: /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hl77l (ro) Conditions: Type Status Initialized True Ready False ContainersReady False PodScheduled True Volumes: config-volume: Type: Secret (a volume populated by a Secret) SecretName: alertmanager-main-generated Optional: false tls-assets: Type: Projected (a volume that contains injected data from multiple sources) SecretName: alertmanager-main-tls-assets-0 SecretOptionalName: <nil> secret-alertmanager-main-tls: Type: Secret (a volume populated by a Secret) SecretName: alertmanager-main-tls Optional: false secret-alertmanager-main-proxy: Type: Secret (a volume populated by a Secret) SecretName: alertmanager-main-proxy Optional: false secret-alertmanager-kube-rbac-proxy: Type: Secret (a volume populated by a Secret) SecretName: alertmanager-kube-rbac-proxy Optional: false secret-alertmanager-kube-rbac-proxy-metric: Type: Secret (a volume populated by a Secret) SecretName: alertmanager-kube-rbac-proxy-metric Optional: false alertmanager-main-db: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> metrics-client-ca: Type: ConfigMap (a volume populated by a ConfigMap) Name: metrics-client-ca Optional: false alertmanager-trusted-ca-bundle: Type: ConfigMap (a volume populated by a ConfigMap) Name: alertmanager-trusted-ca-bundle-2rsonso43rc5p Optional: true kube-api-access-hl77l: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true ConfigMapName: openshift-service-ca.crt ConfigMapOptional: <nil> QoS Class: Burstable Node-Selectors: kubernetes.io/os=linux Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedCreatePodSandBox 2m25s (x409 over 15h) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown 
desc = failed to create pod network sandbox k8s_alertmanager-main-0_openshift-monitoring_1c367a83-24e3-4249-861a-a107a6beaee2_0(dff5f302f774d060728261b3c86841ebdbd7ba11537ec9f4d90d57be17bdf44b): error adding pod openshift-monitoring_alertmanager-main-0 to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-monitoring/alertmanager-main-0/1c367a83-24e3-4249-861a-a107a6beaee2:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-monitoring/alertmanager-main-0 dff5f302f774d060728261b3c86841ebdbd7ba11537ec9f4d90d57be17bdf44b] [openshift-monitoring/alertmanager-main-0 dff5f302f774d060728261b3c86841ebdbd7ba11537ec9f4d90d57be17bdf44b] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig describe po -n openshift-monitoring prometheus-k8s-0 Name: prometheus-k8s-0 Namespace: openshift-monitoring Priority: 2000000000 Priority Class Name: system-cluster-critical Node: sno01219/fc00:1001::8aa Start Time: Mon, 15 Aug 2022 23:53:39 +0000 Labels: app.kubernetes.io/component=prometheus app.kubernetes.io/instance=k8s app.kubernetes.io/managed-by=prometheus-operator app.kubernetes.io/name=prometheus app.kubernetes.io/part-of=openshift-monitoring app.kubernetes.io/version=2.36.2 controller-revision-hash=prometheus-k8s-546b544f8b operator.prometheus.io/name=k8s operator.prometheus.io/shard=0 prometheus=k8s statefulset.kubernetes.io/pod-name=prometheus-k8s-0 Annotations: kubectl.kubernetes.io/default-container: prometheus openshift.io/scc: nonroot Status: Pending IP: IPs: <none> Controlled By: StatefulSet/prometheus-k8s Init Containers: init-config-reloader: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:209e20410ec2d3d7a502f568d2b7fe1cd1beadcb36fff2d1e6f59d77be3200e3 Image ID: Port: 8080/TCP Host Port: 0/TCP Command: /bin/prometheus-config-reloader Args: --watch-interval=0 --listen-address=:8080 --config-file=/etc/prometheus/config/prometheus.yaml.gz --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml --watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0 State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 1m memory: 10Mi Environment: POD_NAME: prometheus-k8s-0 (v1:metadata.name) SHARD: 0 Mounts: /etc/prometheus/config from config (rw) /etc/prometheus/config_out from config-out (rw) /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro) Containers: prometheus: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c7df53b796e81ba8301ba74d02317226329bd5752fd31c1b44d028e4832f21c3 Image ID: Port: <none> Host Port: <none> Args: --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries --storage.tsdb.retention.time=15d --config.file=/etc/prometheus/config_out/prometheus.env.yaml --storage.tsdb.path=/prometheus --web.enable-lifecycle --web.external-url=https:/console-openshift-console.apps.sno01219.rdu2.scalelab.redhat.com/monitoring --web.route-prefix=/ --web.listen-address=127.0.0.1:9090 --web.config.file=/etc/prometheus/web_config/web-config.yaml State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 70m memory: 1Gi Liveness: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail 
http://localhost:9090/-/healthy; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/healthy; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=6 Readiness: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=3 Startup: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=15s #success=1 #failure=60 Environment: <none> Mounts: /etc/pki/ca-trust/extracted/pem/ from prometheus-trusted-ca-bundle (ro) /etc/prometheus/certs from tls-assets (ro) /etc/prometheus/config_out from config-out (ro) /etc/prometheus/configmaps/kubelet-serving-ca-bundle from configmap-kubelet-serving-ca-bundle (ro) /etc/prometheus/configmaps/metrics-client-ca from configmap-metrics-client-ca (ro) /etc/prometheus/configmaps/serving-certs-ca-bundle from configmap-serving-certs-ca-bundle (ro) /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw) /etc/prometheus/secrets/kube-etcd-client-certs from secret-kube-etcd-client-certs (ro) /etc/prometheus/secrets/kube-rbac-proxy from secret-kube-rbac-proxy (ro) /etc/prometheus/secrets/metrics-client-certs from secret-metrics-client-certs (ro) /etc/prometheus/secrets/prometheus-k8s-proxy from secret-prometheus-k8s-proxy (ro) /etc/prometheus/secrets/prometheus-k8s-thanos-sidecar-tls from secret-prometheus-k8s-thanos-sidecar-tls (ro) /etc/prometheus/secrets/prometheus-k8s-tls from secret-prometheus-k8s-tls (ro) /etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml") /prometheus from prometheus-k8s-db (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro) config-reloader: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:209e20410ec2d3d7a502f568d2b7fe1cd1beadcb36fff2d1e6f59d77be3200e3 Image ID: Port: <none> Host Port: <none> Command: /bin/prometheus-config-reloader Args: --listen-address=localhost:8080 --reload-url=http://localhost:9090/-/reload --config-file=/etc/prometheus/config/prometheus.yaml.gz --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml --watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0 State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 1m memory: 10Mi Environment: POD_NAME: prometheus-k8s-0 (v1:metadata.name) SHARD: 0 Mounts: /etc/prometheus/config from config (rw) /etc/prometheus/config_out from config-out (rw) /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro) thanos-sidecar: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:36fc214537c763b3a3f0a9dc7a1bd4378a80428c31b2629df8786a9b09155e6d Image ID: Ports: 10902/TCP, 10901/TCP Host Ports: 0/TCP, 0/TCP Args: sidecar --prometheus.url=http://localhost:9090/ --tsdb.path=/prometheus --http-address=127.0.0.1:10902 --grpc-server-tls-cert=/etc/tls/grpc/server.crt --grpc-server-tls-key=/etc/tls/grpc/server.key --grpc-server-tls-client-ca=/etc/tls/grpc/ca.crt State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 1m memory: 25Mi Environment: <none> 
Mounts: /etc/tls/grpc from secret-grpc-tls (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro) prometheus-proxy: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:140f8947593d92e1517e50a201e83bdef8eb965b552a21d3caf346a250d0cf6e Image ID: Port: 9091/TCP Host Port: 0/TCP Args: -provider=openshift -https-address=:9091 -http-address= -email-domain=* -upstream=http://localhost:9090 -openshift-service-account=prometheus-k8s -openshift-sar={"resource": "namespaces", "verb": "get"} -openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}} -tls-cert=/etc/tls/private/tls.crt -tls-key=/etc/tls/private/tls.key -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token -cookie-secret-file=/etc/proxy/secrets/session_secret -openshift-ca=/etc/pki/tls/cert.pem -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 1m memory: 20Mi Environment: HTTP_PROXY: HTTPS_PROXY: NO_PROXY: Mounts: /etc/pki/ca-trust/extracted/pem/ from prometheus-trusted-ca-bundle (ro) /etc/proxy/secrets from secret-prometheus-k8s-proxy (rw) /etc/tls/private from secret-prometheus-k8s-tls (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro) kube-rbac-proxy: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b5e1c69d005727e3245604cfca7a63e4f9bc6e15128c7489e41d5e967305089e Image ID: Port: 9092/TCP Host Port: 0/TCP Args: --secure-listen-address=0.0.0.0:9092 --upstream=http://127.0.0.1:9090 --allow-paths=/metrics --config-file=/etc/kube-rbac-proxy/config.yaml --tls-cert-file=/etc/tls/private/tls.crt --tls-private-key-file=/etc/tls/private/tls.key --client-ca-file=/etc/tls/client/client-ca.crt --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 --logtostderr=true --tls-min-version=VersionTLS12 State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 1m memory: 15Mi Environment: <none> Mounts: /etc/kube-rbac-proxy from secret-kube-rbac-proxy (rw) /etc/tls/client from configmap-metrics-client-ca (ro) /etc/tls/private from secret-prometheus-k8s-tls (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro) kube-rbac-proxy-thanos: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b5e1c69d005727e3245604cfca7a63e4f9bc6e15128c7489e41d5e967305089e Image ID: Port: 10902/TCP Host Port: 0/TCP Args: --secure-listen-address=[$(POD_IP)]:10902 --upstream=http://127.0.0.1:10902 --tls-cert-file=/etc/tls/private/tls.crt --tls-private-key-file=/etc/tls/private/tls.key --client-ca-file=/etc/tls/client/client-ca.crt --config-file=/etc/kube-rbac-proxy/config.yaml --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 --allow-paths=/metrics --logtostderr=true --tls-min-version=VersionTLS12 --client-ca-file=/etc/tls/client/client-ca.crt State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 1m memory: 10Mi Environment: POD_IP: (v1:status.podIP) Mounts: /etc/kube-rbac-proxy from secret-kube-rbac-proxy (rw) 
/etc/tls/client from metrics-client-ca (ro) /etc/tls/private from secret-prometheus-k8s-thanos-sidecar-tls (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro) Conditions: Type Status Initialized False Ready False ContainersReady False PodScheduled True Volumes: config: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s Optional: false tls-assets: Type: Projected (a volume that contains injected data from multiple sources) SecretName: prometheus-k8s-tls-assets-0 SecretOptionalName: <nil> config-out: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> prometheus-k8s-rulefiles-0: Type: ConfigMap (a volume populated by a ConfigMap) Name: prometheus-k8s-rulefiles-0 Optional: false web-config: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s-web-config Optional: false secret-kube-etcd-client-certs: Type: Secret (a volume populated by a Secret) SecretName: kube-etcd-client-certs Optional: false secret-prometheus-k8s-tls: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s-tls Optional: false secret-prometheus-k8s-proxy: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s-proxy Optional: false secret-prometheus-k8s-thanos-sidecar-tls: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s-thanos-sidecar-tls Optional: false secret-kube-rbac-proxy: Type: Secret (a volume populated by a Secret) SecretName: kube-rbac-proxy Optional: false secret-metrics-client-certs: Type: Secret (a volume populated by a Secret) SecretName: metrics-client-certs Optional: false configmap-serving-certs-ca-bundle: Type: ConfigMap (a volume populated by a ConfigMap) Name: serving-certs-ca-bundle Optional: false configmap-kubelet-serving-ca-bundle: Type: ConfigMap (a volume populated by a ConfigMap) Name: kubelet-serving-ca-bundle Optional: false configmap-metrics-client-ca: Type: ConfigMap (a volume populated by a ConfigMap) Name: metrics-client-ca Optional: false prometheus-k8s-db: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: SizeLimit: <unset> metrics-client-ca: Type: ConfigMap (a volume populated by a ConfigMap) Name: metrics-client-ca Optional: false secret-grpc-tls: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s-grpc-tls-crdkohb1gb92n Optional: false prometheus-trusted-ca-bundle: Type: ConfigMap (a volume populated by a ConfigMap) Name: prometheus-trusted-ca-bundle-2rsonso43rc5p Optional: true kube-api-access-85zlc: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true ConfigMapName: openshift-service-ca.crt ConfigMapOptional: <nil> QoS Class: Burstable Node-Selectors: kubernetes.io/os=linux Tolerations: node.kubernetes.io/memory-pressure:NoSchedule op=Exists node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedCreatePodSandBox 4m19s (x409 over 15h) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-k8s-0_openshift-monitoring_debda4d2-6914-4b36-92e0-78f68d539ab3_0(86af91d4e64ab0fbad95352b029762e9856ff24005445b458bccb22e0ee9b655): error adding pod openshift-monitoring_prometheus-k8s-0 to CNI network 
"multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-monitoring/prometheus-k8s-0/debda4d2-6914-4b36-92e0-78f68d539ab3:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-monitoring/prometheus-k8s-0 86af91d4e64ab0fbad95352b029762e9856ff24005445b458bccb22e0ee9b655] [openshift-monitoring/prometheus-k8s-0 86af91d4e64ab0fbad95352b029762e9856ff24005445b458bccb22e0ee9b655] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
Both pods stuck in this state appear to be blocked on the same error: "failed to get pod annotation: timed out waiting for annotations: context deadline exceeded"
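That error suggests OVN-Kubernetes never wrote the pod's network annotation before the CNI ADD timed out. A few checks that could confirm this on an affected SNO (a sketch; the openshift-ovn-kubernetes namespace, app=ovnkube-master label, and k8s.ovn.org/pod-networks annotation assume a default OVN-Kubernetes deployment):

# Recent sandbox-creation failures in the namespace
oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig get events \
  -n openshift-monitoring --field-selector reason=FailedCreatePodSandBox

# Whether ovn-kubernetes ever annotated the pod (empty output = no annotation yet)
oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig get pod \
  -n openshift-monitoring alertmanager-main-0 \
  -o jsonpath='{.metadata.annotations.k8s\.ovn\.org/pod-networks}'

# ovnkube-master logs around the pod add, to see why the annotation was not set
# (namespace/label/container names assume a default OVN-Kubernetes install)
oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig logs \
  -n openshift-ovn-kubernetes -l app=ovnkube-master -c ovnkube-master --tail=200 \
  | grep alertmanager-main-0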
- clones: OCPBUGS-166 4.11 SNOs fail to complete install because of "failed to get pod annotation: timed out waiting for annotations: context deadline exceeded" (Closed)
- depends on: OCPBUGS-166 4.11 SNOs fail to complete install because of "failed to get pod annotation: timed out waiting for annotations: context deadline exceeded" (Closed)
- is depended on by: OCPBUGS-12819 Installation failure due to Alertmanager failing to join CNI network (Closed)
- links to