OpenShift Bugs / OCPBUGS-166

4.11 SNOs fail to complete install because of "failed to get pod annotation: timed out waiting for annotations: context deadline exceeded"


Details

    • OCP VE Sprint 226, OCP VE Sprint 227, OCP VE Sprint 228, OCP VE Sprint 229, OCP VE Sprint 230
    • 11/10: removed from Telco-Grade OCP 4.12 gating list, tracking OCPBUGS-3390 instead
      11/7: Green as the upstream has merged and downstream merge is in progress.

    Description

      Description of problem:

      When installing 1000+ SNOs through ACM/MCE using ZTP with GitOps, a small percentage of clusters never complete installation because the monitoring cluster operator does not reconcile to Available.

      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig get clusterversion
      NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
      version             False       True          16h     Unable to apply 4.11.0: the cluster operator monitoring has not yet successfully rolled out
      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig get co monitoring
      NAME         VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      monitoring             False       True          True       15h     Rollout of the monitoring stack failed and is degraded. Please investigate the degraded status error. 
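
      For anyone triaging a similar cluster, the Degraded condition message and the stuck pods can be pulled directly from the affected SNO (a diagnostic sketch; the kubeconfig path matches the example cluster above, and the jsonpath filter and field selector are stock oc/kubectl syntax):

      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig get co monitoring -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}{"\n"}'
      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig get po -n openshift-monitoring --field-selector=status.phase!=Running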

       

      Version-Release number of selected component (if applicable):

      • Hub OCP and SNO OCP - 4.11.0
      • ACM - 2.6.0-DOWNSTREAM-2022-08-11-23-41-09  (FC5)

       

      How reproducible:

      • 2 of the 23 failed installs (out of 1728 total) hit this issue
      • ~8% of the failures are due to this issue (quick arithmetic check below)
      • overall, that is a failure rate of ~0.1% of total installs
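
      A quick sanity check of those percentages from the raw counts (illustrative only; runnable anywhere awk is available):

      # awk 'BEGIN { printf "share of failures: %.1f%%   rate over all installs: %.2f%%\n", 2/23*100, 2/1728*100 }'
      share of failures: 8.7%   rate over all installs: 0.12%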

       

      Additional info:

       

      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig get po -n openshift-monitoring
      NAME                                                     READY   STATUS              RESTARTS   AGE
      alertmanager-main-0                                      0/6     ContainerCreating   0          15h
      cluster-monitoring-operator-54dd78cc74-l5w24             2/2     Running             0          15h
      kube-state-metrics-b6455c4dc-8hcfn                       3/3     Running             0          15h
      node-exporter-k7899                                      2/2     Running             0          15h
      openshift-state-metrics-7984888fbd-cl67v                 3/3     Running             0          15h
      prometheus-adapter-785bf4f975-wgmnh                      1/1     Running             0          15h
      prometheus-k8s-0                                         0/6     Init:0/1            0          15h
      prometheus-operator-74d8754ff7-9zrgw                     2/2     Running             0          15h
      prometheus-operator-admission-webhook-6665fb687d-c5jgv   1/1     Running             0          15h
      thanos-querier-575496c665-jcc8l                          6/6     Running             0          15h 
      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig describe po -n openshift-monitoring alertmanager-main-0
      Name:                 alertmanager-main-0
      Namespace:            openshift-monitoring
      Priority:             2000000000
      Priority Class Name:  system-cluster-critical
      Node:                 sno01219/fc00:1001::8aa
      Start Time:           Mon, 15 Aug 2022 23:53:39 +0000
      Labels:               alertmanager=main
                            app.kubernetes.io/component=alert-router
                            app.kubernetes.io/instance=main
                            app.kubernetes.io/managed-by=prometheus-operator
                            app.kubernetes.io/name=alertmanager
                            app.kubernetes.io/part-of=openshift-monitoring
                            app.kubernetes.io/version=0.24.0
                            controller-revision-hash=alertmanager-main-fcf8dd5fb
                            statefulset.kubernetes.io/pod-name=alertmanager-main-0
      Annotations:          kubectl.kubernetes.io/default-container: alertmanager
                            openshift.io/scc: nonroot
      Status:               Pending
      IP:
      IPs:                  <none>
      Controlled By:        StatefulSet/alertmanager-main
      Containers:
        alertmanager:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:91308d35c1e56463f55c1aaa519ff4de7335d43b254c21abdb845fc8c72821a1
          Image ID:
          Ports:         9094/TCP, 9094/UDP
          Host Ports:    0/TCP, 0/UDP
          Args:
            --config.file=/etc/alertmanager/config/alertmanager.yaml
            --storage.path=/alertmanager
            --data.retention=120h
            --cluster.listen-address=
            --web.listen-address=127.0.0.1:9093
            --web.external-url=https:/console-openshift-console.apps.sno01219.rdu2.scalelab.redhat.com/monitoring
            --web.route-prefix=/
            --cluster.peer=alertmanager-main-0.alertmanager-operated:9094
            --cluster.reconnect-timeout=5m
          State:          Waiting
            Reason:       ContainerCreating
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:     4m
            memory:  40Mi
          Environment:
            POD_IP:   (v1:status.podIP)
          Mounts:
            /alertmanager from alertmanager-main-db (rw)
            /etc/alertmanager/certs from tls-assets (ro)
            /etc/alertmanager/config from config-volume (rw)
            /etc/alertmanager/secrets/alertmanager-kube-rbac-proxy from secret-alertmanager-kube-rbac-proxy (ro)
            /etc/alertmanager/secrets/alertmanager-kube-rbac-proxy-metric from secret-alertmanager-kube-rbac-proxy-metric (ro)
            /etc/alertmanager/secrets/alertmanager-main-proxy from secret-alertmanager-main-proxy (ro)
            /etc/alertmanager/secrets/alertmanager-main-tls from secret-alertmanager-main-tls (ro)
            /etc/pki/ca-trust/extracted/pem/ from alertmanager-trusted-ca-bundle (ro)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hl77l (ro)
        config-reloader:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:209e20410ec2d3d7a502f568d2b7fe1cd1beadcb36fff2d1e6f59d77be3200e3
          Image ID:
          Port:          <none>
          Host Port:     <none>
          Command:
            /bin/prometheus-config-reloader
          Args:
            --listen-address=localhost:8080
            --reload-url=http://localhost:9093/-/reload
            --watched-dir=/etc/alertmanager/config
            --watched-dir=/etc/alertmanager/secrets/alertmanager-main-tls
            --watched-dir=/etc/alertmanager/secrets/alertmanager-main-proxy
            --watched-dir=/etc/alertmanager/secrets/alertmanager-kube-rbac-proxy
            --watched-dir=/etc/alertmanager/secrets/alertmanager-kube-rbac-proxy-metric
          State:          Waiting
            Reason:       ContainerCreating
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:     1m
            memory:  10Mi
          Environment:
            POD_NAME:  alertmanager-main-0 (v1:metadata.name)
            SHARD:     -1
          Mounts:
            /etc/alertmanager/config from config-volume (ro)
            /etc/alertmanager/secrets/alertmanager-kube-rbac-proxy from secret-alertmanager-kube-rbac-proxy (ro)
            /etc/alertmanager/secrets/alertmanager-kube-rbac-proxy-metric from secret-alertmanager-kube-rbac-proxy-metric (ro)
            /etc/alertmanager/secrets/alertmanager-main-proxy from secret-alertmanager-main-proxy (ro)
            /etc/alertmanager/secrets/alertmanager-main-tls from secret-alertmanager-main-tls (ro)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hl77l (ro)
        alertmanager-proxy:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:140f8947593d92e1517e50a201e83bdef8eb965b552a21d3caf346a250d0cf6e
          Image ID:
          Port:          9095/TCP
          Host Port:     0/TCP
          Args:
            -provider=openshift
            -https-address=:9095
            -http-address=
            -email-domain=*
            -upstream=http://localhost:9093
            -openshift-sar=[{"resource": "namespaces", "verb": "get"}, {"resource": "alertmanagers", "resourceAPIGroup": "monitoring.coreos.com", "namespace": "openshift-monitoring", "verb": "patch", "resourceName": "non-existant"}]
            -openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}, "/": {"resource":"alertmanagers", "group": "monitoring.coreos.com", "namespace": "openshift-monitoring", "verb": "patch", "name": "non-existant"}}
            -tls-cert=/etc/tls/private/tls.crt
            -tls-key=/etc/tls/private/tls.key
            -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
            -cookie-secret-file=/etc/proxy/secrets/session_secret
            -openshift-service-account=alertmanager-main
            -openshift-ca=/etc/pki/tls/cert.pem
            -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          State:          Waiting
            Reason:       ContainerCreating
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:     1m
            memory:  20Mi
          Environment:
            HTTP_PROXY:
            HTTPS_PROXY:
            NO_PROXY:
          Mounts:
            /etc/pki/ca-trust/extracted/pem/ from alertmanager-trusted-ca-bundle (ro)
            /etc/proxy/secrets from secret-alertmanager-main-proxy (rw)
            /etc/tls/private from secret-alertmanager-main-tls (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hl77l (ro)
        kube-rbac-proxy:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b5e1c69d005727e3245604cfca7a63e4f9bc6e15128c7489e41d5e967305089e
          Image ID:
          Port:          9092/TCP
          Host Port:     0/TCP
          Args:
            --secure-listen-address=0.0.0.0:9092
            --upstream=http://127.0.0.1:9096
            --config-file=/etc/kube-rbac-proxy/config.yaml
            --tls-cert-file=/etc/tls/private/tls.crt
            --tls-private-key-file=/etc/tls/private/tls.key
            --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
            --logtostderr=true
            --tls-min-version=VersionTLS12
          State:          Waiting
            Reason:       ContainerCreating
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:        1m
            memory:     15Mi
          Environment:  <none>
          Mounts:
            /etc/kube-rbac-proxy from secret-alertmanager-kube-rbac-proxy (rw)
            /etc/tls/private from secret-alertmanager-main-tls (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hl77l (ro)
        kube-rbac-proxy-metric:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b5e1c69d005727e3245604cfca7a63e4f9bc6e15128c7489e41d5e967305089e
          Image ID:
          Port:          9097/TCP
          Host Port:     0/TCP
          Args:
            --secure-listen-address=0.0.0.0:9097
            --upstream=http://127.0.0.1:9093
            --config-file=/etc/kube-rbac-proxy/config.yaml
            --tls-cert-file=/etc/tls/private/tls.crt
            --tls-private-key-file=/etc/tls/private/tls.key
            --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
            --client-ca-file=/etc/tls/client/client-ca.crt
            --logtostderr=true
            --allow-paths=/metrics
            --tls-min-version=VersionTLS12
          State:          Waiting
            Reason:       ContainerCreating
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:        1m
            memory:     15Mi
          Environment:  <none>
          Mounts:
            /etc/kube-rbac-proxy from secret-alertmanager-kube-rbac-proxy-metric (ro)
            /etc/tls/client from metrics-client-ca (ro)
            /etc/tls/private from secret-alertmanager-main-tls (ro)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hl77l (ro)
        prom-label-proxy:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:2550b2cbdf864515b1edacf43c25eb6b6f179713c1df34e51f6e9bba48d6430a
          Image ID:
          Port:          <none>
          Host Port:     <none>
          Args:
            --insecure-listen-address=127.0.0.1:9096
            --upstream=http://127.0.0.1:9093
            --label=namespace
            --error-on-replace
          State:          Waiting
            Reason:       ContainerCreating
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:        1m
            memory:     20Mi
          Environment:  <none>
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-hl77l (ro)
      Conditions:
        Type              Status
        Initialized       True
        Ready             False
        ContainersReady   False
        PodScheduled      True
      Volumes:
        config-volume:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  alertmanager-main-generated
          Optional:    false
        tls-assets:
          Type:                Projected (a volume that contains injected data from multiple sources)
          SecretName:          alertmanager-main-tls-assets-0
          SecretOptionalName:  <nil>
        secret-alertmanager-main-tls:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  alertmanager-main-tls
          Optional:    false
        secret-alertmanager-main-proxy:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  alertmanager-main-proxy
          Optional:    false
        secret-alertmanager-kube-rbac-proxy:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  alertmanager-kube-rbac-proxy
          Optional:    false
        secret-alertmanager-kube-rbac-proxy-metric:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  alertmanager-kube-rbac-proxy-metric
          Optional:    false
        alertmanager-main-db:
          Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
          Medium:
          SizeLimit:  <unset>
        metrics-client-ca:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      metrics-client-ca
          Optional:  false
        alertmanager-trusted-ca-bundle:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      alertmanager-trusted-ca-bundle-2rsonso43rc5p
          Optional:  true
        kube-api-access-hl77l:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
      QoS Class:                   Burstable
      Node-Selectors:              kubernetes.io/os=linux
      Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                                   node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:
        Type     Reason                  Age                    From     Message
        ----     ------                  ----                   ----     -------
        Warning  FailedCreatePodSandBox  2m25s (x409 over 15h)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_alertmanager-main-0_openshift-monitoring_1c367a83-24e3-4249-861a-a107a6beaee2_0(dff5f302f774d060728261b3c86841ebdbd7ba11537ec9f4d90d57be17bdf44b): error adding pod openshift-monitoring_alertmanager-main-0 to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-monitoring/alertmanager-main-0/1c367a83-24e3-4249-861a-a107a6beaee2:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-monitoring/alertmanager-main-0 dff5f302f774d060728261b3c86841ebdbd7ba11537ec9f4d90d57be17bdf44b] [openshift-monitoring/alertmanager-main-0 dff5f302f774d060728261b3c86841ebdbd7ba11537ec9f4d90d57be17bdf44b] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded                                                                                                                                                                                                                                                                             
      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig describe po -n openshift-monitoring prometheus-k8s-0
      Name:                 prometheus-k8s-0
      Namespace:            openshift-monitoring
      Priority:             2000000000
      Priority Class Name:  system-cluster-critical
      Node:                 sno01219/fc00:1001::8aa
      Start Time:           Mon, 15 Aug 2022 23:53:39 +0000
      Labels:               app.kubernetes.io/component=prometheus
                            app.kubernetes.io/instance=k8s
                            app.kubernetes.io/managed-by=prometheus-operator
                            app.kubernetes.io/name=prometheus
                            app.kubernetes.io/part-of=openshift-monitoring
                            app.kubernetes.io/version=2.36.2
                            controller-revision-hash=prometheus-k8s-546b544f8b
                            operator.prometheus.io/name=k8s
                            operator.prometheus.io/shard=0
                            prometheus=k8s
                            statefulset.kubernetes.io/pod-name=prometheus-k8s-0
      Annotations:          kubectl.kubernetes.io/default-container: prometheus
                            openshift.io/scc: nonroot
      Status:               Pending
      IP:
      IPs:                  <none>
      Controlled By:        StatefulSet/prometheus-k8s
      Init Containers:
        init-config-reloader:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:209e20410ec2d3d7a502f568d2b7fe1cd1beadcb36fff2d1e6f59d77be3200e3
          Image ID:
          Port:          8080/TCP
          Host Port:     0/TCP
          Command:
            /bin/prometheus-config-reloader
          Args:
            --watch-interval=0
            --listen-address=:8080
            --config-file=/etc/prometheus/config/prometheus.yaml.gz
            --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
            --watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:     1m
            memory:  10Mi
          Environment:
            POD_NAME:  prometheus-k8s-0 (v1:metadata.name)
            SHARD:     0
          Mounts:
            /etc/prometheus/config from config (rw)
            /etc/prometheus/config_out from config-out (rw)
            /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro)
      Containers:
        prometheus:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c7df53b796e81ba8301ba74d02317226329bd5752fd31c1b44d028e4832f21c3
          Image ID:
          Port:          <none>
          Host Port:     <none>
          Args:
            --web.console.templates=/etc/prometheus/consoles
            --web.console.libraries=/etc/prometheus/console_libraries
            --storage.tsdb.retention.time=15d
            --config.file=/etc/prometheus/config_out/prometheus.env.yaml
            --storage.tsdb.path=/prometheus
            --web.enable-lifecycle
            --web.external-url=https:/console-openshift-console.apps.sno01219.rdu2.scalelab.redhat.com/monitoring
            --web.route-prefix=/
            --web.listen-address=127.0.0.1:9090
            --web.config.file=/etc/prometheus/web_config/web-config.yaml
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:        70m
            memory:     1Gi
          Liveness:     exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/healthy; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/healthy; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=6
          Readiness:    exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=3
          Startup:      exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=15s #success=1 #failure=60
          Environment:  <none>
          Mounts:
            /etc/pki/ca-trust/extracted/pem/ from prometheus-trusted-ca-bundle (ro)
            /etc/prometheus/certs from tls-assets (ro)
            /etc/prometheus/config_out from config-out (ro)
            /etc/prometheus/configmaps/kubelet-serving-ca-bundle from configmap-kubelet-serving-ca-bundle (ro)
            /etc/prometheus/configmaps/metrics-client-ca from configmap-metrics-client-ca (ro)
            /etc/prometheus/configmaps/serving-certs-ca-bundle from configmap-serving-certs-ca-bundle (ro)
            /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
            /etc/prometheus/secrets/kube-etcd-client-certs from secret-kube-etcd-client-certs (ro)
            /etc/prometheus/secrets/kube-rbac-proxy from secret-kube-rbac-proxy (ro)
            /etc/prometheus/secrets/metrics-client-certs from secret-metrics-client-certs (ro)
            /etc/prometheus/secrets/prometheus-k8s-proxy from secret-prometheus-k8s-proxy (ro)
            /etc/prometheus/secrets/prometheus-k8s-thanos-sidecar-tls from secret-prometheus-k8s-thanos-sidecar-tls (ro)
            /etc/prometheus/secrets/prometheus-k8s-tls from secret-prometheus-k8s-tls (ro)
            /etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml")
            /prometheus from prometheus-k8s-db (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro)
        config-reloader:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:209e20410ec2d3d7a502f568d2b7fe1cd1beadcb36fff2d1e6f59d77be3200e3
          Image ID:
          Port:          <none>
          Host Port:     <none>
          Command:
            /bin/prometheus-config-reloader
          Args:
            --listen-address=localhost:8080
            --reload-url=http://localhost:9090/-/reload
            --config-file=/etc/prometheus/config/prometheus.yaml.gz
            --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
            --watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:     1m
            memory:  10Mi
          Environment:
            POD_NAME:  prometheus-k8s-0 (v1:metadata.name)
            SHARD:     0
          Mounts:
            /etc/prometheus/config from config (rw)
            /etc/prometheus/config_out from config-out (rw)
            /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro)
        thanos-sidecar:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:36fc214537c763b3a3f0a9dc7a1bd4378a80428c31b2629df8786a9b09155e6d
          Image ID:
          Ports:         10902/TCP, 10901/TCP
          Host Ports:    0/TCP, 0/TCP
          Args:
            sidecar
            --prometheus.url=http://localhost:9090/
            --tsdb.path=/prometheus
            --http-address=127.0.0.1:10902
            --grpc-server-tls-cert=/etc/tls/grpc/server.crt
            --grpc-server-tls-key=/etc/tls/grpc/server.key
            --grpc-server-tls-client-ca=/etc/tls/grpc/ca.crt
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:        1m
            memory:     25Mi
          Environment:  <none>
          Mounts:
            /etc/tls/grpc from secret-grpc-tls (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro)
        prometheus-proxy:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:140f8947593d92e1517e50a201e83bdef8eb965b552a21d3caf346a250d0cf6e
          Image ID:
          Port:          9091/TCP
          Host Port:     0/TCP
          Args:
            -provider=openshift
            -https-address=:9091
            -http-address=
            -email-domain=*
            -upstream=http://localhost:9090
            -openshift-service-account=prometheus-k8s
            -openshift-sar={"resource": "namespaces", "verb": "get"}
            -openshift-delegate-urls={"/": {"resource": "namespaces", "verb": "get"}}
            -tls-cert=/etc/tls/private/tls.crt
            -tls-key=/etc/tls/private/tls.key
            -client-secret-file=/var/run/secrets/kubernetes.io/serviceaccount/token
            -cookie-secret-file=/etc/proxy/secrets/session_secret
            -openshift-ca=/etc/pki/tls/cert.pem
            -openshift-ca=/var/run/secrets/kubernetes.io/serviceaccount/ca.crt
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:     1m
            memory:  20Mi
          Environment:
            HTTP_PROXY:
            HTTPS_PROXY:
            NO_PROXY:
          Mounts:
            /etc/pki/ca-trust/extracted/pem/ from prometheus-trusted-ca-bundle (ro)
            /etc/proxy/secrets from secret-prometheus-k8s-proxy (rw)
            /etc/tls/private from secret-prometheus-k8s-tls (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro)
        kube-rbac-proxy:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b5e1c69d005727e3245604cfca7a63e4f9bc6e15128c7489e41d5e967305089e
          Image ID:
          Port:          9092/TCP
          Host Port:     0/TCP
          Args:
            --secure-listen-address=0.0.0.0:9092
            --upstream=http://127.0.0.1:9090
            --allow-paths=/metrics
            --config-file=/etc/kube-rbac-proxy/config.yaml
            --tls-cert-file=/etc/tls/private/tls.crt
            --tls-private-key-file=/etc/tls/private/tls.key
            --client-ca-file=/etc/tls/client/client-ca.crt
            --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
            --logtostderr=true
            --tls-min-version=VersionTLS12
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:        1m
            memory:     15Mi
          Environment:  <none>
          Mounts:
            /etc/kube-rbac-proxy from secret-kube-rbac-proxy (rw)
            /etc/tls/client from configmap-metrics-client-ca (ro)
            /etc/tls/private from secret-prometheus-k8s-tls (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro)
        kube-rbac-proxy-thanos:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:b5e1c69d005727e3245604cfca7a63e4f9bc6e15128c7489e41d5e967305089e
          Image ID:
          Port:          10902/TCP
          Host Port:     0/TCP
          Args:
            --secure-listen-address=[$(POD_IP)]:10902
            --upstream=http://127.0.0.1:10902
            --tls-cert-file=/etc/tls/private/tls.crt
            --tls-private-key-file=/etc/tls/private/tls.key
            --client-ca-file=/etc/tls/client/client-ca.crt
            --config-file=/etc/kube-rbac-proxy/config.yaml
            --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
            --allow-paths=/metrics
            --logtostderr=true
            --tls-min-version=VersionTLS12
            --client-ca-file=/etc/tls/client/client-ca.crt
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:     1m
            memory:  10Mi
          Environment:
            POD_IP:   (v1:status.podIP)
          Mounts:
            /etc/kube-rbac-proxy from secret-kube-rbac-proxy (rw)
            /etc/tls/client from metrics-client-ca (ro)
            /etc/tls/private from secret-prometheus-k8s-thanos-sidecar-tls (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-85zlc (ro)
      Conditions:
        Type              Status
        Initialized       False
        Ready             False
        ContainersReady   False
        PodScheduled      True
      Volumes:
        config:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  prometheus-k8s
          Optional:    false
        tls-assets:
          Type:                Projected (a volume that contains injected data from multiple sources)
          SecretName:          prometheus-k8s-tls-assets-0
          SecretOptionalName:  <nil>
        config-out:
          Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
          Medium:
          SizeLimit:  <unset>
        prometheus-k8s-rulefiles-0:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      prometheus-k8s-rulefiles-0
          Optional:  false
        web-config:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  prometheus-k8s-web-config
          Optional:    false
        secret-kube-etcd-client-certs:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  kube-etcd-client-certs
          Optional:    false
        secret-prometheus-k8s-tls:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  prometheus-k8s-tls
          Optional:    false
        secret-prometheus-k8s-proxy:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  prometheus-k8s-proxy
          Optional:    false
        secret-prometheus-k8s-thanos-sidecar-tls:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  prometheus-k8s-thanos-sidecar-tls
          Optional:    false
        secret-kube-rbac-proxy:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  kube-rbac-proxy
          Optional:    false
        secret-metrics-client-certs:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  metrics-client-certs
          Optional:    false
        configmap-serving-certs-ca-bundle:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      serving-certs-ca-bundle
          Optional:  false
        configmap-kubelet-serving-ca-bundle:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      kubelet-serving-ca-bundle
          Optional:  false
        configmap-metrics-client-ca:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      metrics-client-ca
          Optional:  false
        prometheus-k8s-db:
          Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
          Medium:
          SizeLimit:  <unset>
        metrics-client-ca:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      metrics-client-ca
          Optional:  false
        secret-grpc-tls:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  prometheus-k8s-grpc-tls-crdkohb1gb92n
          Optional:    false
        prometheus-trusted-ca-bundle:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      prometheus-trusted-ca-bundle-2rsonso43rc5p
          Optional:  true
        kube-api-access-85zlc:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
      QoS Class:                   Burstable
      Node-Selectors:              kubernetes.io/os=linux
      Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                                   node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:
        Type     Reason                  Age                    From     Message
        ----     ------                  ----                   ----     -------
        Warning  FailedCreatePodSandBox  4m19s (x409 over 15h)  kubelet  (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_prometheus-k8s-0_openshift-monitoring_debda4d2-6914-4b36-92e0-78f68d539ab3_0(86af91d4e64ab0fbad95352b029762e9856ff24005445b458bccb22e0ee9b655): error adding pod openshift-monitoring_prometheus-k8s-0 to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [openshift-monitoring/prometheus-k8s-0/debda4d2-6914-4b36-92e0-78f68d539ab3:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[openshift-monitoring/prometheus-k8s-0 86af91d4e64ab0fbad95352b029762e9856ff24005445b458bccb22e0ee9b655] [openshift-monitoring/prometheus-k8s-0 86af91d4e64ab0fbad95352b029762e9856ff24005445b458bccb22e0ee9b655] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
      

      Both pods stuck in this state appear to be blocked on the same error: "failed to get pod annotation: timed out waiting for annotations: context deadline exceeded".
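
      The error string comes from the CNI plugin timing out while waiting for the pod-network annotation that the OVN-Kubernetes control plane writes onto each pod. A quick way to confirm whether that annotation was ever set, and to look for the corresponding failure on the node, is sketched below (hedged: the k8s.ovn.org/pod-networks annotation key and the ovnkube-master pod label/container names are assumed from a default 4.11 OVN-Kubernetes deployment):

      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig -n openshift-monitoring get pod alertmanager-main-0 -o yaml | grep k8s.ovn.org/pod-networks
      # oc --kubeconfig=/root/hv-vm/sno/manifests/sno01219/kubeconfig -n openshift-ovn-kubernetes logs -l app=ovnkube-master -c ovnkube-master --tail=500 | grep alertmanager-main-0

      If the first command prints nothing, the annotation was never written, which is consistent with the repeated sandbox-creation timeouts in the events above.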

            People

              pepalani@redhat.com Periyasamy Palanichamy
              akrzos@redhat.com Alex Krzos
