OpenShift Bugs / OCPBUGS-41180

AttachVolume.Attach failed for volume : volume attachment is being deleted error when creating a Prometheus Pod


    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Critical
    • Affects Version: 4.16.z
    • Component: Storage / Kubernetes
    • Quality / Stability / Reliability
    • Severity: Important
    • Sprint: MON Sprint 265

      Description of problem:

          The prometheus-k8s-0 pod failed to be created because its volume attachment is being deleted for an unknown reason; the PVC and PV still exist.
      {code:none}
          oc -n openshift-monitoring describe pod prometheus-k8s-0
      Name:                 prometheus-k8s-0
      Namespace:            openshift-monitoring
      Priority:             2000000000
      Priority Class Name:  system-cluster-critical
      Service Account:      prometheus-k8s
      Node:                 ip-10-0-4-67.us-east-2.compute.internal/10.0.4.67
      Start Time:           Thu, 05 Sep 2024 08:55:24 +0000
      Labels:               app.kubernetes.io/component=prometheus
                            app.kubernetes.io/instance=k8s
                            app.kubernetes.io/managed-by=prometheus-operator
                            app.kubernetes.io/name=prometheus
                            app.kubernetes.io/part-of=openshift-monitoring
                            app.kubernetes.io/version=2.52.0
                            apps.kubernetes.io/pod-index=0
                            controller-revision-hash=prometheus-k8s-856d7759cc
                            operator.prometheus.io/name=k8s
                            operator.prometheus.io/shard=0
                            prometheus=k8s
                            statefulset.kubernetes.io/pod-name=prometheus-k8s-0
      Annotations:          k8s.ovn.org/pod-networks:
                              {"default":{"ip_addresses":["10.130.2.8/23"],"mac_address":"0a:58:0a:82:02:08","gateway_ips":["10.130.2.1"],"routes":[{"dest":"10.128.0.0/...
                            kubectl.kubernetes.io/default-container: prometheus
                            openshift.io/required-scc: nonroot
                            openshift.io/scc: nonroot
      Status:               Pending
      IP:
      IPs:                  <none>
      Controlled By:        StatefulSet/prometheus-k8s
      Init Containers:
        init-config-reloader:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:356d4ce991042a2affc27988c328a2ce686a52132c3ca1b630bce6b7965e8f90
          Image ID:
          Port:          8080/TCP
          Host Port:     0/TCP
          Command:
            /bin/prometheus-config-reloader
          Args:
            --watch-interval=0
            --listen-address=:8080
            --config-file=/etc/prometheus/config/prometheus.yaml.gz
            --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
            --watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:     1m
            memory:  10Mi
          Environment:
            POD_NAME:  prometheus-k8s-0 (v1:metadata.name)
            SHARD:     0
          Mounts:
            /etc/prometheus/config from config (rw)
            /etc/prometheus/config_out from config-out (rw)
            /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
            /etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml")
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
      Containers:
        prometheus:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87b350932e17e0b93bf337c1e6923b39b92ba21df119a9de8c3c8bd603d00e44
          Image ID:
          Port:          <none>
          Host Port:     <none>
          Args:
            --web.console.templates=/etc/prometheus/consoles
            --web.console.libraries=/etc/prometheus/console_libraries
            --config.file=/etc/prometheus/config_out/prometheus.env.yaml
            --web.enable-lifecycle
            --web.external-url=https://console-openshift-console.apps.liqcui-sdn2ovn.qe.devcluster.openshift.com/monitoring
            --web.route-prefix=/
            --web.listen-address=127.0.0.1:9090
            --storage.tsdb.retention.time=15d
            --storage.tsdb.path=/prometheus
            --web.config.file=/etc/prometheus/web_config/web-config.yaml
            --scrape.timestamp-tolerance=15ms
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:        70m
            memory:     1Gi
          Liveness:     exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/healthy; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/healthy; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=6
          Readiness:    exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=3
          Startup:      exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=60s #success=1 #failure=60
          Environment:  <none>
          Mounts:
            /etc/pki/ca-trust/extracted/pem/ from prometheus-trusted-ca-bundle (rw)
            /etc/prometheus/certs from tls-assets (ro)
            /etc/prometheus/config_out from config-out (ro)
            /etc/prometheus/configmaps/kubelet-serving-ca-bundle from configmap-kubelet-serving-ca-bundle (ro)
            /etc/prometheus/configmaps/metrics-client-ca from configmap-metrics-client-ca (ro)
            /etc/prometheus/configmaps/serving-certs-ca-bundle from configmap-serving-certs-ca-bundle (ro)
            /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
            /etc/prometheus/secrets/kube-rbac-proxy from secret-kube-rbac-proxy (ro)
            /etc/prometheus/secrets/metrics-client-certs from secret-metrics-client-certs (ro)
            /etc/prometheus/secrets/prometheus-k8s-kube-rbac-proxy-web from secret-prometheus-k8s-kube-rbac-proxy-web (ro)
            /etc/prometheus/secrets/prometheus-k8s-thanos-sidecar-tls from secret-prometheus-k8s-thanos-sidecar-tls (ro)
            /etc/prometheus/secrets/prometheus-k8s-tls from secret-prometheus-k8s-tls (ro)
            /etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml")
            /prometheus from prometheus-k8s-db (rw,path="prometheus-db")
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
        config-reloader:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:356d4ce991042a2affc27988c328a2ce686a52132c3ca1b630bce6b7965e8f90
          Image ID:
          Port:          <none>
          Host Port:     <none>
          Command:
            /bin/prometheus-config-reloader
          Args:
            --listen-address=localhost:8080
            --web-config-file=/etc/prometheus/web_config/web-config.yaml
            --reload-url=http://localhost:9090/-/reload
            --config-file=/etc/prometheus/config/prometheus.yaml.gz
            --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
            --watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:     1m
            memory:  10Mi
          Environment:
            POD_NAME:  prometheus-k8s-0 (v1:metadata.name)
            SHARD:     0
          Mounts:
            /etc/prometheus/config from config (rw)
            /etc/prometheus/config_out from config-out (rw)
            /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
            /etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml")
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
        thanos-sidecar:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:993e54a6864d7fe7fa61d3faf5a98a4438dec0a447b0d1e837cc92ea1a0ce16e
          Image ID:
          Ports:         10902/TCP, 10901/TCP
          Host Ports:    0/TCP, 0/TCP
          Args:
            sidecar
            --prometheus.url=http://localhost:9090/
            --tsdb.path=/prometheus
            --http-address=127.0.0.1:10902
            --grpc-server-tls-cert=/etc/tls/grpc/server.crt
            --grpc-server-tls-key=/etc/tls/grpc/server.key
            --grpc-server-tls-client-ca=/etc/tls/grpc/ca.crt
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:        1m
            memory:     25Mi
          Environment:  <none>
          Mounts:
            /etc/tls/grpc from secret-grpc-tls (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
        kube-rbac-proxy-web:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70458d010bd9f4e9c43b6452fe79e529af926deab2714e10ba1366789ec15d9f
          Image ID:
          Port:          9091/TCP
          Host Port:     0/TCP
          Args:
            --secure-listen-address=0.0.0.0:9091
            --upstream=http://127.0.0.1:9090
            --config-file=/etc/kube-rbac-proxy/config.yaml
            --tls-cert-file=/etc/tls/private/tls.crt
            --tls-private-key-file=/etc/tls/private/tls.key
            --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
            --ignore-paths=/-/healthy,/-/ready
            --tls-min-version=VersionTLS12
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:        1m
            memory:     15Mi
          Environment:  <none>
          Mounts:
            /etc/kube-rbac-proxy from secret-prometheus-k8s-kube-rbac-proxy-web (rw)
            /etc/tls/private from secret-prometheus-k8s-tls (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
        kube-rbac-proxy:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70458d010bd9f4e9c43b6452fe79e529af926deab2714e10ba1366789ec15d9f
          Image ID:
          Port:          9092/TCP
          Host Port:     0/TCP
          Args:
            --secure-listen-address=0.0.0.0:9092
            --upstream=http://127.0.0.1:9090
            --allow-paths=/metrics,/federate
            --config-file=/etc/kube-rbac-proxy/config.yaml
            --tls-cert-file=/etc/tls/private/tls.crt
            --tls-private-key-file=/etc/tls/private/tls.key
            --client-ca-file=/etc/tls/client/client-ca.crt
            --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
            --tls-min-version=VersionTLS12
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:        1m
            memory:     15Mi
          Environment:  <none>
          Mounts:
            /etc/kube-rbac-proxy from secret-kube-rbac-proxy (rw)
            /etc/tls/client from configmap-metrics-client-ca (ro)
            /etc/tls/private from secret-prometheus-k8s-tls (rw)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
        kube-rbac-proxy-thanos:
          Container ID:
          Image:         quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70458d010bd9f4e9c43b6452fe79e529af926deab2714e10ba1366789ec15d9f
          Image ID:
          Port:          10903/TCP
          Host Port:     0/TCP
          Args:
            --secure-listen-address=[$(POD_IP)]:10903
            --upstream=http://127.0.0.1:10902
            --tls-cert-file=/etc/tls/private/tls.crt
            --tls-private-key-file=/etc/tls/private/tls.key
            --client-ca-file=/etc/tls/client/client-ca.crt
            --config-file=/etc/kube-rbac-proxy/config.yaml
            --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
            --allow-paths=/metrics
            --tls-min-version=VersionTLS12
          State:          Waiting
            Reason:       PodInitializing
          Ready:          False
          Restart Count:  0
          Requests:
            cpu:     1m
            memory:  10Mi
          Environment:
            POD_IP:   (v1:status.podIP)
          Mounts:
            /etc/kube-rbac-proxy from secret-kube-rbac-proxy (ro)
            /etc/tls/client from configmap-metrics-client-ca (ro)
            /etc/tls/private from secret-prometheus-k8s-thanos-sidecar-tls (ro)
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
      Conditions:
        Type                        Status
        PodReadyToStartContainers   False
        Initialized                 False
        Ready                       False
        ContainersReady             False
        PodScheduled                True
      Volumes:
        prometheus-k8s-db:
          Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
          ClaimName:  prometheus-k8s-db-prometheus-k8s-0
          ReadOnly:   false
        config:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  prometheus-k8s
          Optional:    false
        tls-assets:
          Type:                Projected (a volume that contains injected data from multiple sources)
          SecretName:          prometheus-k8s-tls-assets-0
          SecretOptionalName:  <nil>
        config-out:
          Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
          Medium:     Memory
          SizeLimit:  <unset>
        secret-prometheus-k8s-tls:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  prometheus-k8s-tls
          Optional:    false
        secret-prometheus-k8s-thanos-sidecar-tls:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  prometheus-k8s-thanos-sidecar-tls
          Optional:    false
        secret-kube-rbac-proxy:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  kube-rbac-proxy
          Optional:    false
        secret-prometheus-k8s-kube-rbac-proxy-web:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  prometheus-k8s-kube-rbac-proxy-web
          Optional:    false
        secret-metrics-client-certs:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  metrics-client-certs
          Optional:    false
        configmap-serving-certs-ca-bundle:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      serving-certs-ca-bundle
          Optional:  false
        configmap-kubelet-serving-ca-bundle:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      kubelet-serving-ca-bundle
          Optional:  false
        configmap-metrics-client-ca:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      metrics-client-ca
          Optional:  false
        prometheus-k8s-rulefiles-0:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      prometheus-k8s-rulefiles-0
          Optional:  false
        web-config:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  prometheus-k8s-web-config
          Optional:    false
        prometheus-trusted-ca-bundle:
          Type:      ConfigMap (a volume populated by a ConfigMap)
          Name:      prometheus-trusted-ca-bundle
          Optional:  false
        secret-grpc-tls:
          Type:        Secret (a volume populated by a Secret)
          SecretName:  prometheus-k8s-grpc-tls-9sg4kpkjnt4o0
          Optional:    false
        kube-api-access-fqmjd:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
      QoS Class:                   Burstable
      Node-Selectors:              node-role.kubernetes.io/infra=
      Tolerations:                 node-role.kubernetes.io/infra=reserved:NoSchedule
                                   node-role.kubernetes.io/infra=reserved:NoExecute
                                   node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                                   node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                                   node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
      Events:
        Type     Reason              Age                    From                     Message
        ----     ------              ----                   ----                     -------
        Warning  FailedScheduling    149m                   default-scheduler        0/31 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 2 node(s) had volume node affinity conflict, 22 node(s) didn't match Pod's node affinity/selector, 5 node(s) were unschedulable. preemption: 0/31 nodes are available: 31 Preemption is not helpful for scheduling.
        Warning  FailedScheduling    144m                   default-scheduler        0/31 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 2 node(s) had volume node affinity conflict, 21 node(s) didn't match Pod's node affinity/selector, 6 node(s) were unschedulable. preemption: 0/31 nodes are available: 31 Preemption is not helpful for scheduling.
        Warning  FailedScheduling    138m                   default-scheduler        0/31 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 2 node(s) had volume node affinity conflict, 21 node(s) didn't match Pod's node affinity/selector, 6 node(s) were unschedulable. preemption: 0/31 nodes are available: 31 Preemption is not helpful for scheduling.
        Normal   Scheduled           133m                   default-scheduler        Successfully assigned openshift-monitoring/prometheus-k8s-0 to ip-10-0-4-67.us-east-2.compute.internal
        Warning  FailedAttachVolume  111m (x19 over 133m)   attachdetach-controller  AttachVolume.Attach failed for volume "pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3" : volume attachment is being deleted
        Warning  FailedAttachVolume  3m58s (x60 over 110m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3" : volume attachment is being deleted
      
      [ocpadmin@ip-10-0-0-179 ~]$ oc -n openshift-monitoring get pvc
      NAME                                       STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   VOLUMEATTRIBUTESCLASS   AGE
      alertmanager-main-db-alertmanager-main-0   Bound    pvc-cc54ebf1-3460-42ad-9b5f-91e350adfb83   2Gi        RWO            gp3-csi        <unset>                 5h48m
      alertmanager-main-db-alertmanager-main-1   Bound    pvc-6a8e3453-bd96-4c80-95c7-d1202c1f49e3   2Gi        RWO            gp3-csi        <unset>                 5h48m
      prometheus-k8s-db-prometheus-k8s-0         Bound    pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3   100Gi      RWO            gp3-csi        <unset>                 5h48m
      prometheus-k8s-db-prometheus-k8s-1         Bound    pvc-5185de21-13da-4b6e-8dff-b9a6d39db1c0   100Gi      RWO            gp3-csi        <unset>                 5h48m
      [ocpadmin@ip-10-0-0-179 ~]$ oc -n openshift-monitoring get pv
      NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                                                           STORAGECLASS   VOLUMEATTRIBUTESCLASS   REASON   AGE
      pvc-5185de21-13da-4b6e-8dff-b9a6d39db1c0   100Gi      RWO            Delete           Bound    openshift-monitoring/prometheus-k8s-db-prometheus-k8s-1         gp3-csi        <unset>                          5h48m
      pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3   100Gi      RWO            Delete           Bound    openshift-monitoring/prometheus-k8s-db-prometheus-k8s-0         gp3-csi        <unset>                          5h48m
      pvc-6a8e3453-bd96-4c80-95c7-d1202c1f49e3   2Gi        RWO            Delete           Bound    openshift-monitoring/alertmanager-main-db-alertmanager-main-1   gp3-csi        <unset>                          5h48m
      pvc-cc54ebf1-3460-42ad-9b5f-91e350adfb83   2Gi        RWO            Delete           Bound    openshift-monitoring/alertmanager-main-db-alertmanager-main-0   gp3-csi        <unset>                          5h48m
      [ocpadmin@ip-10-0-0-179 ~]$
      {code}
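The PVC and PV above are healthy, so the error points at the VolumeAttachment object itself being stuck in deletion. A minimal diagnostic sketch (not part of the original report; the `filter_stuck` helper and the column layout it assumes for `oc get volumeattachment --no-headers` are illustrative):

```shell
# Diagnostic for the stuck attachment (requires cluster access via oc).
# Assumed `oc get volumeattachment --no-headers` columns:
#   NAME  ATTACHER  PV  NODE  ATTACHED  AGE
filter_stuck() {
  # keep NAME, NODE, ATTACHED for rows that mention the affected PV
  awk '/pvc-622bca2b/ {print $1, $4, $5}'
}

if command -v oc >/dev/null 2>&1; then
  oc get volumeattachment --no-headers | filter_stuck
  # On the matching object, a non-empty metadata.deletionTimestamp confirms
  # "volume attachment is being deleted"; metadata.finalizers shows which
  # component (e.g. the CSI external-attacher) is still holding it.
fi
```

Inspecting the matching object with `oc get volumeattachment <name> -o yaml` then shows the deletion timestamp and finalizers directly.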
      Version-Release number of selected component (if applicable):
          

      How reproducible:

          

      Steps to Reproduce:

          1. Create an OCP cluster with SDN on AWS (instance type c5n.metal); scale out to 24 worker nodes, 3 infra nodes, and 1 workload node.
          2. Migrate from SDN to OVN: oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
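After step 2, one way to watch the migration converge is to compare the desired and live network types on the cluster Network config. A rough sketch (the `report_migration` helper is illustrative, not from this report; the jsonpath fields are standard):

```shell
# Report whether the SDN -> OVN migration has converged.
report_migration() {
  # $1 = spec.networkType (desired), $2 = status.networkType (live)
  if [ "$1" = "$2" ]; then
    echo "migration complete: $2"
  else
    echo "still migrating: spec=$1 status=$2"
  fi
}

if command -v oc >/dev/null 2>&1; then
  want=$(oc get network.config cluster -o jsonpath='{.spec.networkType}')
  got=$(oc get network.config cluster -o jsonpath='{.status.networkType}')
  report_migration "$want" "$got"
fi
```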
      
          

      Actual results:

          The pod prometheus-k8s-0 failed to start because of repeated events: FailedAttachVolume  3m58s (x60 over 110m)  attachdetach-controller  AttachVolume.Attach failed for volume "pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3" : volume attachment is being deleted. This blocks draining the infra node, which in turn blocks the SDN-to-OVN migration.
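A quick way to confirm a drain is wedged mid-migration is to look for cordoned nodes; a sketch (the `cordoned_nodes` helper and the STATUS column position it assumes are illustrative, not from this report):

```shell
# Nodes stuck mid-drain show SchedulingDisabled in their STATUS column.
cordoned_nodes() {
  awk '$2 ~ /SchedulingDisabled/ {print $1}'
}

if command -v oc >/dev/null 2>&1; then
  oc get nodes --no-headers | cordoned_nodes
  # The drain cannot finish while this pod is stuck Pending on its volume:
  oc -n openshift-monitoring get pod prometheus-k8s-0 -o wide
fi
```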

      Expected results:

          The pod prometheus-k8s-0 starts up properly.

      Additional info:

          

              hekumar@redhat.com Hemant Kumar
              rhn-support-liqcui Liquan Cui
              Wei Duan
              Votes: 0
              Watchers: 10