Bug
Resolution: Not a Bug
Critical
4.16.z
Quality / Stability / Reliability
False
Important
MON Sprint 265
1
Description of problem:
The prometheus-k8s-0 pod failed to start because its volume attachment is being deleted for an unknown reason; the PVC and PV still exist.

oc -n openshift-monitoring describe pod prometheus-k8s-0
Name: prometheus-k8s-0 Namespace: openshift-monitoring Priority: 2000000000 Priority Class Name: system-cluster-critical Service Account: prometheus-k8s Node: ip-10-0-4-67.us-east-2.compute.internal/10.0.4.67 Start Time: Thu, 05 Sep 2024 08:55:24 +0000
Labels: app.kubernetes.io/component=prometheus app.kubernetes.io/instance=k8s app.kubernetes.io/managed-by=prometheus-operator app.kubernetes.io/name=prometheus app.kubernetes.io/part-of=openshift-monitoring app.kubernetes.io/version=2.52.0 apps.kubernetes.io/pod-index=0 controller-revision-hash=prometheus-k8s-856d7759cc operator.prometheus.io/name=k8s operator.prometheus.io/shard=0 prometheus=k8s statefulset.kubernetes.io/pod-name=prometheus-k8s-0
Annotations: k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["10.130.2.8/23"],"mac_address":"0a:58:0a:82:02:08","gateway_ips":["10.130.2.1"],"routes":[{"dest":"10.128.0.0/... kubectl.kubernetes.io/default-container: prometheus openshift.io/required-scc: nonroot openshift.io/scc: nonroot
Status: Pending IP: IPs: <none> Controlled By: StatefulSet/prometheus-k8s
Init Containers: init-config-reloader: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:356d4ce991042a2affc27988c328a2ce686a52132c3ca1b630bce6b7965e8f90 Image ID: Port: 8080/TCP Host Port: 0/TCP Command: /bin/prometheus-config-reloader Args: --watch-interval=0 --listen-address=:8080 --config-file=/etc/prometheus/config/prometheus.yaml.gz --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml --watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0 State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 1m memory: 10Mi Environment: POD_NAME: prometheus-k8s-0 (v1:metadata.name) SHARD: 0 Mounts: /etc/prometheus/config from config (rw) /etc/prometheus/config_out from config-out (rw) /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw) /etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml") /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
Containers: prometheus: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87b350932e17e0b93bf337c1e6923b39b92ba21df119a9de8c3c8bd603d00e44 Image ID: Port: <none> Host Port: <none> Args: --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries --config.file=/etc/prometheus/config_out/prometheus.env.yaml --web.enable-lifecycle --web.external-url=https://console-openshift-console.apps.liqcui-sdn2ovn.qe.devcluster.openshift.com/monitoring --web.route-prefix=/ --web.listen-address=127.0.0.1:9090 --storage.tsdb.retention.time=15d --storage.tsdb.path=/prometheus --web.config.file=/etc/prometheus/web_config/web-config.yaml --scrape.timestamp-tolerance=15ms State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 70m memory: 1Gi Liveness: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/healthy; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/healthy; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=6 Readiness: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ];
then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=3 Startup: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=60s #success=1 #failure=60 Environment: <none> Mounts: /etc/pki/ca-trust/extracted/pem/ from prometheus-trusted-ca-bundle (rw) /etc/prometheus/certs from tls-assets (ro) /etc/prometheus/config_out from config-out (ro) /etc/prometheus/configmaps/kubelet-serving-ca-bundle from configmap-kubelet-serving-ca-bundle (ro) /etc/prometheus/configmaps/metrics-client-ca from configmap-metrics-client-ca (ro) /etc/prometheus/configmaps/serving-certs-ca-bundle from configmap-serving-certs-ca-bundle (ro) /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw) /etc/prometheus/secrets/kube-rbac-proxy from secret-kube-rbac-proxy (ro) /etc/prometheus/secrets/metrics-client-certs from secret-metrics-client-certs (ro) /etc/prometheus/secrets/prometheus-k8s-kube-rbac-proxy-web from secret-prometheus-k8s-kube-rbac-proxy-web (ro) /etc/prometheus/secrets/prometheus-k8s-thanos-sidecar-tls from secret-prometheus-k8s-thanos-sidecar-tls (ro) /etc/prometheus/secrets/prometheus-k8s-tls from secret-prometheus-k8s-tls (ro) /etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml") /prometheus from prometheus-k8s-db (rw,path="prometheus-db") /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro) config-reloader: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:356d4ce991042a2affc27988c328a2ce686a52132c3ca1b630bce6b7965e8f90 Image ID: Port: <none> Host Port: <none> Command: /bin/prometheus-config-reloader Args: --listen-address=localhost:8080 --web-config-file=/etc/prometheus/web_config/web-config.yaml --reload-url=http://localhost:9090/-/reload --config-file=/etc/prometheus/config/prometheus.yaml.gz --config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml --watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0 State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 1m memory: 10Mi Environment: POD_NAME: prometheus-k8s-0 (v1:metadata.name) SHARD: 0 Mounts: /etc/prometheus/config from config (rw) /etc/prometheus/config_out from config-out (rw) /etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw) /etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml") /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro) thanos-sidecar: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:993e54a6864d7fe7fa61d3faf5a98a4438dec0a447b0d1e837cc92ea1a0ce16e Image ID: Ports: 10902/TCP, 10901/TCP Host Ports: 0/TCP, 0/TCP Args: sidecar --prometheus.url=http://localhost:9090/ --tsdb.path=/prometheus --http-address=127.0.0.1:10902 --grpc-server-tls-cert=/etc/tls/grpc/server.crt --grpc-server-tls-key=/etc/tls/grpc/server.key --grpc-server-tls-client-ca=/etc/tls/grpc/ca.crt State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 1m memory: 25Mi Environment: <none> Mounts: /etc/tls/grpc from secret-grpc-tls (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro) kube-rbac-proxy-web: Container ID: Image: 
quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70458d010bd9f4e9c43b6452fe79e529af926deab2714e10ba1366789ec15d9f Image ID: Port: 9091/TCP Host Port: 0/TCP Args: --secure-listen-address=0.0.0.0:9091 --upstream=http://127.0.0.1:9090 --config-file=/etc/kube-rbac-proxy/config.yaml --tls-cert-file=/etc/tls/private/tls.crt --tls-private-key-file=/etc/tls/private/tls.key --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 --ignore-paths=/-/healthy,/-/ready --tls-min-version=VersionTLS12 State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 1m memory: 15Mi Environment: <none> Mounts: /etc/kube-rbac-proxy from secret-prometheus-k8s-kube-rbac-proxy-web (rw) /etc/tls/private from secret-prometheus-k8s-tls (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro) kube-rbac-proxy: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70458d010bd9f4e9c43b6452fe79e529af926deab2714e10ba1366789ec15d9f Image ID: Port: 9092/TCP Host Port: 0/TCP Args: --secure-listen-address=0.0.0.0:9092 --upstream=http://127.0.0.1:9090 --allow-paths=/metrics,/federate --config-file=/etc/kube-rbac-proxy/config.yaml --tls-cert-file=/etc/tls/private/tls.crt --tls-private-key-file=/etc/tls/private/tls.key --client-ca-file=/etc/tls/client/client-ca.crt --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 --tls-min-version=VersionTLS12 State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 1m memory: 15Mi Environment: <none> Mounts: /etc/kube-rbac-proxy from secret-kube-rbac-proxy (rw) /etc/tls/client from configmap-metrics-client-ca (ro) /etc/tls/private from secret-prometheus-k8s-tls (rw) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro) kube-rbac-proxy-thanos: Container ID: Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70458d010bd9f4e9c43b6452fe79e529af926deab2714e10ba1366789ec15d9f Image ID: Port: 10903/TCP Host Port: 0/TCP Args: --secure-listen-address=[$(POD_IP)]:10903 --upstream=http://127.0.0.1:10902 --tls-cert-file=/etc/tls/private/tls.crt --tls-private-key-file=/etc/tls/private/tls.key --client-ca-file=/etc/tls/client/client-ca.crt --config-file=/etc/kube-rbac-proxy/config.yaml --tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 --allow-paths=/metrics --tls-min-version=VersionTLS12 State: Waiting Reason: PodInitializing Ready: False Restart Count: 0 Requests: cpu: 1m memory: 10Mi Environment: POD_IP: (v1:status.podIP) Mounts: /etc/kube-rbac-proxy from secret-kube-rbac-proxy (ro) /etc/tls/client from configmap-metrics-client-ca (ro) /etc/tls/private from secret-prometheus-k8s-thanos-sidecar-tls (ro) /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro) Conditions: Type Status PodReadyToStartContainers False Initialized False Ready False ContainersReady False PodScheduled True Volumes: prometheus-k8s-db: Type: 
PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace) ClaimName: prometheus-k8s-db-prometheus-k8s-0 ReadOnly: false config: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s Optional: false tls-assets: Type: Projected (a volume that contains injected data from multiple sources) SecretName: prometheus-k8s-tls-assets-0 SecretOptionalName: <nil> config-out: Type: EmptyDir (a temporary directory that shares a pod's lifetime) Medium: Memory SizeLimit: <unset> secret-prometheus-k8s-tls: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s-tls Optional: false secret-prometheus-k8s-thanos-sidecar-tls: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s-thanos-sidecar-tls Optional: false secret-kube-rbac-proxy: Type: Secret (a volume populated by a Secret) SecretName: kube-rbac-proxy Optional: false secret-prometheus-k8s-kube-rbac-proxy-web: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s-kube-rbac-proxy-web Optional: false secret-metrics-client-certs: Type: Secret (a volume populated by a Secret) SecretName: metrics-client-certs Optional: false configmap-serving-certs-ca-bundle: Type: ConfigMap (a volume populated by a ConfigMap) Name: serving-certs-ca-bundle Optional: false configmap-kubelet-serving-ca-bundle: Type: ConfigMap (a volume populated by a ConfigMap) Name: kubelet-serving-ca-bundle Optional: false configmap-metrics-client-ca: Type: ConfigMap (a volume populated by a ConfigMap) Name: metrics-client-ca Optional: false prometheus-k8s-rulefiles-0: Type: ConfigMap (a volume populated by a ConfigMap) Name: prometheus-k8s-rulefiles-0 Optional: false web-config: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s-web-config Optional: false prometheus-trusted-ca-bundle: Type: ConfigMap (a volume populated by a ConfigMap) Name: prometheus-trusted-ca-bundle Optional: false secret-grpc-tls: Type: Secret (a volume populated by a Secret) SecretName: prometheus-k8s-grpc-tls-9sg4kpkjnt4o0 Optional: false kube-api-access-fqmjd: Type: Projected (a volume that contains injected data from multiple sources) TokenExpirationSeconds: 3607 ConfigMapName: kube-root-ca.crt ConfigMapOptional: <nil> DownwardAPI: true ConfigMapName: openshift-service-ca.crt ConfigMapOptional: <nil> QoS Class: Burstable Node-Selectors: node-role.kubernetes.io/infra= Tolerations: node-role.kubernetes.io/infra=reserved:NoSchedule node-role.kubernetes.io/infra=reserved:NoExecute node.kubernetes.io/memory-pressure:NoSchedule op=Exists node.kubernetes.io/not-ready:NoExecute op=Exists for 300s node.kubernetes.io/unreachable:NoExecute op=Exists for 300s Events: Type Reason Age From Message ---- ------ ---- ---- ------- Warning FailedScheduling 149m default-scheduler 0/31 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 2 node(s) had volume node affinity conflict, 22 node(s) didn't match Pod's node affinity/selector, 5 node(s) were unschedulable. preemption: 0/31 nodes are available: 31 Preemption is not helpful for scheduling. Warning FailedScheduling 144m default-scheduler 0/31 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 2 node(s) had volume node affinity conflict, 21 node(s) didn't match Pod's node affinity/selector, 6 node(s) were unschedulable. preemption: 0/31 nodes are available: 31 Preemption is not helpful for scheduling. 
Warning FailedScheduling 138m default-scheduler 0/31 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 2 node(s) had volume node affinity conflict, 21 node(s) didn't match Pod's node affinity/selector, 6 node(s) were unschedulable. preemption: 0/31 nodes are available: 31 Preemption is not helpful for scheduling.
Normal Scheduled 133m default-scheduler Successfully assigned openshift-monitoring/prometheus-k8s-0 to ip-10-0-4-67.us-east-2.compute.internal
Warning FailedAttachVolume 111m (x19 over 133m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3" : volume attachment is being deleted
Warning FailedAttachVolume 3m58s (x60 over 110m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3" : volume attachment is being deleted

[ocpadmin@ip-10-0-0-179 ~]$ oc -n openshift-monitoring get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
alertmanager-main-db-alertmanager-main-0 Bound pvc-cc54ebf1-3460-42ad-9b5f-91e350adfb83 2Gi RWO gp3-csi <unset> 5h48m
alertmanager-main-db-alertmanager-main-1 Bound pvc-6a8e3453-bd96-4c80-95c7-d1202c1f49e3 2Gi RWO gp3-csi <unset> 5h48m
prometheus-k8s-db-prometheus-k8s-0 Bound pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3 100Gi RWO gp3-csi <unset> 5h48m
prometheus-k8s-db-prometheus-k8s-1 Bound pvc-5185de21-13da-4b6e-8dff-b9a6d39db1c0 100Gi RWO gp3-csi <unset> 5h48m

[ocpadmin@ip-10-0-0-179 ~]$ oc -n openshift-monitoring get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
pvc-5185de21-13da-4b6e-8dff-b9a6d39db1c0 100Gi RWO Delete Bound openshift-monitoring/prometheus-k8s-db-prometheus-k8s-1 gp3-csi <unset> 5h48m
pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3 100Gi RWO Delete Bound openshift-monitoring/prometheus-k8s-db-prometheus-k8s-0 gp3-csi <unset> 5h48m
pvc-6a8e3453-bd96-4c80-95c7-d1202c1f49e3 2Gi RWO Delete Bound openshift-monitoring/alertmanager-main-db-alertmanager-main-1 gp3-csi <unset> 5h48m
pvc-cc54ebf1-3460-42ad-9b5f-91e350adfb83 2Gi RWO Delete Bound openshift-monitoring/alertmanager-main-db-alertmanager-main-0 gp3-csi <unset> 5h48m

Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create an OCP cluster with OpenShiftSDN on AWS (instance type c5n.metal), then scale out to 24 worker nodes, 3 infra nodes, and 1 workload node.
2. Migrate from SDN to OVN (a sketch for watching the migration follows these steps):
oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}'
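A minimal sketch for watching the migration and the infra node drain it triggers (assuming standard oc commands; these checks are not part of the original report):

# Confirm the effective network type after the patch
oc get network.config.openshift.io cluster -o jsonpath='{.status.networkType}{"\n"}'
# Watch the infra nodes (prometheus-k8s-0 is pinned to node-role.kubernetes.io/infra=) drain and come back
oc get nodes -l node-role.kubernetes.io/infra -o wide -w
# Watch the relevant cluster operators converge
oc get co network machine-config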
Actual results:
The pod prometheus-k8s-0 fails to start with the event: Warning FailedAttachVolume 3m58s (x60 over 110m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3" : volume attachment is being deleted. This blocks the infra node drain, which in turn blocks the SDN-to-OVN migration.
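A minimal diagnostic sketch for inspecting the attachment that the attachdetach-controller reports as being deleted (assumes jq is available; the VolumeAttachment name is not captured in this report):

# List VolumeAttachment objects referencing the affected PV and show why they are stuck
oc get volumeattachments.storage.k8s.io -o json | jq '.items[]
  | select(.spec.source.persistentVolumeName == "pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3")
  | {name: .metadata.name, node: .spec.nodeName, attached: .status.attached,
     deletionTimestamp: .metadata.deletionTimestamp, finalizers: .metadata.finalizers}'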
Expected results:
The pod prometheus-k8s-0 starts up properly.
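A small verification sketch once the stray attachment is gone (plain oc commands, assumed rather than taken from the report):

# The PVC should stay Bound and the pod should reach Running/Ready
oc -n openshift-monitoring get pvc prometheus-k8s-db-prometheus-k8s-0
oc -n openshift-monitoring get pod prometheus-k8s-0 -w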
Additional info:
is duplicated by: OCPBUGS-55472 Stray Volume Attachment prevents pod starts (Closed)