-
Bug
-
Resolution: Not a Bug
-
Critical
-
None
-
4.16.z
-
None
-
Quality / Stability / Reliability
-
False
-
-
None
-
Important
-
None
-
None
-
None
-
None
-
MON Sprint 265
-
1
-
None
-
None
-
None
-
None
-
None
-
None
-
None
Description of problem:
The prometheus-k8s-0 pod failed to be created because its volume attachment is being deleted for an unknown reason. The PVC and PV still exist, but the volume attachment keeps being reported as deleting.
oc -n openshift-monitoring describe pod prometheus-k8s-0
Name: prometheus-k8s-0
Namespace: openshift-monitoring
Priority: 2000000000
Priority Class Name: system-cluster-critical
Service Account: prometheus-k8s
Node: ip-10-0-4-67.us-east-2.compute.internal/10.0.4.67
Start Time: Thu, 05 Sep 2024 08:55:24 +0000
Labels: app.kubernetes.io/component=prometheus
app.kubernetes.io/instance=k8s
app.kubernetes.io/managed-by=prometheus-operator
app.kubernetes.io/name=prometheus
app.kubernetes.io/part-of=openshift-monitoring
app.kubernetes.io/version=2.52.0
apps.kubernetes.io/pod-index=0
controller-revision-hash=prometheus-k8s-856d7759cc
operator.prometheus.io/name=k8s
operator.prometheus.io/shard=0
prometheus=k8s
statefulset.kubernetes.io/pod-name=prometheus-k8s-0
Annotations: k8s.ovn.org/pod-networks:
{"default":{"ip_addresses":["10.130.2.8/23"],"mac_address":"0a:58:0a:82:02:08","gateway_ips":["10.130.2.1"],"routes":[{"dest":"10.128.0.0/...
kubectl.kubernetes.io/default-container: prometheus
openshift.io/required-scc: nonroot
openshift.io/scc: nonroot
Status: Pending
IP:
IPs: <none>
Controlled By: StatefulSet/prometheus-k8s
Init Containers:
init-config-reloader:
Container ID:
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:356d4ce991042a2affc27988c328a2ce686a52132c3ca1b630bce6b7965e8f90
Image ID:
Port: 8080/TCP
Host Port: 0/TCP
Command:
/bin/prometheus-config-reloader
Args:
--watch-interval=0
--listen-address=:8080
--config-file=/etc/prometheus/config/prometheus.yaml.gz
--config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
--watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 1m
memory: 10Mi
Environment:
POD_NAME: prometheus-k8s-0 (v1:metadata.name)
SHARD: 0
Mounts:
/etc/prometheus/config from config (rw)
/etc/prometheus/config_out from config-out (rw)
/etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
/etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
Containers:
prometheus:
Container ID:
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:87b350932e17e0b93bf337c1e6923b39b92ba21df119a9de8c3c8bd603d00e44
Image ID:
Port: <none>
Host Port: <none>
Args:
--web.console.templates=/etc/prometheus/consoles
--web.console.libraries=/etc/prometheus/console_libraries
--config.file=/etc/prometheus/config_out/prometheus.env.yaml
--web.enable-lifecycle
--web.external-url=https://console-openshift-console.apps.liqcui-sdn2ovn.qe.devcluster.openshift.com/monitoring
--web.route-prefix=/
--web.listen-address=127.0.0.1:9090
--storage.tsdb.retention.time=15d
--storage.tsdb.path=/prometheus
--web.config.file=/etc/prometheus/web_config/web-config.yaml
--scrape.timestamp-tolerance=15ms
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 70m
memory: 1Gi
Liveness: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/healthy; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/healthy; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=6
Readiness: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=5s #success=1 #failure=3
Startup: exec [sh -c if [ -x "$(command -v curl)" ]; then exec curl --fail http://localhost:9090/-/ready; elif [ -x "$(command -v wget)" ]; then exec wget -q -O /dev/null http://localhost:9090/-/ready; else exit 1; fi] delay=0s timeout=3s period=60s #success=1 #failure=60
Environment: <none>
Mounts:
/etc/pki/ca-trust/extracted/pem/ from prometheus-trusted-ca-bundle (rw)
/etc/prometheus/certs from tls-assets (ro)
/etc/prometheus/config_out from config-out (ro)
/etc/prometheus/configmaps/kubelet-serving-ca-bundle from configmap-kubelet-serving-ca-bundle (ro)
/etc/prometheus/configmaps/metrics-client-ca from configmap-metrics-client-ca (ro)
/etc/prometheus/configmaps/serving-certs-ca-bundle from configmap-serving-certs-ca-bundle (ro)
/etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
/etc/prometheus/secrets/kube-rbac-proxy from secret-kube-rbac-proxy (ro)
/etc/prometheus/secrets/metrics-client-certs from secret-metrics-client-certs (ro)
/etc/prometheus/secrets/prometheus-k8s-kube-rbac-proxy-web from secret-prometheus-k8s-kube-rbac-proxy-web (ro)
/etc/prometheus/secrets/prometheus-k8s-thanos-sidecar-tls from secret-prometheus-k8s-thanos-sidecar-tls (ro)
/etc/prometheus/secrets/prometheus-k8s-tls from secret-prometheus-k8s-tls (ro)
/etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml")
/prometheus from prometheus-k8s-db (rw,path="prometheus-db")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
config-reloader:
Container ID:
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:356d4ce991042a2affc27988c328a2ce686a52132c3ca1b630bce6b7965e8f90
Image ID:
Port: <none>
Host Port: <none>
Command:
/bin/prometheus-config-reloader
Args:
--listen-address=localhost:8080
--web-config-file=/etc/prometheus/web_config/web-config.yaml
--reload-url=http://localhost:9090/-/reload
--config-file=/etc/prometheus/config/prometheus.yaml.gz
--config-envsubst-file=/etc/prometheus/config_out/prometheus.env.yaml
--watched-dir=/etc/prometheus/rules/prometheus-k8s-rulefiles-0
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 1m
memory: 10Mi
Environment:
POD_NAME: prometheus-k8s-0 (v1:metadata.name)
SHARD: 0
Mounts:
/etc/prometheus/config from config (rw)
/etc/prometheus/config_out from config-out (rw)
/etc/prometheus/rules/prometheus-k8s-rulefiles-0 from prometheus-k8s-rulefiles-0 (rw)
/etc/prometheus/web_config/web-config.yaml from web-config (ro,path="web-config.yaml")
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
thanos-sidecar:
Container ID:
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:993e54a6864d7fe7fa61d3faf5a98a4438dec0a447b0d1e837cc92ea1a0ce16e
Image ID:
Ports: 10902/TCP, 10901/TCP
Host Ports: 0/TCP, 0/TCP
Args:
sidecar
--prometheus.url=http://localhost:9090/
--tsdb.path=/prometheus
--http-address=127.0.0.1:10902
--grpc-server-tls-cert=/etc/tls/grpc/server.crt
--grpc-server-tls-key=/etc/tls/grpc/server.key
--grpc-server-tls-client-ca=/etc/tls/grpc/ca.crt
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 1m
memory: 25Mi
Environment: <none>
Mounts:
/etc/tls/grpc from secret-grpc-tls (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
kube-rbac-proxy-web:
Container ID:
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70458d010bd9f4e9c43b6452fe79e529af926deab2714e10ba1366789ec15d9f
Image ID:
Port: 9091/TCP
Host Port: 0/TCP
Args:
--secure-listen-address=0.0.0.0:9091
--upstream=http://127.0.0.1:9090
--config-file=/etc/kube-rbac-proxy/config.yaml
--tls-cert-file=/etc/tls/private/tls.crt
--tls-private-key-file=/etc/tls/private/tls.key
--tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
--ignore-paths=/-/healthy,/-/ready
--tls-min-version=VersionTLS12
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 1m
memory: 15Mi
Environment: <none>
Mounts:
/etc/kube-rbac-proxy from secret-prometheus-k8s-kube-rbac-proxy-web (rw)
/etc/tls/private from secret-prometheus-k8s-tls (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
kube-rbac-proxy:
Container ID:
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70458d010bd9f4e9c43b6452fe79e529af926deab2714e10ba1366789ec15d9f
Image ID:
Port: 9092/TCP
Host Port: 0/TCP
Args:
--secure-listen-address=0.0.0.0:9092
--upstream=http://127.0.0.1:9090
--allow-paths=/metrics,/federate
--config-file=/etc/kube-rbac-proxy/config.yaml
--tls-cert-file=/etc/tls/private/tls.crt
--tls-private-key-file=/etc/tls/private/tls.key
--client-ca-file=/etc/tls/client/client-ca.crt
--tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
--tls-min-version=VersionTLS12
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 1m
memory: 15Mi
Environment: <none>
Mounts:
/etc/kube-rbac-proxy from secret-kube-rbac-proxy (rw)
/etc/tls/client from configmap-metrics-client-ca (ro)
/etc/tls/private from secret-prometheus-k8s-tls (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
kube-rbac-proxy-thanos:
Container ID:
Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70458d010bd9f4e9c43b6452fe79e529af926deab2714e10ba1366789ec15d9f
Image ID:
Port: 10903/TCP
Host Port: 0/TCP
Args:
--secure-listen-address=[$(POD_IP)]:10903
--upstream=http://127.0.0.1:10902
--tls-cert-file=/etc/tls/private/tls.crt
--tls-private-key-file=/etc/tls/private/tls.key
--client-ca-file=/etc/tls/client/client-ca.crt
--config-file=/etc/kube-rbac-proxy/config.yaml
--tls-cipher-suites=TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384,TLS_ECDHE_ECDSA_WITH_CHACHA20_POLY1305_SHA256,TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256
--allow-paths=/metrics
--tls-min-version=VersionTLS12
State: Waiting
Reason: PodInitializing
Ready: False
Restart Count: 0
Requests:
cpu: 1m
memory: 10Mi
Environment:
POD_IP: (v1:status.podIP)
Mounts:
/etc/kube-rbac-proxy from secret-kube-rbac-proxy (ro)
/etc/tls/client from configmap-metrics-client-ca (ro)
/etc/tls/private from secret-prometheus-k8s-thanos-sidecar-tls (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-fqmjd (ro)
Conditions:
Type Status
PodReadyToStartContainers False
Initialized False
Ready False
ContainersReady False
PodScheduled True
Volumes:
prometheus-k8s-db:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: prometheus-k8s-db-prometheus-k8s-0
ReadOnly: false
config:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s
Optional: false
tls-assets:
Type: Projected (a volume that contains injected data from multiple sources)
SecretName: prometheus-k8s-tls-assets-0
SecretOptionalName: <nil>
config-out:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
secret-prometheus-k8s-tls:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s-tls
Optional: false
secret-prometheus-k8s-thanos-sidecar-tls:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s-thanos-sidecar-tls
Optional: false
secret-kube-rbac-proxy:
Type: Secret (a volume populated by a Secret)
SecretName: kube-rbac-proxy
Optional: false
secret-prometheus-k8s-kube-rbac-proxy-web:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s-kube-rbac-proxy-web
Optional: false
secret-metrics-client-certs:
Type: Secret (a volume populated by a Secret)
SecretName: metrics-client-certs
Optional: false
configmap-serving-certs-ca-bundle:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: serving-certs-ca-bundle
Optional: false
configmap-kubelet-serving-ca-bundle:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kubelet-serving-ca-bundle
Optional: false
configmap-metrics-client-ca:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: metrics-client-ca
Optional: false
prometheus-k8s-rulefiles-0:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus-k8s-rulefiles-0
Optional: false
web-config:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s-web-config
Optional: false
prometheus-trusted-ca-bundle:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: prometheus-trusted-ca-bundle
Optional: false
secret-grpc-tls:
Type: Secret (a volume populated by a Secret)
SecretName: prometheus-k8s-grpc-tls-9sg4kpkjnt4o0
Optional: false
kube-api-access-fqmjd:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
ConfigMapName: openshift-service-ca.crt
ConfigMapOptional: <nil>
QoS Class: Burstable
Node-Selectors: node-role.kubernetes.io/infra=
Tolerations: node-role.kubernetes.io/infra=reserved:NoSchedule
node-role.kubernetes.io/infra=reserved:NoExecute
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 149m default-scheduler 0/31 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 2 node(s) had volume node affinity conflict, 22 node(s) didn't match Pod's node affinity/selector, 5 node(s) were unschedulable. preemption: 0/31 nodes are available: 31 Preemption is not helpful for scheduling.
Warning FailedScheduling 144m default-scheduler 0/31 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 2 node(s) had volume node affinity conflict, 21 node(s) didn't match Pod's node affinity/selector, 6 node(s) were unschedulable. preemption: 0/31 nodes are available: 31 Preemption is not helpful for scheduling.
Warning FailedScheduling 138m default-scheduler 0/31 nodes are available: 2 node(s) had untolerated taint {node-role.kubernetes.io/master: }, 2 node(s) had volume node affinity conflict, 21 node(s) didn't match Pod's node affinity/selector, 6 node(s) were unschedulable. preemption: 0/31 nodes are available: 31 Preemption is not helpful for scheduling.
Normal Scheduled 133m default-scheduler Successfully assigned openshift-monitoring/prometheus-k8s-0 to ip-10-0-4-67.us-east-2.compute.internal
Warning FailedAttachVolume 111m (x19 over 133m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3" : volume attachment is being deleted
Warning FailedAttachVolume 3m58s (x60 over 110m) attachdetach-controller AttachVolume.Attach failed for volume "pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3" : volume attachment is being deleted
[ocpadmin@ip-10-0-0-179 ~]$ oc -n openshift-monitoring get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
alertmanager-main-db-alertmanager-main-0 Bound pvc-cc54ebf1-3460-42ad-9b5f-91e350adfb83 2Gi RWO gp3-csi <unset> 5h48m
alertmanager-main-db-alertmanager-main-1 Bound pvc-6a8e3453-bd96-4c80-95c7-d1202c1f49e3 2Gi RWO gp3-csi <unset> 5h48m
prometheus-k8s-db-prometheus-k8s-0 Bound pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3 100Gi RWO gp3-csi <unset> 5h48m
prometheus-k8s-db-prometheus-k8s-1 Bound pvc-5185de21-13da-4b6e-8dff-b9a6d39db1c0 100Gi RWO gp3-csi <unset> 5h48m
[ocpadmin@ip-10-0-0-179 ~]$ oc -n openshift-monitoring get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS VOLUMEATTRIBUTESCLASS REASON AGE
pvc-5185de21-13da-4b6e-8dff-b9a6d39db1c0 100Gi RWO Delete Bound openshift-monitoring/prometheus-k8s-db-prometheus-k8s-1 gp3-csi <unset> 5h48m
pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3 100Gi RWO Delete Bound openshift-monitoring/prometheus-k8s-db-prometheus-k8s-0 gp3-csi <unset> 5h48m
pvc-6a8e3453-bd96-4c80-95c7-d1202c1f49e3 2Gi RWO Delete Bound openshift-monitoring/alertmanager-main-db-alertmanager-main-1 gp3-csi <unset> 5h48m
pvc-cc54ebf1-3460-42ad-9b5f-91e350adfb83 2Gi RWO Delete Bound openshift-monitoring/alertmanager-main-db-alertmanager-main-0 gp3-csi <unset> 5h48m
[ocpadmin@ip-10-0-0-179 ~]$
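As an additional diagnostic (not captured in the output above), the stuck attachment can be inspected directly through the VolumeAttachment objects. The commands are standard oc calls; the attachment name is a placeholder and the exact fields shown will vary per cluster:
# List VolumeAttachments referencing the affected PV
oc get volumeattachment | grep pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3
# Show whether the old attachment carries a deletionTimestamp and which node it still points to
oc get volumeattachment <attachment-name> -o yaml | grep -E 'deletionTimestamp|nodeName|attached'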
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Create an OCP cluster with SDN on AWS (instance type c5n.metal), then scale out to 24 worker nodes, 3 infra nodes, and 1 workload node.
2. Migrate from SDN to OVN using oc patch Network.config.openshift.io cluster --type='merge' --patch '{"metadata":{"annotations":{"network.openshift.io/network-type-migration":""}},"spec":{"networkType":"OVNKubernetes"}}' (migration progress can be monitored with the commands below).
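To follow the migration, the reported network type and the relevant cluster operators can be checked with standard commands (a sketch; exact operator conditions may vary by release):
# Current network type reported by the cluster network config
oc get network.config.openshift.io cluster -o jsonpath='{.status.networkType}{"\n"}'
# Progress of the network and machine-config cluster operators during the migration
oc get co network machine-config
# Watch nodes being cordoned and drained as part of the rollout
oc get nodes -w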
Actual results:
The pod prometheus-k8s-0 fails to start with repeated FailedAttachVolume events (x60 over 110m) from the attachdetach-controller: AttachVolume.Attach failed for volume "pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3" : volume attachment is being deleted. This blocks draining the infra node and therefore blocks the SDN-to-OVN migration.
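Because the error suggests the previous attachment is still being torn down, it may also help to confirm on the AWS side whether the underlying EBS volume is still attached to the old node. This is only a hedged suggestion using standard oc/aws CLI calls, taking the volume ID from the PV's CSI volumeHandle:
# EBS volume ID backing the affected PV
VOLUME_ID=$(oc get pv pvc-622bca2b-5053-4d05-ac8c-95c820ace8f3 -o jsonpath='{.spec.csi.volumeHandle}')
# Check which instance (if any) the volume is still attached to
aws ec2 describe-volumes --volume-ids "$VOLUME_ID" --query 'Volumes[0].Attachments'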
Expected results:
The pod prometheus-k8s-0 starts up properly.
Additional info:
is duplicated by: OCPBUGS-55472 Stray Volume Attachment prevents pod starts (Closed)