Bug
Resolution: Done
Normal
4.13
Quality / Stability / Reliability
Moderate
MON Sprint 244
Description of problem:
TargetDown and KubeletDown alerts fire in a cluster that has been upgraded from 4.12 to 4.13. The bootstrap configmap was deleted from the kube-system namespace while the cluster was still on 4.12.
Version-Release number of selected component (if applicable):
How reproducible:
always
Steps to Reproduce:
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.12.26 True False 106m Cluster version is 4.12.26
$ oc get configmap -n kube-system bootstrap
NAME DATA AGE
bootstrap 1 118m
$ oc delete configmap -n kube-system bootstrap
configmap "bootstrap" deleted
$ oc get configmap -n kube-system bootstrap
Error from server (NotFound): configmaps "bootstrap" not found
$ oc -n openshift-config patch cm admin-acks --patch '{"data":{"ack-4.12-kube-1.26-api-removals-in-4.13":"true"}}' --type=merge
configmap/admin-acks patched
Upgrade from 4.12 to 4.13
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.13.11 True False 15m Cluster version is 4.13.11
$ oc get secret metrics-client-certs -o yaml | grep crt | awk '{print $2}' | base64 -d > secret.pem
$ openssl x509 -in secret.pem -text -noout
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
d7:9e:f0:6a:dd:86:28:b9:b4:45:95:76:af:ca:8b:1d
Signature Algorithm: sha256WithRSAEncryption
Issuer: OU = openshift, CN = kubelet-signer
Validity
Not Before: Oct 26 13:59:46 2023 GMT
Not After : Oct 27 13:48:08 2023 GMT
Subject: CN = system:serviceaccount:openshift-monitoring:prometheus-k8s
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:50:a9:d9:d4:36:66:29:fd:ca:51:22:b3:41:cf:
e1:bc:3e:63:bb:fe:84:82:3a:23:f2:74:99:76:d4:
a7:09:fa:df:a8:25:81:a3:fc:94:f3:6f:2b:85:3b:
90:9f:89:e1:01:68:6f:55:49:16:ee:7e:7a:e4:5a:
b5:d2:5d:9c:34
ASN1 OID: prime256v1
NIST CURVE: P-256
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Client Authentication
X509v3 Basic Constraints: critical
CA:FALSE
X509v3 Authority Key Identifier:
E5:80:A2:00:E1:8D:9F:ED:78:5F:EF:17:BA:84:A6:1B:33:F4:32:B3
Signature Algorithm: sha256WithRSAEncryption
Signature Value:
42:c8:e3:0f:7a:01:31:52:87:8d:e7:30:36:f4:c6:e1:d6:09:
08:b1:d5:20:d9:1e:b3:e9:e6:65:a5:2c:fe:79:ad:f9:26:39:
71:90:4f:16:8e:e0:05:df:2c:29:ec:f5:9f:a5:18:3f:3d:9e:
5f:38:d6:9c:a7:f8:59:f6:c8:20:85:1b:26:1d:72:4a:dd:09:
2d:7d:2f:1f:e4:1c:bc:df:e2:31:78:a9:f0:f8:de:11:cc:da:
66:87:d5:c5:8b:9e:73:15:96:d3:b7:38:1b:e2:69:2d:6e:4d:
4d:fc:1f:87:e5:e4:2e:5a:e2:6d:77:9a:2f:ce:60:7c:27:4c:
25:b0:94:37:55:3e:c6:46:e0:b7:15:1a:2b:5a:f1:35:e6:8a:
c1:19:ed:39:aa:79:ba:7b:1f:d6:2f:5a:b1:b2:e1:25:c3:b8:
49:f8:0a:27:b6:67:87:43:b1:be:e2:48:48:ec:df:60:75:2d:
7f:bb:3a:c1:0b:5a:5b:bc:38:29:bc:09:07:cd:2f:b8:8e:13:
88:ee:71:45:fa:d5:f9:03:19:b0:66:5b:c6:24:4b:2d:31:8b:
ac:d3:06:79:16:47:ab:d2:37:5c:f7:45:b2:66:1c:53:bb:0b:
92:dd:a6:84:22:a4:fc:5c:35:76:64:1d:f5:16:0b:ce:29:4a:
ef:9d:a1:cb
$ oc delete secret metrics-client-certs
secret "metrics-client-certs" deleted
$ oc get secrets metrics-client-certs
NAME TYPE DATA AGE
metrics-client-certs Opaque 2 44s
$ oc get secret metrics-client-certs -o yaml | grep crt | awk '{print $2}' | base64 -d > secret.pem
$ openssl x509 -in secret.pem -text -noout
Certificate:
Data:
Version: 3 (0x2)
Serial Number:
6c:b3:48:68:f2:7e:27:77:13:bc:7b:0e:e0:f5:2b:6d
Signature Algorithm: sha256WithRSAEncryption
Issuer: CN = kube-csr-signer_@1698329032
Validity
Not Before: Oct 26 17:26:27 2023 GMT
Not After : Oct 27 13:48:08 2023 GMT
Subject: CN = system:serviceaccount:openshift-monitoring:prometheus-k8s
Subject Public Key Info:
Public Key Algorithm: id-ecPublicKey
Public-Key: (256 bit)
pub:
04:20:dd:2f:62:c3:1e:d6:bd:f1:dc:c8:f1:cb:22:
31:27:5c:30:d8:01:19:93:da:ed:20:2e:4f:5d:f9:
5d:02:95:f4:10:f0:79:bf:b6:be:b6:54:66:17:29:
a4:61:62:ec:8b:cf:6f:2f:01:4e:8f:71:3b:cb:9d:
24:84:c5:52:08
ASN1 OID: prime256v1
NIST CURVE: P-256
X509v3 extensions:
X509v3 Key Usage: critical
Digital Signature, Key Encipherment
X509v3 Extended Key Usage:
TLS Web Client Authentication
X509v3 Basic Constraints: critical
CA:FALSE
X509v3 Authority Key Identifier:
77:7A:20:8D:D4:ED:A8:EE:B1:0B:94:87:C8:A7:8E:1C:C0:B3:3E:B3
Signature Algorithm: sha256WithRSAEncryption
Signature Value:
28:a6:c9:e0:22:ad:af:15:04:89:23:7f:fc:92:41:c2:b3:29:
5d:c6:26:ac:d5:6b:71:e1:44:45:37:90:35:04:9b:6e:7e:ae:
43:6d:3e:8e:c3:5f:25:d7:79:62:6c:e6:7f:6a:10:a6:e4:39:
1e:1f:34:6e:7d:63:02:33:a7:32:b6:29:bc:f6:c3:bf:e6:47:
52:90:03:b8:45:30:46:a8:f2:32:c0:c2:31:68:2e:94:b4:4e:
51:10:7f:e0:fd:ff:33:94:1a:d6:96:dd:0c:ce:29:90:c6:ca:
fc:88:62:02:0e:4a:01:6f:48:54:0a:ba:1d:0f:76:df:ce:ff:
8c:91:cf:16:93:e8:59:6d:ef:24:31:bc:53:cf:3f:df:bf:7b:
96:ea:f6:30:de:98:32:47:be:46:8a:f8:e2:44:84:57:47:d4:
2c:cc:1a:f9:7a:3f:a0:80:90:1f:eb:75:35:9d:84:2f:a5:da:
ff:9e:5e:1f:5a:f4:cc:7a:68:15:d7:e7:2b:38:af:8b:d5:ed:
00:7a:02:0a:07:8f:8d:8f:0b:50:9f:81:a2:d2:35:79:81:11:
04:e5:e2:95:7b:8d:fa:a6:97:1c:0f:4f:ab:55:ce:50:1f:6f:
d7:8d:76:06:20:39:d0:b3:ec:d8:a3:25:f0:db:88:74:6b:1f:
61:84:8b:74
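The two certificate dumps above differ mainly in their Issuer line (kubelet-signer before the secret was deleted, kube-csr-signer_@1698329032 after it was recreated), while the Not After date is unchanged. A minimal sketch for pulling just those fields out of the full dump; the helper name cert_summary is hypothetical and not part of any OpenShift tooling:

```shell
# cert_summary: hypothetical helper, not part of oc or openssl.
# Reads `openssl x509 -text -noout` output on stdin and keeps only the
# Issuer, Subject and validity lines, with leading indentation removed.
cert_summary() {
  grep -E '^[[:space:]]*(Issuer|Subject|Not Before|Not After ?):' \
    | sed -E 's/^[[:space:]]+//'
}

# Usage, with secret.pem produced as in the steps above:
#   openssl x509 -in secret.pem -text -noout | cert_summary
```

Running it once before and once after deleting the secret makes the issuer change easy to compare side by side. The same fields can also be printed directly with `openssl x509 -in secret.pem -noout -issuer -subject -dates`.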
$ oc get configmap -n kube-system bootstrap
Error from server (NotFound): configmaps "bootstrap" not found
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s --data-urlencode "query=ALERTS{alertname='KubeletDown'}" 'http://localhost:9090/api/v1/query?' | jq
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"__name__": "ALERTS",
"alertname": "KubeletDown",
"alertstate": "pending",
"namespace": "kube-system",
"severity": "critical"
},
"value": [
1698342150.849,
"1"
]
}
]
}
}
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s --data-urlencode "query=up{job='kubelet',metrics_path='/metrics'}" 'http://localhost:9090/api/v1/query?' | jq
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"__name__": "up",
"endpoint": "https-metrics",
"instance": "10.0.90.25:10250",
"job": "kubelet",
"metrics_path": "/metrics",
"namespace": "kube-system",
"node": "worker-0.nigeltest.lab.upshift.rdu2.redhat.com",
"service": "kubelet"
},
"value": [
1698342160.462,
"0"
]
},
{
"metric": {
"__name__": "up",
"endpoint": "https-metrics",
"instance": "10.0.91.12:10250",
"job": "kubelet",
"metrics_path": "/metrics",
"namespace": "kube-system",
"node": "worker-1.nigeltest.lab.upshift.rdu2.redhat.com",
"service": "kubelet"
},
"value": [
1698342160.462,
"0"
]
},
{
"metric": {
"__name__": "up",
"endpoint": "https-metrics",
"instance": "10.0.91.159:10250",
"job": "kubelet",
"metrics_path": "/metrics",
"namespace": "kube-system",
"node": "worker-2.nigeltest.lab.upshift.rdu2.redhat.com",
"service": "kubelet"
},
"value": [
1698342160.462,
"0"
]
},
{
"metric": {
"__name__": "up",
"endpoint": "https-metrics",
"instance": "10.0.91.228:10250",
"job": "kubelet",
"metrics_path": "/metrics",
"namespace": "kube-system",
"node": "master-1.nigeltest.lab.upshift.rdu2.redhat.com",
"service": "kubelet"
},
"value": [
1698342160.462,
"0"
]
},
{
"metric": {
"__name__": "up",
"endpoint": "https-metrics",
"instance": "10.0.92.103:10250",
"job": "kubelet",
"metrics_path": "/metrics",
"namespace": "kube-system",
"node": "master-2.nigeltest.lab.upshift.rdu2.redhat.com",
"service": "kubelet"
},
"value": [
1698342160.462,
"0"
]
},
{
"metric": {
"__name__": "up",
"endpoint": "https-metrics",
"instance": "10.0.95.36:10250",
"job": "kubelet",
"metrics_path": "/metrics",
"namespace": "kube-system",
"node": "master-0.nigeltest.lab.upshift.rdu2.redhat.com",
"service": "kubelet"
},
"value": [
1698342160.462,
"0"
]
}
]
}
}
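The response above reports a sample value of "0" for every kubelet target. As a rough sketch, the same response can be reduced to a list of affected nodes with a jq filter; the function name down_targets is hypothetical, and the filter assumes the instant-query response shape shown above:

```shell
# down_targets: hypothetical helper.
# Reads a Prometheus instant-query response (JSON) on stdin and prints the
# "node" label of every series whose sample value is "0", i.e. a down target.
down_targets() {
  jq -r '.data.result[] | select(.value[1] == "0") | .metric.node'
}

# Usage, reusing the query from the step above:
#   oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
#     curl -s --data-urlencode "query=up{job='kubelet',metrics_path='/metrics'}" \
#     'http://localhost:9090/api/v1/query?' | down_targets
```

An empty result would mean that no kubelet target in the response is reporting as down.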
$ oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- curl -s --data-urlencode "query=ALERTS{alertname='KubeletDown'}" 'http://localhost:9090/api/v1/query?' | jq
{
"status": "success",
"data": {
"resultType": "vector",
"result": [
{
"metric": {
"__name__": "ALERTS",
"alertname": "KubeletDown",
"alertstate": "firing",
"namespace": "kube-system",
"severity": "critical"
},
"value": [
1698343484.200,
"1"
]
}
]
}
}
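Between the two ALERTS queries above, KubeletDown moved from pending to firing. A small sketch for tracking that transition without reading the full JSON; the function name alert_states is hypothetical, and the regex matcher in the usage comment is an assumption added here so that TargetDown is covered as well:

```shell
# alert_states: hypothetical helper.
# Reads a Prometheus instant-query response for the ALERTS metric on stdin
# and prints one "<alertname> <alertstate>" line per returned series.
alert_states() {
  jq -r '.data.result[] | "\(.metric.alertname) \(.metric.alertstate)"'
}

# Usage (the =~ matcher selects both alerts named in the description):
#   oc -n openshift-monitoring exec -c prometheus prometheus-k8s-0 -- \
#     curl -s --data-urlencode "query=ALERTS{alertname=~'TargetDown|KubeletDown'}" \
#     'http://localhost:9090/api/v1/query?' | alert_states
```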
Actual results:
KubeletDown alerts fire in the cluster because all kubelet targets report as down.
Expected results:
The kubelet targets remain up after the upgrade.
Additional info:
https://issues.redhat.com/browse/OCPBUGS-4521
https://issues.redhat.com/browse/OCPBUGS-4521?focusedId=21376346&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-21376346
https://issues.redhat.com/browse/OCPBUGS-17926
https://issues.redhat.com/browse/OCPBUGS-5888
is related to
- OCPBUGS-4521 all kubelet targets are down after a few hours (Closed)
- OCPBUGS-17926 Kubelet targets down after upgrade from 4.12 to 4.13 (Closed)
- links to