Bug
Resolution: Done
Tracing Sprint # 239, Tracing Sprint # 240, Tracing Sprint # 241, Tracing Sprint # 242, Tracing Sprint # 243, Tracing Sprint # 244
Issue: jaeger/elasticsearch pods are in a crashloop state. Errors reported by the customer:
~~~
Before I restarted any pods, all three elasticsearch pods showed logs like so:
2023-06-20T22:25:02.243702305Z [2023-06-20T22:25:02,243][ERROR][c.a.o.s.s.h.n.OpenDistroSecuritySSLNettyHttpServerTransport] [elasticsearch-cdm-istiosystemjaeger-1] SSL Problem Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
2023-06-20T22:25:02.243702305Z javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
I restarted the 3rd elasticsearch pod, elasticsearch-cdm-istiosystemjaeger-3-8db99fd54-wffnw, and started getting errors like so:
2023-06-20T22:25:02.243702305Z [2023-06-20T22:25:02,243][ERROR][c.a.o.s.s.h.n.OpenDistroSecuritySSLNettyHttpServerTransport] [elasticsearch-cdm-istiosystemjaeger-1] SSL Problem Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
2023-06-20T22:25:02.243702305Z javax.net.ssl.SSLHandshakeException: Insufficient buffer remaining for AEAD cipher fragment (2). Needs to be more than tag size (16)
And:
2023-06-21T01:00:18.168137633Z [2023-06-21T01:00:18,161][ERROR][c.a.o.s.s.t.OpenDistroSecuritySSLNettyTransport] [elasticsearch-cdm-istiosystemjaeger-2] SSL Problem PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
2023-06-21T01:00:18.168137633Z javax.net.ssl.SSLHandshakeException: PKIX path validation failed: java.security.cert.CertPathValidatorException: validity check failed
2023-06-21T01:00:18.168137633Z at sun.security.ssl.Alert.createSSLException(Alert.java:131) ~[?:?]
And:
2023-06-21T01:00:19.010710594Z [2023-06-21T01:00:19,010][WARN ][o.e.d.z.ZenDiscovery ] [elasticsearch-cdm-istiosystemjaeger-2] not enough master nodes discovered during pinging (found [[Candidate{node={elasticsearch-cdm-istiosystemjaeger-2}{-taatlZURZ6jxxPZC7KNWw}{FCP0cg0SRwSmSqAkwaOsfw}{x.x.x.45}{x.x.x.45:9300}, clusterStateVersion=-1}]], but needed [2]), pinging again
~~~
Triggered by: Expired Certificates (see below)
~~~
/etc/openshift/elasticsearch/secret/logging-es.crt
Validity
Not Before: Jun 14 05:47:02 2023 GMT
Not After : Jun 13 05:47:02 2025 GMT
/etc/openshift/elasticsearch/secret/elasticsearch.crt
Validity
Not Before: Jun 14 05:47:01 2023 GMT
Not After : Jun 13 05:47:01 2025 GMT
/etc/openshift/elasticsearch/secret/..2023_06_21_01_00_03.1819449341/logging-es.crt
Validity
Not Before: Jun 14 05:47:02 2023 GMT
Not After : Jun 13 05:47:02 2025 GMT
/etc/openshift/elasticsearch/secret/..2023_06_21_01_00_03.1819449341/elasticsearch.crt
Validity
Not Before: Jun 14 05:47:01 2023 GMT
Not After : Jun 13 05:47:01 2025 GMT
/etc/elasticsearch/secret/elasticsearch.crt
Validity
Not Before: Jun 14 05:47:01 2023 GMT
Not After : Jun 13 05:47:01 2025 GMT
/etc/elasticsearch/secret/logging-es.crt
Validity
Not Before: Jun 14 05:47:02 2023 GMT
Not After : Jun 13 05:47:02 2025 GMT
/run/secrets/kubernetes.io/serviceaccount/ca.crt
Validity
Not Before: Apr 1 20:31:23 2021 GMT
Not After : Mar 30 20:31:23 2031 GMT
/run/secrets/kubernetes.io/serviceaccount/service-ca.crt
Validity
Not Before: May 31 21:01:25 2023 GMT
Not After : Jul 29 21:01:26 2025 GMT
/run/secrets/kubernetes.io/serviceaccount/..2023_06_21_01_00_03.1972118856/service-ca.crt
Validity
Not Before: May 31 21:01:25 2023 GMT
Not After : Jul 29 21:01:26 2025 GMT
/run/secrets/kubernetes.io/serviceaccount/..2023_06_21_01_00_03.1972118856/ca.crt
Validity
Not Before: Apr 1 20:31:23 2021 GMT
Not After : Mar 30 20:31:23 2031 GMT
And in a pod I hadn't restarted:
oc exec -c elasticsearch $p1 -- sh -c 'for i in $(find /etc /run -name "*.crt" -o -name "*.cert"); do openssl x509 -noout -text -in $i | grep -i -A2 -e validity | grep -i "not after : jun 14 " && echo $i; done'
unable to load certificate
139852364314432:error:0909006C:PEM routines:get_name:no start line:crypto/pem/pem_lib.c:745:Expecting: TRUSTED CERTIFICATE
Not After : Jun 14 05:16:32 2023 GMT
/etc/elasticsearch/secret/elasticsearch.crt
Not After : Jun 14 05:16:32 2023 GMT
/etc/elasticsearch/secret/logging-es.crt
command terminated with exit code 1
$ oc get po $p1
NAME READY STATUS RESTARTS AGE
elasticsearch-cdm-istiosystemjaeger-1-85c49b687c-4nkqc 1/2 Running 0 16d
~~~
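The expiry check above greps for one specific date. A more general sketch, assuming `openssl` and a POSIX `sh` are available in the container (as the command above suggests), is shown here. The function name `expiring_certs` and its arguments are hypothetical, chosen only for illustration. It relies on `openssl x509 -checkend`, which exits non-zero when a certificate expires within the given number of seconds.

```shell
# Sketch only: list certificates that are expired, or that will expire
# within WITHIN_SECONDS (pass 0 to list only already-expired certificates).
# Usage: expiring_certs WITHIN_SECONDS DIR [DIR...]
expiring_certs() {
    within="$1"; shift
    find "$@" \( -name '*.crt' -o -name '*.cert' \) 2>/dev/null |
    while IFS= read -r cert; do
        # Skip files that do not parse as a PEM certificate
        # (avoids the "unable to load certificate" noise).
        openssl x509 -noout -in "$cert" >/dev/null 2>&1 || continue
        # -checkend exits non-zero if the certificate expires within $within seconds.
        if ! openssl x509 -noout -checkend "$within" -in "$cert" >/dev/null 2>&1; then
            printf '%s\t%s\n' "$cert" "$(openssl x509 -noout -enddate -in "$cert")"
        fi
    done
}
```

Run inside the elasticsearch container, `expiring_certs 0 /etc /run` would print one line per expired certificate with its `notAfter` date. On a pod that has already loaded the new certificates it should print nothing.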
Workaround:
Delete the affected pods and allow them to restart. The new pods pick up the new certs and come up OK.
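The workaround can be sketched as a small shell helper. The function name `restart_es_pods`, the `elasticsearch-cdm-` name prefix (taken from the pod names in the logs above), and the namespace in the usage line are assumptions for illustration, not values confirmed by this ticket.

```shell
# Sketch only: delete every pod whose name starts with the given prefix so
# that its deployment recreates it and the new pod loads the current certs.
# Usage: restart_es_pods NAMESPACE [POD_NAME_PREFIX]
restart_es_pods() {
    ns="$1"
    prefix="${2:-elasticsearch-cdm-}"   # assumed prefix, from the pod names above
    # `oc get pods -o name` prints one "pod/<name>" per line.
    oc -n "$ns" get pods -o name |
    grep "^pod/${prefix}" |
    while IFS= read -r pod; do
        oc -n "$ns" delete "$pod" </dev/null
    done
}
```

For example, `restart_es_pods istio-system` (the namespace is an assumption here), followed by `oc -n istio-system get pods` to confirm the new pods reach Ready. The logs above show a restarted node failing discovery while its peers still present expired certificates, so all affected pods may need to be restarted before the cluster recovers.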
Data gathered:
must-gather taken while the issue was occurring, before any action was taken:
https://attachments.access.redhat.com/hydra/rest/cases/03543308/attachments/3b57aa03-80dc-42d9-8643-5f7a0416f35a?usePresignedUrl=true
must-gather taken after the issue: https://attachments.access.redhat.com/hydra/rest/cases/03543308/attachments/79b6c69c-537c-4c01-99f3-9bf805e09cc9?usePresignedUrl=true