Type: Bug
Resolution: Unresolved
Priority: Normal
Kubelet logs collected by must-gather report many TLS handshake errors during OPCT executions.
Must-gather archives from regular e2e runs on External/AWS and vSphere/vSphere have been checked on CI and do not show the issue.
Archives from External/AWS, provisioned by OCP CI using UPI, and from External/OCI, provisioned manually by Oracle using the Assisted Installer, report the same issue in OPCT environments.
There is no evidence that the errors come specifically from Sonobuoy, but the initial investigation, and the fact that the tooling is used only in OPCT, point to it. Further triage is needed to expand this bug.
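One possible triage step, assuming the cluster is still reachable, is to map the top source IPs from the samples and summaries below back to pods and nodes, to confirm whether they belong to Sonobuoy workloads; a sketch (the IPs are taken from the data below):
$ oc get pods -A -o wide | grep 10.129.2.2    # look up pod-network IPs
$ oc get nodes -o wide | grep 10.0.39.135     # look up machine-network (node) IPs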
See logs:
- Kubelet log sample:
$ grep -r 'TLS handshake error from' 202405302052_sonobuoy_3eef31f1-4421-4970-8fc4-3b103e0baada_must-gather | grep kubelet | tail -n 5
/path/to/must-gather/host_service_logs/masters/kubelet_service.log:May 31 00:05:45.708701 cp2.private2.openshiftvcn.oraclevcn.com kubenswrapper[6309]: I0531 00:05:45.708574 6309 log.go:194] http: TLS handshake error from 10.0.39.135:33690: EOF
/path/to/must-gather/host_service_logs/masters/kubelet_service.log:May 31 00:05:45.708894 cp1.private2.openshiftvcn.oraclevcn.com kubenswrapper[6402]: I0531 00:05:45.708698 6402 log.go:194] http: TLS handshake error from 10.0.38.103:43802: EOF
/path/to/must-gather/host_service_logs/masters/kubelet_service.log:May 31 00:05:45.709733 cp3.private2.openshiftvcn.oraclevcn.com kubenswrapper[6321]: I0531 00:05:45.709695 6321 log.go:194] http: TLS handshake error from 10.0.36.76:41146: EOF
/path/to/must-gather/host_service_logs/masters/kubelet_service.log:May 31 00:05:50.707730 cp2.private2.openshiftvcn.oraclevcn.com kubenswrapper[6309]: I0531 00:05:50.707692 6309 log.go:194] http: TLS handshake error from 10.0.39.135:50026: EOF
/path/to/must-gather/host_service_logs/masters/kubelet_service.log:May 31 00:05:50.708896 cp1.private2.openshiftvcn.oraclevcn.com kubenswrapper[6402]: I0531 00:05:50.708857 6402 log.go:194] http: TLS handshake error from 10.0.38.103:51622: EOF
- Kubelet log IP summary:

External/OCI/OPCT:
$ grep -r 'TLS handshake error from' ${MUST_GATHER} | awk -F'TLS handshake error from' '{print$2}' | awk -F':' '{print$1}' | sort | uniq -c | sort -n
      1 ' error",
     28 10.128.0.88
     34 10.128.0.2
    520 127.0.0.1
    815 10.128.2.2
   2273 10.0.38.103
   2275 10.0.39.135
   2282 10.0.36.76
   2409 10.130.0.2
   3381 10.129.2.2

External/AWS/OPCT:
$ grep -r 'TLS handshake error from' ${MUST_GATHER} | awk -F'TLS handshake error from' '{print$2}' | awk -F':' '{print$1}' | sort | uniq -c | sort -n
      1 ' error",
      2 10.0.10.182
      2 10.129.0.2
      2 10.131.0.12
     28 10.130.0.61
    187 127.0.0.1
   1783 10.0.48.66
   2350 10.0.72.53
   2351 10.0.59.76
   4203 10.131.0.2
   4321 10.128.2.2
$ echo $MUST_GATHER
/home/mtulio/opct/partners/oci/BM/202405302052/ci-4.15-opct-external-aws-ccm-202402132052_must-gather
For reference, the regular e2e runs report almost four times fewer errors, following the same pattern:
vSphere/vSphere/e2e:
      1 ' error",
      2 10.131.0.8
     28 10.128.0.80
     71 127.0.0.1
    631 10.177.158.105
    631 10.177.158.80
    632 10.177.158.25
   1216 10.128.2.2
   1404 10.131.0.2

External/AWS/e2e:
$ grep -r 'TLS handshake error from' ${MUST_GATHER} | awk -F'TLS handshake error from' '{print$2}' | awk -F':' '{print$1}' | sort | uniq -c | sort -n
      1 10.0.8.75
      1 ' error",
      2 10.129.0.2
      2 10.131.0.8
     26 10.129.0.61
     99 127.0.0.1
    546 10.0.50.135
    546 10.0.51.110
    547 10.0.77.221
   3677 10.128.2.2
   4240 10.131.0.2
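To reproduce the comparison above, the per-archive totals can be computed with a loop like this sketch (the archive location is illustrative):
$ for MG in /path/to/archives/*must-gather*; do echo "$MG: $(grep -r 'TLS handshake error from' "$MG" | wc -l)"; done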
The problem was caught while analyzing OCI BM results in OPCT-287.