  1. OPCT - OpenShift Provider Compatibility Tool
  2. OPCT-288

[Bug] Test environment / kubelet logs are reporting many TLS connection issues


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal

      Kubelet logs collected by must-gather report many TLS handshake errors during OPCT executions.

      Must-gather archives from regular e2e runs on External/AWS and vSphere/vSphere were checked in CI and do not show the issue at this volume (see the reference counts below).

      Archives from External/AWS, provisioned by OCP CI using UPI, and from External/OCI, provisioned manually by Oracle using Assisted Installer, report the same issue in OPCT environments.

      There is no evidence yet that the errors come specifically from Sonobuoy, but the initial investigation, and the fact that Sonobuoy is tooling used only in OPCT, point to it. Further triage is needed to expand this bug.

      See logs:

      • kubelet log sample: 

       

      $ grep -r 'TLS handshake error from' 202405302052_sonobuoy_3eef31f1-4421-4970-8fc4-3b103e0baada_must-gather | grep kubelet | tail -n 5
      /path/to/must-gather/host_service_logs/masters/kubelet_service.log:May 31 00:05:45.708701 cp2.private2.openshiftvcn.oraclevcn.com kubenswrapper[6309]: I0531 00:05:45.708574    6309 log.go:194] http: TLS handshake error from 10.0.39.135:33690: EOF
      /path/to/must-gather/host_service_logs/masters/kubelet_service.log:May 31 00:05:45.708894 cp1.private2.openshiftvcn.oraclevcn.com kubenswrapper[6402]: I0531 00:05:45.708698    6402 log.go:194] http: TLS handshake error from 10.0.38.103:43802: EOF
      /path/to/must-gather/host_service_logs/masters/kubelet_service.log:May 31 00:05:45.709733 cp3.private2.openshiftvcn.oraclevcn.com kubenswrapper[6321]: I0531 00:05:45.709695    6321 log.go:194] http: TLS handshake error from 10.0.36.76:41146: EOF
      /path/to/must-gather/host_service_logs/masters/kubelet_service.log:May 31 00:05:50.707730 cp2.private2.openshiftvcn.oraclevcn.com kubenswrapper[6309]: I0531 00:05:50.707692    6309 log.go:194] http: TLS handshake error from 10.0.39.135:50026: EOF
      /path/to/must-gather/host_service_logs/masters/kubelet_service.log:May 31 00:05:50.708896 cp1.private2.openshiftvcn.oraclevcn.com kubenswrapper[6402]: I0531 00:05:50.708857    6402 log.go:194] http: TLS handshake error from 10.0.38.103:51622: EOF
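
      To see how the errors are spread across the nodes that report them, the same search can be grouped by the hostname field of each log line. This is only a sketch: the function name `tls_errors_per_node` is hypothetical, and it assumes the line layout of the sample above (`<month> <day> <time> <hostname> ...`) and that the kubelet logs are named `kubelet_service.log`.

```shell
# Count kubelet "TLS handshake error" lines per reporting node.
# Assumption: each line starts with "<Mon> <day> <time> <hostname>",
# as in the sample above, so the hostname is the 4th field.
tls_errors_per_node() {
  # $1: path to the extracted must-gather directory
  grep -rh --include='kubelet_service.log' 'TLS handshake error from' "$1" \
    | awk '{ print $4 }' \
    | sort | uniq -c | sort -n
}
```

      For example, `tls_errors_per_node "${MUST_GATHER}"` prints one line per node, with the count first and the hostname second.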
      

       

      • Kubelet log IP summary 

      External/OCI/OPCT

       

      $ grep -r 'TLS handshake error from' ${MUST_GATHER} | awk -F'TLS handshake error from' '{print$2}' | awk -F':' '{print$1}' | sort | uniq -c | sort -n
            1 ' error",
           28  10.128.0.88
           34  10.128.0.2
          520  127.0.0.1
          815  10.128.2.2
         2273  10.0.38.103
         2275  10.0.39.135
         2282  10.0.36.76
         2409  10.130.0.2
         3381  10.129.2.2 

       

       

      External/AWS/OPCT

       

      $ grep -r 'TLS handshake error from' ${MUST_GATHER} | awk -F'TLS handshake error from' '{print$2}' | awk -F':' '{print$1}' | sort | uniq -c | sort -n
            1 ' error",
            2  10.0.10.182
            2  10.129.0.2
            2  10.131.0.12
           28  10.130.0.61
          187  127.0.0.1
         1783  10.0.48.66
         2350  10.0.72.53
         2351  10.0.59.76
         4203  10.131.0.2
         4321  10.128.2.2
      
      $ echo $MUST_GATHER
      /home/mtulio/opct/partners/oci/BM/202405302052/ci-4.15-opct-external-aws-ccm-202402132052_must-gather 

       

       

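      For reference, the regular e2e runs follow the same pattern but report almost four times fewer errors from the high-volume 10.0.x.x / 10.177.158.x source addresses (about 550 to 630 each, versus about 1,800 to 2,350 each in the OPCT runs):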

      vSphere/vSphere/e2e:

       

            1 ' error",
            2  10.131.0.8
           28  10.128.0.80
           71  127.0.0.1
          631  10.177.158.105
          631  10.177.158.80
          632  10.177.158.25
         1216  10.128.2.2
         1404  10.131.0.2 

       

       

      External/AWS/e2e:

       

      $ grep -r 'TLS handshake error from' ${MUST_GATHER} | awk -F'TLS handshake error from' '{print$2}' | awk -F':' '{print$1}' | sort | uniq -c | sort -n
            1  10.0.8.75
            1 ' error",
            2  10.129.0.2
            2  10.131.0.8
           26  10.129.0.61
           99  127.0.0.1
          546  10.0.50.135
          546  10.0.51.110
          547  10.0.77.221
         3677  10.128.2.2
         4240  10.131.0.2 
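
      Every summary above contains one `' error",` entry, which appears to come from a line that matches the search string but carries no source address. A variant of the pipeline that keeps only IPv4 source addresses could look like the sketch below. The function name `tls_errors_per_source_ip` is hypothetical, and IPv6 sources would be skipped by this pattern.

```shell
# Summarize "TLS handshake error" lines per IPv4 source address,
# dropping matches that are not followed by an address.
tls_errors_per_source_ip() {
  # $1: path to the extracted must-gather directory
  # -o prints only the matched text, -h hides file names, -E enables ERE.
  grep -rhoE 'TLS handshake error from [0-9]+(\.[0-9]+){3}' "$1" \
    | awk '{ print $NF }' \
    | sort | uniq -c | sort -n
}
```

      Running `tls_errors_per_source_ip "${MUST_GATHER}"` should produce the same per-IP counts as the original command, without the stray non-address entry.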

       

       

      The problem was caught while analyzing the OCI BM results in OPCT-287.

              Unassigned Unassigned
              rhn-support-mrbraga Marco Braga