Uploaded image for project: 'OpenShift Bugs'
  1. OpenShift Bugs
  2. OCPBUGS-20091

Compact cluster failed install during ACM/ZTP scale test - No valid client certificate is found but the server is not responsive. A restart may be necessary to retrieve new initial credentials.

XMLWordPrintable

    • No
    • False
    • Hide

      None

      Show
      None

      Description of problem:

      While installing various cluster sizes at scale with ACM/ZTP via the assisted-service, one cluster failed to complete install although it had passed the bootstrap and ignition applying phase of install.
      
      It seems the cluster now in error state may have failed because of these log lines that are repeated on the nodes:
      
      Oct 04 15:31:05 vm02011 bash[7849]: E1004 15:31:05.728774    7849 transport.go:123] "No valid client certificate is found but the server is not responsive. A restart may be necessary to retrieve new initial credentials." lastCertificateAvailabilityTime="2023-10-03 01:53:31.728025434 +0000 UTC m=+0.135872703" shutdownThreshold="5m0s"
      Oct 04 15:31:06 vm02011 bash[7849]: E1004 15:31:06.728989    7849 transport.go:123] "No valid client certificate is found but the server is not responsive. A restart may be necessary to retrieve new initial credentials." lastCertificateAvailabilityTime="2023-10-03 01:53:31.728025434 +0000 UTC m=+0.135872703" shutdownThreshold="5m0s"
      Oct 04 15:31:06 vm02011 bash[7849]: I1004 15:31:06.880257    7849 certificate_manager.go:356] kubernetes.io/kube-apiserver-client-kubelet: Rotating certificates
      Oct 04 15:31:06 vm02011 bash[7849]: E1004 15:31:06.882829    7849 certificate_manager.go:562] kubernetes.io/kube-apiserver-client-kubelet: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Post "https://api-int.compact-00268.rdu2.scalelab.redhat.com:6443/apis/certificates.k8s.io/v1/certificatesigningrequests": EOF
      
      
      

      Version-Release number of selected component (if applicable):

      Hub OCP - 4.14.0-rc.2
      Deployed clusters 4.14.0-rc.2
      ACM - 2.9.0-DOWNSTREAM-2023-09-27-22-12-46

      How reproducible:

      1 out of 34 failures, however this failure seems unique compared to the other failed cluster installs.

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:

       

      Expected results:

       

      Additional info:

       

        1. vm02010.journal.log.gz
          47.85 MB
        2. vm02009.journal.log.gz
          12.06 MB
        3. vm02011.journal.log.gz
          6.67 MB
        4. compact-00268-aci-logs.tar
          327 kB
        5. compact-00268-aci-events.json
          55 kB

            itsoiref@redhat.com Igal Tsoiref
            akrzos@redhat.com Alex Krzos
            Lital Alon Lital Alon
            Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

              Created:
              Updated:
              Resolved: