-
Bug
-
Resolution: Cannot Reproduce
-
Undefined
-
None
-
4.14
-
No
-
False
-
-
Description of problem:
While installing various cluster sizes at scale with ACM/ZTP via the assisted-service, one cluster failed to complete install although it had passed the bootstrap and ignition applying phase of install. It seems the cluster now in error state may have failed because of these log lines that are repeated on the nodes: Oct 04 15:31:05 vm02011 bash[7849]: E1004 15:31:05.728774 7849 transport.go:123] "No valid client certificate is found but the server is not responsive. A restart may be necessary to retrieve new initial credentials." lastCertificateAvailabilityTime="2023-10-03 01:53:31.728025434 +0000 UTC m=+0.135872703" shutdownThreshold="5m0s" Oct 04 15:31:06 vm02011 bash[7849]: E1004 15:31:06.728989 7849 transport.go:123] "No valid client certificate is found but the server is not responsive. A restart may be necessary to retrieve new initial credentials." lastCertificateAvailabilityTime="2023-10-03 01:53:31.728025434 +0000 UTC m=+0.135872703" shutdownThreshold="5m0s" Oct 04 15:31:06 vm02011 bash[7849]: I1004 15:31:06.880257 7849 certificate_manager.go:356] kubernetes.io/kube-apiserver-client-kubelet: Rotating certificates Oct 04 15:31:06 vm02011 bash[7849]: E1004 15:31:06.882829 7849 certificate_manager.go:562] kubernetes.io/kube-apiserver-client-kubelet: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Post "https://api-int.compact-00268.rdu2.scalelab.redhat.com:6443/apis/certificates.k8s.io/v1/certificatesigningrequests": EOF
Version-Release number of selected component (if applicable):
Hub OCP - 4.14.0-rc.2 Deployed clusters 4.14.0-rc.2 ACM - 2.9.0-DOWNSTREAM-2023-09-27-22-12-46
How reproducible:
1 out of 34 failures, however this failure seems unique compared to the other failed cluster installs.
Steps to Reproduce:
1. 2. 3.
Actual results:
Expected results:
Additional info: