Bug
Resolution: Can't Do
Normal
None
4.13.z
Critical
No
3
Metal Platform 237, Metal Platform 240, Metal Platform 241, Metal Platform 242
4
Rejected
False
Description of problem:
Dual-stack bare-metal (BM) IPI deployment fails because the workers' CSRs are never approved (in a libvirt environment).
Version-Release number of selected component (if applicable):
4.13.3. Not seen in 4.12 or earlier so far.
How reproducible:
100% so far
Steps to Reproduce:
1. Deploy a 4.13.3 BM IPI cluster with 3 masters and 2 workers (using a libvirt environment)
Actual results:
The 2 workers never join the cluster because their CSRs are never approved
Expected results:
Workers join and cluster deploys properly
Additional info:
Possibly related to a hostname vs. FQDN mismatch on the workers
Worker node journal is stuck looping with:
Jun 14 02:04:52 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: W0614 02:04:52.700155 4162 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Jun 14 02:04:52 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: E0614 02:04:52.700377 4162 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
Jun 14 02:04:52 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: E0614 02:04:52.763100 4162 transport.go:112] "No valid client certificate is found but the server is not responsive. A restart may be necessary to retrieve new initial credentials." lastCertificateAvailabilityTime="2023-06-14 00:53:52.401194451 +0000 UTC m=+0.167477205" shutdownThreshold="5m0s"
Jun 14 02:04:52 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: E0614 02:04:52.930991 4162 eviction_manager.go:261] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com\" not found"
Jun 14 02:04:53 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: I0614 02:04:53.454859 4162 csi_plugin.go:913] Failed to contact API server when waiting for CSINode publishing: csinodes.storage.k8s.io "worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com" is forbidden: User "system:anonymous" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope
Jun 14 02:04:54 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: I0614 02:04:54.455396 4162 csi_plugin.go:913] Failed to contact API server when waiting for CSINode publishing: csinodes.storage.k8s.io "worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com" is forbidden: User "system:anonymous" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope
Jun 14 02:04:55 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: I0614 02:04:55.453824 4162
The master nodes are Ready, but no workers appear:
NAME STATUS ROLES AGE VERSION
master-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com Ready control-plane,master 111m v1.26.5+7a891f0
master-0-1.ocp-edge-cluster-assisted-0.qe.lab.redhat.com Ready control-plane,master 111m v1.26.5+7a891f0
master-0-2.ocp-edge-cluster-assisted-0.qe.lab.redhat.com Ready control-plane,master 111m v1.26.5+7a891f0
CSRs are pending:
oc get csr
NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION
csr-4rv5x 65m kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper <none> Pending
csr-5jv9d 80m kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper <none> Pending
csr-5zkh7 19m kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper <none> Pending
csr-88lh8 50m kubernetes.io/kube-apiserver-client-kubelet system:serviceaccount:openshift-machine-config-operator:node-bootstrapper <none> Pending
If I manually approve the CSRs, the workers start showing up:
oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
oc get nodes
NAME STATUS ROLES AGE VERSION
master-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com Ready control-plane,master 111m v1.26.5+7a891f0
master-0-1.ocp-edge-cluster-assisted-0.qe.lab.redhat.com Ready control-plane,master 111m v1.26.5+7a891f0
master-0-2.ocp-edge-cluster-assisted-0.qe.lab.redhat.com Ready control-plane,master 111m v1.26.5+7a891f0
worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com NotReady worker 2s v1.26.5+7a891f0
worker-0-1.ocp-edge-cluster-assisted-0.qe.lab.redhat.com NotReady worker 2s v1.26.5+7a891f0
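The manual approval above approves every CSR that has no status yet. As a narrower workaround, the following is a minimal sketch that approves only Pending CSRs with the signer and requestor shown in the `oc get csr` output above. The function names `pending_bootstrap_csrs` and `approve_pending_bootstrap_csrs` are hypothetical helpers, not part of `oc`, and the filter assumes the default six-column table layout printed above. It does not address the root cause, and any kubelet serving CSRs that show up afterwards would likely still need separate approval.

```shell
#!/usr/bin/env bash
# Sketch only: hypothetical helpers, assuming the default `oc get csr`
# column order: NAME AGE SIGNERNAME REQUESTOR REQUESTEDDURATION CONDITION.

# Signer and requestor values copied from the pending CSRs in this report.
SIGNER="kubernetes.io/kube-apiserver-client-kubelet"
REQUESTOR="system:serviceaccount:openshift-machine-config-operator:node-bootstrapper"

# Read `oc get csr --no-headers` rows on stdin and print the names of
# Pending CSRs that match the signer and requestor above.
pending_bootstrap_csrs() {
  awk -v signer="$SIGNER" -v requestor="$REQUESTOR" \
    '$3 == signer && $4 == requestor && $NF == "Pending" { print $1 }'
}

# Approve the matching CSRs; `xargs -r` (GNU xargs) skips the approve
# call entirely when nothing matches.
approve_pending_bootstrap_csrs() {
  oc get csr --no-headers | pending_bootstrap_csrs | xargs -r oc adm certificate approve
}
```

To use it, source the file and run `approve_pending_bootstrap_csrs`, then re-check with `oc get csr` and `oc get nodes`.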