OpenShift Bugs / OCPBUGS-14933

4.13 Dualstack BM IPI cluster fails to deploy workers


    • Critical
    • No
    • 3
    • Metal Platform 237, Metal Platform 240, Metal Platform 241, Metal Platform 242
    • 4
    • Rejected
    • False

      Description of problem:

      Dualstack BM IPI deployment fails in a libvirt environment because the worker CSRs are never approved.

      Version-Release number of selected component (if applicable):

      4.13.3
      Not seen in 4.12 or earlier so far.

      How reproducible:

      100% so far

      Steps to Reproduce:

      1. Deploy a 4.13.3 BM IPI cluster with 3 masters and 2 workers (using a libvirt environment)
      

      Actual results:

      Neither of the 2 workers joins the cluster because their CSRs are never approved.

      Expected results:

      Workers join and cluster deploys properly

      Additional info:

      Possibly related to a hostname vs. FQDN mismatch on the workers.

       

      Worker node journal is stuck looping with:

       

      Jun 14 02:04:52 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: W0614 02:04:52.700155    4162 reflector.go:424] vendor/k8s.io/client-go/informers/factory.go:150: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
      Jun 14 02:04:52 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: E0614 02:04:52.700377    4162 reflector.go:140] vendor/k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Service: failed to list *v1.Service: services is forbidden: User "system:anonymous" cannot list resource "services" in API group "" at the cluster scope
      Jun 14 02:04:52 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: E0614 02:04:52.763100    4162 transport.go:112] "No valid client certificate is found but the server is not responsive. A restart may be necessary to retrieve new initial credentials." lastCertificateAvailabilityTime="2023-06-14 00:53:52.401194451 +0000 UTC m=+0.167477205" shutdownThreshold="5m0s"
      Jun 14 02:04:52 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: E0614 02:04:52.930991    4162 eviction_manager.go:261] "Eviction manager: failed to get summary stats" err="failed to get node info: node \"worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com\" not found"
      Jun 14 02:04:53 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: I0614 02:04:53.454859    4162 csi_plugin.go:913] Failed to contact API server when waiting for CSINode publishing: csinodes.storage.k8s.io "worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com" is forbidden: User "system:anonymous" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope
      Jun 14 02:04:54 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: I0614 02:04:54.455396    4162 csi_plugin.go:913] Failed to contact API server when waiting for CSINode publishing: csinodes.storage.k8s.io "worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com" is forbidden: User "system:anonymous" cannot get resource "csinodes" in API group "storage.k8s.io" at the cluster scope
      Jun 14 02:04:55 worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com kubenswrapper[4162]: I0614 02:04:55.453824    4162 

       

       

      The master nodes are Ready, but no workers are listed:

       

      NAME                                                       STATUS   ROLES                  AGE    VERSION
      master-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com   Ready    control-plane,master   111m   v1.26.5+7a891f0
      master-0-1.ocp-edge-cluster-assisted-0.qe.lab.redhat.com   Ready    control-plane,master   111m   v1.26.5+7a891f0
      master-0-2.ocp-edge-cluster-assisted-0.qe.lab.redhat.com   Ready    control-plane,master   111m   v1.26.5+7a891f0 

       

       

      CSRs are pending:

      oc get csr
      NAME        AGE     SIGNERNAME                                    REQUESTOR                                                                   REQUESTEDDURATION   CONDITION
      csr-4rv5x   65m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
      csr-5jv9d   80m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
      csr-5zkh7   19m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending
      csr-88lh8   50m     kubernetes.io/kube-apiserver-client-kubelet   system:serviceaccount:openshift-machine-config-operator:node-bootstrapper   <none>              Pending 
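To test the hostname vs. FQDN theory, it may help to see which node name each pending CSR actually requests. The sketch below assumes the usual kubelet client certificate subject form (`CN=system:node:<name>`). The helper name `csr_node_name` is hypothetical, and the commented pipeline needs cluster access plus `openssl` on the workstation.

```shell
# Hypothetical helper: read an X.509 subject line on stdin and print the
# node name that follows "system:node:". Handles both subject layouts:
#   subject=O = system:nodes, CN = system:node:worker-0-0.example.com
#   subject=/O=system:nodes/CN=system:node:worker-0-0.example.com
csr_node_name() {
  sed -n 's/.*system:node:\([^,/ ]*\).*/\1/p'
}

# Usage against a live cluster (csr-4rv5x is one of the pending CSRs above):
#   oc get csr csr-4rv5x -o jsonpath='{.spec.request}' \
#     | base64 -d | openssl req -noout -subject | csr_node_name
#
# The result can then be compared by eye with the addresses recorded on the
# Machine objects:
#   oc get machines -n openshift-machine-api -o yaml
```

If the requested node name does not match any DNS name recorded on the corresponding Machine, that would be consistent with automatic approval being declined. This is a hypothesis to check, not a confirmed root cause.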

      If I manually approve the CSRs, the workers start showing up:

      oc get csr -o go-template='{{range .items}}{{if not .status}}{{.metadata.name}}{{"\n"}}{{end}}{{end}}' | xargs oc adm certificate approve
      
      oc get nodes
      NAME                                                       STATUS     ROLES                  AGE    VERSION
      master-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com   Ready      control-plane,master   111m   v1.26.5+7a891f0
      master-0-1.ocp-edge-cluster-assisted-0.qe.lab.redhat.com   Ready      control-plane,master   111m   v1.26.5+7a891f0
      master-0-2.ocp-edge-cluster-assisted-0.qe.lab.redhat.com   Ready      control-plane,master   111m   v1.26.5+7a891f0
      worker-0-0.ocp-edge-cluster-assisted-0.qe.lab.redhat.com   NotReady   worker                 2s     v1.26.5+7a891f0
      worker-0-1.ocp-edge-cluster-assisted-0.qe.lab.redhat.com   NotReady   worker                 2s     v1.26.5+7a891f0 
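After the client CSRs are approved, kubelets normally request serving certificates as well (signer `kubernetes.io/kubelet-serving`), so a second round of pending requests may appear while the workers are still NotReady. A minimal sketch for summarizing pending requests per signer from the plain `oc get csr` table follows. The function name `pending_by_signer` is hypothetical, and it assumes the default six-column layout shown above.

```shell
# Hypothetical helper: read `oc get csr` table output on stdin and print
# "<count> <signer>" for every signer that still has Pending requests.
# Assumes column 3 is SIGNERNAME and the last column is CONDITION.
pending_by_signer() {
  awk 'NR > 1 && $NF == "Pending" { n[$3]++ }
       END { for (s in n) print n[s], s }' | sort
}

# Usage against a live cluster:
#   oc get csr | pending_by_signer
```

Rerunning this after each manual approval shows whether new requests of either signer type keep accumulating.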

            rpittau@redhat.com Riccardo Pittau
            chadcrum Chad Crum