OpenShift Bugs / OCPBUGS-33644

kubelet-serving CSRs in Pending state on SNO with Telco DU with disabled capabilities

    • Critical
    • No
    • 4
    • OCPEDGE Sprint 253, OCPEDGE Sprint 254
    • 2
    • Proposed
    • False

      Description of problem:

      After running tests for a couple of hours on an SNO with the Telco DU profile, kubernetes.io/kubelet-serving CSRs start showing up in Pending state and accumulate over time.

      Version-Release number of selected component (if applicable):

      4.16.0-rc.1    

      How reproducible:

      once so far    

      Steps to Reproduce:

      1. Deploy SNO with the DU profile and with capabilities disabled:

          installConfigOverrides:  "{\"capabilities\":{\"baselineCapabilitySet\": \"None\", \"additionalEnabledCapabilities\": [ \"NodeTuning\", \"ImageRegistry\", \"OperatorLifecycleManager\" ] }}"

      2. Leave the node running tests for a couple of hours (overnight in this case)

      3. Check for Pending CSRs
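
To confirm that a deployed cluster really has the Machine API capability disabled, the enabled capability list can be inspected. The sketch below is a hedged helper, not part of the original report: the function name `capability_enabled` is hypothetical, and it assumes the list is supplied on stdin either as a JSON array of strings or as space-separated names.

```shell
#!/bin/sh
# capability_enabled NAME
# Hypothetical helper: reads a list of capability names on stdin and
# exits 0 if NAME is in the list, non-zero otherwise.
# Accepts a JSON array such as ["A","B"] or space-separated names.
capability_enabled() {
    tr -d '[]"' | tr ', ' '\n\n' | grep -qxF -- "$1"
}

# Example against a live cluster (requires oc and a logged-in session):
#   oc get clusterversion version \
#     -o jsonpath='{.status.capabilities.enabledCapabilities}' \
#     | capability_enabled MachineAPI \
#     && echo "MachineAPI enabled" || echo "MachineAPI disabled"
```

With the installConfigOverrides shown in step 1, MachineAPI is not in the enabled list, so the check is expected to report it as disabled.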
      

      Actual results:

      oc get csr -A | grep Pending | wc -l 
      27    
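
The raw count above does not show which signer the Pending requests belong to. The following is a minimal sketch for breaking the count down by signer; the function name `count_pending_by_signer` is hypothetical, and it assumes the default table output of `oc get csr`, where the third column is SIGNERNAME and the last column is CONDITION.

```shell
#!/bin/sh
# count_pending_by_signer
# Hypothetical helper: reads `oc get csr` table output on stdin and prints
# one "<signer> <count>" line per signer that has Pending requests.
# Assumes column 3 is SIGNERNAME and the last column is CONDITION.
count_pending_by_signer() {
    awk '$NF == "Pending" { n[$3]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Example against a live cluster (requires oc and a logged-in session):
#   oc get csr | count_pending_by_signer
```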

      Expected results:

      No pending CSRs    
      
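      In addition, oc logs returns a TLS internal error, which is consistent with the kubelet not having a valid serving certificate while its CSRs remain Pending: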
      
      oc -n openshift-cluster-machine-approver --insecure-skip-tls-verify-backend=true logs machine-approver-866c94c694-7dwks 
      Defaulted container "kube-rbac-proxy" out of: kube-rbac-proxy, machine-approver-controller
      Error from server: Get "https://[2620:52:0:8e6::d0]:10250/containerLogs/openshift-cluster-machine-approver/machine-approver-866c94c694-7dwks/kube-rbac-proxy": remote error: tls: internal error
      

      Additional info:

      The machine-approver-controller container logs on the node show that reconciliation is failing because the controller cannot find the Machine API, which is disabled through the cluster capabilities.
      
      I0514 13:25:09.266546       1 controller.go:120] Reconciling CSR: csr-dw9c8
      E0514 13:25:09.275585       1 controller.go:138] csr-dw9c8: Failed to list machines in API group machine.openshift.io/v1beta1: no matches for kind "Machine" in version "machine.openshift.io/v1beta1"
      E0514 13:25:09.275665       1 controller.go:329] "Reconciler error" err="Failed to list machines: no matches for kind \"Machine\" in version \"machine.openshift.io/v1beta1\"" controller="certificatesigningrequest" controllerGroup="certificates.k8s.io" controllerKind="CertificateSigningRequest" CertificateSigningRequest="csr-dw9c8" namespace="" name="csr-dw9c8" reconcileID="6f963337-c6f1-46e7-80c4-90494d21653c"
      I0514 13:25:43.792140       1 controller.go:120] Reconciling CSR: csr-jvrvt
      E0514 13:25:43.798079       1 controller.go:138] csr-jvrvt: Failed to list machines in API group machine.openshift.io/v1beta1: no matches for kind "Machine" in version "machine.openshift.io/v1beta1"
      E0514 13:25:43.798128       1 controller.go:329] "Reconciler error" err="Failed to list machines: no matches for kind \"Machine\" in version \"machine.openshift.io/v1beta1\"" controller="certificatesigningrequest" controllerGroup="certificates.k8s.io" controllerKind="CertificateSigningRequest" CertificateSigningRequest="csr-jvrvt" namespace="" name="csr-jvrvt" reconcileID="decbc5d9-fa10-45d1-92f1-1c999df956ff" 
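
To see which CSRs are affected by this failure, the names can be pulled out of the controller log. This is a rough sketch under stated assumptions: the function name `failed_machine_lookups` is hypothetical, the log text is supplied on stdin (for example from a file saved on the node, since `oc logs` fails here), and the message format matches the lines quoted above.

```shell
#!/bin/sh
# failed_machine_lookups
# Hypothetical helper: reads machine-approver-controller log text on stdin
# and prints the unique CSR names whose reconciliation failed with
# "Failed to list machines". Assumes the log format quoted in this report.
failed_machine_lookups() {
    sed -n 's/.*\] \(csr-[a-z0-9]*\): Failed to list machines.*/\1/p' | sort -u
}

# Example with a saved log file (the file name is an assumption):
#   failed_machine_lookups < machine-approver-controller.log
```

Only the per-CSR error lines are matched; the accompanying "Reconciler error" lines repeat the same CSR names and are skipped.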

            bzamalut@redhat.com Bulat Zamalutdinov
            mcornea@redhat.com Marius Cornea
            Milind Yadav Milind Yadav