OpenShift Bugs / OCPBUGS-76338

MCDs are broken when the loadbalancer-serving-signer certificate expires


      Description of problem:

      
      When the certificate in loadbalancer-serving-signer expires, the MCDs (machine-config-daemon pods) fail and report this error:
      
      
      E0206 13:19:21.198022    2776 daemon.go:1410] Got an error from auxiliary tools: failed to list *v1.Node: Get "https://api-int.sregidor-exp2pr.qe.devcluster.openshift.com:6443/api/v1/nodes?fieldSelector=metadata.name%3Dip-10-0-77-136.us-east-2.compute.internal&resourceVersion=47182": tls: failed to verify certificate: x509: certificate signed by unknown authority
      E0206 13:19:57.391123    2776 reflector.go:205] "Failed to watch" err="failed to list *v1.Node: Get \"https://api-int.sregidor-exp2pr.qe.devcluster.openshift.com:6443/api/v1/nodes?fieldSelector=metadata.name%3Dip-10-0-77-136.us-east-2.compute.internal&resourceVersion=47182\": tls: failed to verify certificate: x509: certificate signed by unknown authority" reflector="k8s.io/client-go/informers/factory.go:160" type="*v1.Node"
      E0206 13:19:57.391153    2776 daemon.go:1410] Got an error from auxiliary tools: failed to list *v1.Node: Get "https://api-int.sregidor-exp2pr.qe.devcluster.openshift.com:6443/api/v1/nodes?fieldSelector=metadata.name%3Dip-10-0-77-136.us-east-2.compute.internal&resourceVersion=47182": tls: failed to verify certificate: x509: certificate signed by unknown authority
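To triage a larger log dump for this failure, the lines above can be matched mechanically. The following is a minimal sketch, not part of the MCO tooling: it assumes the logs use the klog header layout seen in the excerpt (severity letter, MMDD date, time, PID, `file:line]`), and the function name `find_ca_failures` is hypothetical.

```python
import re

# klog header as seen in the excerpt: severity letter, MMDD, time, PID,
# "file:line]", then the message. This layout is inferred from the log lines.
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)
X509_MARKER = "x509: certificate signed by unknown authority"
# Stop at whitespace, a double quote, or a backslash (escaped quote in err="...").
URL_RE = re.compile(r'https://[^\s"\\]+')


def find_ca_failures(log_text):
    """Return (time, source, url) for each error line carrying the x509 marker."""
    hits = []
    for line in log_text.splitlines():
        m = KLOG_RE.match(line.strip())
        if not m or m.group("sev") != "E" or X509_MARKER not in m.group("msg"):
            continue
        url = URL_RE.search(m.group("msg"))
        hits.append((m.group("time"), m.group("src"), url.group(0) if url else None))
    return hits
```

Feeding it the text of an MCD pod log returns one tuple per affected line, which makes it easy to see whether only the Node informer is hitting the untrusted endpoint or other clients are as well.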
      
      
          

      Version-Release number of selected component (if applicable):

      4.22
          

      How reproducible:

      Always
          

      Steps to Reproduce:

          1. Modify the installer and the cluster-kube-apiserver-operator so that loadbalancer-serving-signer expires after 2 hours
      
      PRs like the following can be used to build the image under test:
      
      
      https://github.com/openshift/installer/pull/10291/changes
      https://github.com/openshift/cluster-kube-apiserver-operator/pull/2030/changes
      
      Use clusterbot like this to build the image:
      
      build 4.22,openshift/installer#10291,openshift/machine-config-operator#5623
      
          2. Install a cluster using the image generated in step 1
      
          3. Wait 2 hours until the certificate expires
      
          4. Read the logs in MCD pods
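Rather than timing step 3 by the clock, the expiry can be confirmed from the certificate itself. The sketch below assumes the signer certificate has already been extracted to a PEM file and passed through `openssl x509 -noout -enddate`; where the certificate is stored is not stated in this report, so that part is left out. The function name `seconds_until_expiry` and the sample date are illustrative only.

```python
import ssl
import time


def seconds_until_expiry(enddate_line, now=None):
    """Seconds until notAfter (negative once the certificate has expired).

    enddate_line is the text printed by `openssl x509 -noout -enddate`,
    for example "notAfter=Jan  5 09:34:43 2018 GMT" (an illustrative date).
    """
    # Drop the "notAfter=" prefix and surrounding whitespace.
    cert_time = enddate_line.strip().split("=", 1)[-1]
    # Standard-library helper that parses the certificate time format.
    not_after = ssl.cert_time_to_seconds(cert_time)
    if now is None:
        now = time.time()
    return not_after - now
```

A negative return value means the certificate is already expired and the MCD logs in step 4 are worth reading.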
      
          

      Actual results:

      
      These errors appear in most MCD pods:
      
      
      E0206 13:19:21.198022    2776 daemon.go:1410] Got an error from auxiliary tools: failed to list *v1.Node: Get "https://api-int.sregidor-exp2pr.qe.devcluster.openshift.com:6443/api/v1/nodes?fieldSelector=metadata.name%3Dip-10-0-77-136.us-east-2.compute.internal&resourceVersion=47182": tls: failed to verify certificate: x509: certificate signed by unknown authority
      E0206 13:19:57.391123    2776 reflector.go:205] "Failed to watch" err="failed to list *v1.Node: Get \"https://api-int.sregidor-exp2pr.qe.devcluster.openshift.com:6443/api/v1/nodes?fieldSelector=metadata.name%3Dip-10-0-77-136.us-east-2.compute.internal&resourceVersion=47182\": tls: failed to verify certificate: x509: certificate signed by unknown authority" reflector="k8s.io/client-go/informers/factory.go:160" type="*v1.Node"
      E0206 13:19:57.391153    2776 daemon.go:1410] Got an error from auxiliary tools: failed to list *v1.Node: Get "https://api-int.sregidor-exp2pr.qe.devcluster.openshift.com:6443/api/v1/nodes?fieldSelector=metadata.name%3Dip-10-0-77-136.us-east-2.compute.internal&resourceVersion=47182": tls: failed to verify certificate: x509: certificate signed by unknown authority
      
          

      Expected results:

      
      When the certificate is rotated, MCDs should add the new certificate to the kubeconfig file and restart kubelet, and they should then continue working normally.
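One way to check the expected state is to test whether the rotated CA certificate appears in the kubeconfig's CA bundle. The sketch below is an assumption-laden illustration: it takes the kubeconfig and the CA as text (the report does not say which file or secret holds them), it compares certificates by their base64 bodies rather than parsing X.509, and the names `pem_bodies` and `kubeconfig_trusts` are hypothetical.

```python
import base64
import re

PEM_RE = re.compile(
    r"-----BEGIN CERTIFICATE-----(.*?)-----END CERTIFICATE-----", re.S
)
# kubeconfig stores the CA bundle as base64-encoded PEM under this key.
CA_DATA_RE = re.compile(r"certificate-authority-data:\s*(\S+)")


def pem_bodies(pem_text):
    """Return the set of base64 bodies of all certificates in pem_text."""
    return {"".join(body.split()) for body in PEM_RE.findall(pem_text)}


def kubeconfig_trusts(kubeconfig_text, ca_pem):
    """True if every certificate in ca_pem is in a CA bundle of the kubeconfig."""
    bundle = set()
    for b64 in CA_DATA_RE.findall(kubeconfig_text):
        bundle |= pem_bodies(base64.b64decode(b64).decode("ascii"))
    wanted = pem_bodies(ca_pem)
    # An empty ca_pem is treated as "not trusted" to avoid a vacuous match.
    return bool(wanted) and wanted <= bundle
```

If this returns False after the rotation, the kubeconfig was not updated, which would be consistent with the "certificate signed by unknown authority" errors shown above.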
      
          

      Additional info:

      
      Under normal conditions this certificate expires after 10 years, which is why the installer has to be patched to test this behaviour.
      
          

              team-mco Team MCO
              sregidor@redhat.com Sergio Regidor de la Rosa