OpenShift Bugs / OCPBUGS-42001

OVN doesn't refresh certificates after node was suspended for 30 days on AWS


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.14.z, 4.15.0, 4.16.0

      Description of problem:

      ovnkube-controller cannot create a CSR after the nodes have been suspended for 30 days on AWS.
      
      This is similar to https://issues.redhat.com/browse/OCPBUGS-28735, but it applies only to cloud setups (AWS); a UPI deployment on bare metal was not affected.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1. Set up an AWS cluster
          2. Set up a bastion host
          3. Disable chronyd on all nodes
          4. Suspend the nodes
          5. Move the time on the host 30 days forward
          6. Resume the nodes
          7. Wait for the API server to come up
          8. Wait for all operators to become ready
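
Step 8 can be checked programmatically. The following is a minimal sketch, not the check used by the CI job. It assumes that "ready" means every ClusterOperator reports Available=True, Progressing=False and Degraded=False, and that `oc` is on the PATH and logged in to the cluster. The function names `unready_operators` and `fetch_cluster_operators` are hypothetical.

```python
import json
import subprocess

# Assumption: an operator counts as "ready" when it reports exactly these statuses.
WANTED = {"Available": "True", "Progressing": "False", "Degraded": "False"}


def unready_operators(cluster_operators: dict) -> list[str]:
    """Return the names of ClusterOperators that do not match WANTED.

    `cluster_operators` is the parsed output of
    `oc get clusteroperators -o json` (a List object with an "items" array).
    """
    unready = []
    for item in cluster_operators.get("items", []):
        conditions = {
            c["type"]: c["status"]
            for c in item.get("status", {}).get("conditions", [])
        }
        # A missing condition is treated the same as a wrong status.
        if any(conditions.get(ctype) != status for ctype, status in WANTED.items()):
            unready.append(item["metadata"]["name"])
    return unready


def fetch_cluster_operators() -> dict:
    """Fetch all ClusterOperators as JSON (assumes a logged-in `oc` on PATH)."""
    result = subprocess.run(
        ["oc", "get", "clusteroperators", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling `unready_operators(fetch_cluster_operators())` in a retry loop and stopping once it returns an empty list would approximate the wait in step 8. Keeping the parsing separate from the `oc` call makes the readiness rule easy to adjust or test without a cluster.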
          

      Actual results:

      ovnkube-controller can't create CSRs:
      
      igning request: Unauthorized
      I1015 00:56:11.127945   19420 certificate_manager.go:356] kubernetes.io/kube-apiserver-client: Rotating certificates
      E1015 00:56:11.135570   19420 certificate_manager.go:562] kubernetes.io/kube-apiserver-client: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Unauthorized
      I1015 00:56:28.301468   19420 certificate_manager.go:356] kubernetes.io/kube-apiserver-client: Rotating certificates
      E1015 00:56:28.307212   19420 certificate_manager.go:562] kubernetes.io/kube-apiserver-client: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Unauthorized
      E1015 00:56:28.307234   19420 certificate_manager.go:440] kubernetes.io/kube-apiserver-client: Reached backoff limit, still unable to rotate certs: timed out waiting for the condition
      I1015 00:57:00.309234   19420 certificate_manager.go:356] kubernetes.io/kube-apiserver-client: Rotating certificates
      E1015 00:57:00.315334   19420 certificate_manager.go:562] kubernetes.io/kube-apiserver-client: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Unauthorized
      I1015 00:57:32.308867   19420 certificate_manager.go:356] kubernetes.io/kube-apiserver-client: Rotating certificates
      I1015 00:57:32.314494    
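
To see how often rotation fails in a longer capture of this log, the excerpt can be parsed mechanically. The sketch below assumes the standard klog header layout (severity letter, MMDD, time, process ID, file:line) seen in the lines above, and it accepts both real newlines and literal `\n` escapes, since pod logs are sometimes pasted in escaped form. The names `KLOG_LINE`, `parse_klog` and `count_unauthorized` are hypothetical, and `SAMPLE` is a shortened copy of the excerpt.

```python
import re

# klog header: severity, MMDD, HH:MM:SS.micros, process ID, file:line, then the message.
KLOG_LINE = re.compile(
    r"^(?P<sev>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<pid>\d+) "
    r"(?P<file>[^:\s]+):(?P<line>\d+)\] (?P<msg>.*)$"
)


def parse_klog(raw: str) -> list[dict]:
    """Parse every complete klog line in `raw`; truncated fragments are skipped."""
    entries = []
    # Turn literal backslash-n escapes into real newlines before splitting.
    for line in raw.replace("\\n", "\n").splitlines():
        match = KLOG_LINE.match(line.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def count_unauthorized(entries: list[dict]) -> int:
    """Count error-level entries whose message ends with 'Unauthorized'."""
    return sum(
        1 for e in entries
        if e["sev"] == "E" and e["msg"].endswith("Unauthorized")
    )


# Shortened copy of the excerpt, including the truncated first fragment.
SAMPLE = (
    "igning request: Unauthorized\\n"
    "I1015 00:56:11.127945   19420 certificate_manager.go:356] "
    "kubernetes.io/kube-apiserver-client: Rotating certificates\\n"
    "E1015 00:56:11.135570   19420 certificate_manager.go:562] "
    "kubernetes.io/kube-apiserver-client: Failed while requesting a signed "
    "certificate from the control plane: cannot create certificate signing "
    "request: Unauthorized"
)

if __name__ == "__main__":
    parsed = parse_klog(SAMPLE)
    print(len(parsed), "complete lines,", count_unauthorized(parsed), "Unauthorized errors")
```

Because incomplete lines are dropped rather than guessed at, the cut-off fragments at the start and end of the excerpt do not affect the counts.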

      Expected results:

      CSRs are created and new certificates are issued.
      
      Workaround:
      * oc --request-timeout=5s -n openshift-multus delete pod -l app=multus --force --grace-period=0 

      Additional info:

      CI job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-ha-cert-rotation-suspend-30d/1835106989529632768

              pdiak@redhat.com Patryk Diak
              vrutkovs@redhat.com Vadim Rutkovsky
              Ke Wang
