[OCPBUGS-42001] OVN doesn't refresh certificates after node was suspended for 30 days on AWS


      Description of problem:

      ovnkube-controller cannot create a CSR after the node has been suspended for 30 days on the AWS provider.

      This is similar to https://issues.redhat.com/browse/OCPBUGS-28735, but applies only to cloud setups (AWS); a UPI deployment on bare metal was not affected.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1. Set up an AWS cluster
          2. Set up a bastion host
          3. Disable chronyd on all nodes
          4. Suspend nodes
          5. Move the time on the host 30 days forward
          6. Resume nodes
          7. Wait for API server to come up
          8. Wait for all operators to become ready
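
          Step 8 can be checked programmatically. The following is a minimal sketch, not the check the CI job uses: it assumes "ready" means every ClusterOperator reports Available=True, Progressing=False and Degraded=False, and that `oc` is on the PATH and logged in to the cluster. The function names are hypothetical.

```python
import json
import subprocess


def operator_ready(co):
    """Return True if a ClusterOperator object looks settled.

    Assumption: "ready" means Available=True, Progressing=False, Degraded=False.
    """
    conditions = co.get("status", {}).get("conditions", [])
    by_type = {c["type"]: c["status"] for c in conditions}
    return (
        by_type.get("Available") == "True"
        and by_type.get("Progressing") == "False"
        and by_type.get("Degraded") == "False"
    )


def not_ready(co_list):
    """Return the names of operators in a ClusterOperatorList that are not ready."""
    return [
        co["metadata"]["name"]
        for co in co_list.get("items", [])
        if not operator_ready(co)
    ]


def fetch_cluster_operators():
    """Fetch all ClusterOperators as parsed JSON via the oc CLI."""
    result = subprocess.run(
        ["oc", "get", "clusteroperators", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

          Calling `not_ready(fetch_cluster_operators())` in a polling loop until it returns an empty list would approximate steps 7 and 8; while the API server is still down, the `oc` call fails and raises `subprocess.CalledProcessError`.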
          

      Actual results:

      ovnkube-controller can't create CSRs:
      
      igning request: Unauthorized
      I1015 00:56:11.127945   19420 certificate_manager.go:356] kubernetes.io/kube-apiserver-client: Rotating certificates
      E1015 00:56:11.135570   19420 certificate_manager.go:562] kubernetes.io/kube-apiserver-client: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Unauthorized
      I1015 00:56:28.301468   19420 certificate_manager.go:356] kubernetes.io/kube-apiserver-client: Rotating certificates
      E1015 00:56:28.307212   19420 certificate_manager.go:562] kubernetes.io/kube-apiserver-client: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Unauthorized
      E1015 00:56:28.307234   19420 certificate_manager.go:440] kubernetes.io/kube-apiserver-client: Reached backoff limit, still unable to rotate certs: timed out waiting for the condition
      I1015 00:57:00.309234   19420 certificate_manager.go:356] kubernetes.io/kube-apiserver-client: Rotating certificates
      E1015 00:57:00.315334   19420 certificate_manager.go:562] kubernetes.io/kube-apiserver-client: Failed while requesting a signed certificate from the control plane: cannot create certificate signing request: Unauthorized
      I1015 00:57:32.308867   19420 certificate_manager.go:356] kubernetes.io/kube-apiserver-client: Rotating certificates
      I1015 00:57:32.314494    
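
      To triage an excerpt like the one above, it can help to count rotation attempts against Unauthorized failures. The following is a minimal sketch, assuming the standard klog header layout (severity letter, MMDD, time, PID, file:line) and that the excerpt may contain literal `\n` sequences instead of real newlines. The function names are hypothetical, and truncated fragments that lack a full header are skipped.

```python
import re
from collections import Counter

# klog header: severity letter, MMDD, HH:MM:SS.micros, PID, file:line] message
KLOG_RE = re.compile(
    r"^(?P<sev>[IWEF])(?P<mmdd>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+"
    r"(?P<pid>\d+) (?P<src>[^\]]+)\] (?P<msg>.*)$"
)


def parse_klog(blob):
    """Split a blob on literal backslash-n or real newlines and parse klog lines.

    Fragments that do not match the klog header (for example truncated
    leading or trailing pieces) are dropped.
    """
    entries = []
    for raw in re.split(r"\\n|\n", blob):
        match = KLOG_RE.match(raw.strip())
        if match:
            entries.append(match.groupdict())
    return entries


def summarize(entries):
    """Count rotation attempts, Unauthorized errors, and backoff-limit hits."""
    counts = Counter()
    for entry in entries:
        msg = entry["msg"]
        if "Rotating certificates" in msg:
            counts["rotation_attempts"] += 1
        if entry["sev"] == "E" and msg.endswith("Unauthorized"):
            counts["unauthorized_errors"] += 1
        if "Reached backoff limit" in msg:
            counts["backoff_limit_hits"] += 1
    return counts
```

      Every rotation attempt being followed by an Unauthorized error, as in this excerpt, suggests that the client can no longer authenticate to the API server at all, rather than hitting an intermittent failure.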

      Expected results:

      CSRs are created and new certificates are issued.
      
      Workaround:
      * oc --request-timeout=5s -n openshift-multus delete pod -l app=multus --force --grace-period=0 

      Additional info:

      CI job: https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-aws-ovn-ha-cert-rotation-suspend-30d/1835106989529632768


            Vadim Rutkovsky added a comment:

            This also affects SNO; see logs at https://prow.ci.openshift.org/view/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-metal-ovn-sno-cert-rotation-suspend-360d/1837643748859711488:

            InstallerPodNetworkingDegraded: Pod "installer-13-test-infra-cluster-f8d8479b-master-0" on node "test-infra-cluster-f8d8479b-master-0" observed degraded networking: Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_installer-13-test-infra-cluster-f8d8479b-master-0_openshift-kube-apiserver_f5d03f7c-7ced-45d2-b220-6d96e98c1947_0(c22aa24b1a956cc9a936c7099c6b1db14a04656d6a4ce37c33b69fb0c095590d): error adding pod openshift-kube-apiserver_installer-13-test-infra-cluster-f8d8479b-master-0 to CNI network "multus-cni-network": plugin type="multus-shim" name="multus-cni-network" failed (add): CmdAdd (shim): CNI request failed with status 400: 'ContainerID:"c22aa24b1a956cc9a936c7099c6b1db14a04656d6a4ce37c33b69fb0c095590d" Netns:"/var/run/netns/843d4aa4-3217-43eb-94ac-6d38d687c341" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-kube-apiserver;K8S_POD_NAME=installer-13-test-infra-cluster-f8d8479b-master-0;K8S_POD_INFRA_CONTAINER_ID=c22aa24b1a956cc9a936c7099c6b1db14a04656d6a4ce37c33b69fb0c095590d;K8S_POD_UID=f5d03f7c-7ced-45d2-b220-6d96e98c1947" Path:"" ERRORED: error configuring pod [openshift-kube-apiserver/installer-13-test-infra-cluster-f8d8479b-master-0] networking: Multus: [openshift-kube-apiserver/installer-13-test-infra-cluster-f8d8479b-master-0/f5d03f7c-7ced-45d2-b220-6d96e98c1947]: error waiting for pod: Unauthorized

              pdiak@redhat.com Patryk Diak
              vrutkovs@redhat.com Vadim Rutkovsky
              Ke Wang Ke Wang