OpenShift Bugs / OCPBUGS-28735

Multus doesn't refresh certificates after node was suspended for 30 days

    • Bug
    • Resolution: Duplicate
    • Undefined
    • None
    • 4.14.z, 4.15.0, 4.16.0
    • Networking / multus
    • None
    • No
    • False

      Description of problem:

      Multus does not issue a CSR to obtain new certificates after a node has been suspended for 30 days.

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1. Set up a libvirt cluster on a machine
          2. Disable chronyd on all nodes and on the host machine
          3. Suspend the nodes
          4. Move the host clock 30 days forward
          5. Resume the nodes
          6. Wait for the API server to come up
          7. Wait for all operators to become ready
          

      Actual results:

      Multus attempts to use expired certs:
      
      2024-01-21T01:24:15.456299440+00:00 stderr F 2024-01-21T01:24:15Z [verbose] DEL finished CNI request ContainerID:"f01434ff66b5571923e23aa1696bca1bc4b63b5e89d9b84bb4965c8d599a9dc9" Netns:"/var/run/netns/313a63fa-7765-4f9e-b330-643c8c3e08d2" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-machine-config-operator;K8S_POD_NAME=kubelet-bootstrap-cred-manager-msgls;K8S_POD_INFRA_CONTAINER_ID=f01434ff66b5571923e23aa1696bca1bc4b63b5e89d9b84bb4965c8d599a9dc9;K8S_POD_UID=3133b172-dd21-4d05-9662-22c0841c9821" Path:"", result: "", err: <nil>
      2024-04-20T01:25:33.997542623+00:00 stderr F E0420 01:25:33.995883    7683 reflector.go:148] k8s.io/client-go/informers/factory.go:150: Failed to watch *v1.Pod: the server has asked for the client to provide credentials (get pods)
      

      Expected results:

      Multus detects that the cert has expired, requests new certs via the CSR flow, and reloads them.
      

      Additional info:

      CI periodic that checks this flow: https://prow.ci.openshift.org/job-history/gs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.16-e2e-metal-ovn-sno-cert-rotation-suspend-30d
      The job artifacts contain a sosreport.
      
      This applies to both SNO and HA clusters. Rotation works as expected when nodes are properly shut down instead of suspended.


            Ke Wang added a comment - edited

            Hi Ke Wang, from the symptom, this seems to be OCPBUGS-27429. OCPBUGS-27429 is fixed by reloading the Multus internal certs if the certs are invalid.

            Hi tohayash@redhat.com, regarding bug verification for https://issues.redhat.com/browse/OCPBUGS-27429: I was unable to produce a scenario with invalid certs and can only confirm that the cert used by Multus is updated. If the symptom in this bug is not seen, the Multus pod is using refreshed certificates; otherwise we will run into https://issues.redhat.com/browse/OCPBUGS-27429.
            I have raised a PR, https://github.com/openshift/release/pull/49482, to verify it with 4.16. Since this bug is a duplicate of another bug, I will update the results there.


            Ke Wang added a comment - edited

            OCPBUGS-27429 concerns the kube-apiserver internal-loadbalancer signer update, whereas this bug concerns the internal certificate that Multus uses to communicate with the API.

            I confirmed this with Vadim.


            W. Trevor King added a comment

            Is this card a dup of OCPBUGS-27429?

              tohayash@redhat.com Tomofumi Hayashi
              vrutkovs@redhat.com Vadim Rutkovsky
              Ke Wang