  1. OpenShift Bugs
  2. OCPBUGS-51079

vSphere CSI Driver Operator Pod Fails to Restart After Serving-Cert Rotation


    • Quality / Stability / Reliability

      Description of problem:

         The vmware-vsphere-csi-driver-operator pod began a graceful shutdown when it detected that /var/run/secrets/serving-cert/tls.crt had been modified by automatic certificate rotation. However, the expected container restart did not occur, which led the Storage Cluster Operator (CO) to report an incorrect status.
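For context, the log lines in this report come from a polling file observer that compares content hashes of the watched files and triggers a restart when they differ. The following is a minimal Python sketch of that idea, not the operator's actual Go code. `file_digest`, `FileObserver`, and the `on_change` callback are hypothetical names, and SHA-256 is an assumption based on the 64-hex-digit hashes printed in the logs.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Iterable


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


class FileObserver:
    """Hypothetical polling observer: remembers one digest per file and
    calls on_change(path, old, new) when the content differs."""

    def __init__(self, paths: Iterable[Path],
                 on_change: Callable[[Path, str, str], None]) -> None:
        self._digests: Dict[Path, str] = {
            Path(p): file_digest(Path(p)) for p in paths
        }
        self._on_change = on_change

    def poll_once(self) -> bool:
        """Check every watched file once; return True if any changed."""
        changed = False
        for path, old in list(self._digests.items()):
            new = file_digest(path)
            if new != old:
                self._digests[path] = new
                self._on_change(path, old, new)
                changed = True
        return changed
```

In a real process, `on_change` would typically begin a graceful shutdown and the process would then exit, so that the kubelet sees a terminated container and restarts it. If the shutdown starts but the process never exits, nothing is restarted, which appears to match the behavior reported here.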

      Version-Release number of selected component (if applicable):

          4.14.28

      How reproducible:

          NA

      Steps to Reproduce:

      1. The serving certificate in the vmware-vsphere-csi-driver-operator-metrics-serving-cert secret was renewed.
      
      2. The signing-key secret under openshift-service-ca was also rotated.
      
      3. The pod logs confirm that a restart was triggered by the certificate change:
      ~~~
      Restart triggered because of file /var/run/secrets/serving-cert/tls.crt was modified 
      ~~~
      
      4. However, instead of restarting, the vmware-vsphere-csi-driver-operator pod only shut down gracefully and did not come back up. As a result, the Storage CO reported an incorrect status, potentially leading to operational issues.

      Actual results:

          The vmware-vsphere-csi-driver-operator pod terminated and was not restarted.

      Expected results:

          When an automatic serving-cert rotation occurs, all managed components, including the vmware-vsphere-csi-driver-operator, should restart automatically to apply the new certificate.
      
      The pod should not remain in a stopped state after detecting certificate modifications.

      Additional info:

      Pod did not restart:
      ~~~
      $ oc get pods -n openshift-cluster-csi-drivers vmware-vsphere-csi-driver-operator-64999cc575-9fnb4
      NAME                                                  READY   STATUS    RESTARTS   AGE
      vmware-vsphere-csi-driver-operator-64999cc575-9fnb4   1/1     Running   0          198d
      ~~~
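The output above shows the pod as Running with 0 restarts and an age of 198d, which suggests the container was never restarted after the shutdown began. A minimal Python sketch for scripting this check over saved `oc get pods` output follows. `parse_pod_table` and `never_restarted` are hypothetical helper names, and the parser assumes the RESTARTS column is a plain integer as in the output above.

```python
from typing import Dict, List


def parse_pod_table(text: str) -> List[Dict[str, str]]:
    """Parse whitespace-separated `oc get pods` output into a list of dicts.

    Assumes no column value contains spaces (true for the output above).
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def never_restarted(pod: Dict[str, str]) -> bool:
    """True if the pod is Running and its restart count is zero."""
    return pod["STATUS"] == "Running" and pod["RESTARTS"] == "0"


sample = """
NAME                                                  READY   STATUS    RESTARTS   AGE
vmware-vsphere-csi-driver-operator-64999cc575-9fnb4   1/1     Running   0          198d
"""

pods = parse_pod_table(sample)
for pod in pods:
    print(pod["NAME"], pod["AGE"], never_restarted(pod))
```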
      
       vmware-vsphere-csi-driver-operator pod logs:
      ~~~
      2025-02-16T13:37:00.111433139Z I0216 13:37:00.111384       1 observer_polling.go:120] Observed file "/var/run/secrets/serving-cert/tls.crt" has been modified (old="441f587ea4e02b0f173fa5762c6eae5152da674135c0b504adfcd4addc3f8a06", new="197897f6b51a7144aaa443cdfbedc8c6934bb8394fad62962326974c153dcb4c")
      2025-02-16T13:37:00.119228289Z W0216 13:37:00.119188       1 builder.go:132] Restart triggered because of file /var/run/secrets/serving-cert/tls.crt was modified
      2025-02-16T13:37:00.119385678Z I0216 13:37:00.119373       1 observer_polling.go:120] Observed file "/var/run/secrets/serving-cert/tls.key" has been modified (old="4f62c8047a318347848b40e57c778567dbb637103b332d1c6aa3e2d016a05603", new="8afdbe7e4abb741a64461e7fac2a39876ae64b601da7fad2e5c351038187fce0")
      2025-02-16T13:37:00.127687709Z I0216 13:37:00.119759       1 genericapiserver.go:680] "[graceful-termination] pre-shutdown hooks completed" name="PreShutdownHooksStopped"
      2025-02-16T13:37:00.127734116Z I0216 13:37:00.119786       1 genericapiserver.go:537] "[graceful-termination] shutdown event" name="ShutdownInitiated"
      2025-02-16T13:37:00.127768065Z I0216 13:37:00.127758       1 genericapiserver.go:540] "[graceful-termination] shutdown event" name="AfterShutdownDelayDuration"
      ~~~
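The log excerpt above contains three graceful-termination events and ends at AfterShutdownDelayDuration. To compare this sequence against logs from a pod that restarts correctly, the event names can be extracted with a small script. This is a sketch only: `EVENT_RE` and `termination_events` are hypothetical names, and the pattern is derived solely from the line format shown above.

```python
import re
from typing import List

# Matches lines such as:
#   "[graceful-termination] shutdown event" name="ShutdownInitiated"
EVENT_RE = re.compile(r'\[graceful-termination\][^"]*" name="([^"]+)"')


def termination_events(log_text: str) -> List[str]:
    """Return graceful-termination event names in the order they appear."""
    return EVENT_RE.findall(log_text)


log_excerpt = '''
2025-02-16T13:37:00.127687709Z I0216 13:37:00.119759       1 genericapiserver.go:680] "[graceful-termination] pre-shutdown hooks completed" name="PreShutdownHooksStopped"
2025-02-16T13:37:00.127734116Z I0216 13:37:00.119786       1 genericapiserver.go:537] "[graceful-termination] shutdown event" name="ShutdownInitiated"
2025-02-16T13:37:00.127768065Z I0216 13:37:00.127758       1 genericapiserver.go:540] "[graceful-termination] shutdown event" name="AfterShutdownDelayDuration"
'''

events = termination_events(log_excerpt)
print(events)
```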
      
      
      secret vmware-vsphere-csi-driver-operator-metrics-serving-cert tls.crt data:
      ~~~
      Certificate:
          Data:
              Version: 3 (0x2)
              Serial Number: 338774569176314150 (0x4b391acfeb0b126)
              Signature Algorithm: sha256WithRSAEncryption
              Issuer: CN=openshift-service-serving-signer@1671456907
              Validity
                  Not Before: Feb 16 13:36:39 2025 GMT
                  Not After : Feb 16 13:36:40 2027 GMT
              Subject: CN=vmware-vsphere-csi-driver-operator-metrics.openshift-cluster-csi-drivers.svc
      ~~~
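The certificate's Not Before time can be compared with the first observer log line to confirm that the shutdown corresponds to this rotation. The Python sketch below uses only the timestamps quoted in this report. `parse_openssl_time` and `parse_cri_time` are hypothetical helper names. With these values, the operator observed the change about 21 seconds after the new certificate became valid, and the certificate is valid for 730 days.

```python
from datetime import datetime, timezone


def parse_openssl_time(value: str) -> datetime:
    """Parse an openssl-style timestamp such as 'Feb 16 13:36:39 2025 GMT'."""
    parsed = datetime.strptime(value, "%b %d %H:%M:%S %Y GMT")
    return parsed.replace(tzinfo=timezone.utc)


def parse_cri_time(value: str) -> datetime:
    """Parse an RFC 3339 UTC timestamp with nanoseconds.

    Python's %f accepts at most 6 digits, so the fraction is truncated
    to microseconds.
    """
    base, frac = value.rstrip("Z").split(".")
    micro = frac[:6].ljust(6, "0")
    parsed = datetime.strptime(f"{base}.{micro}", "%Y-%m-%dT%H:%M:%S.%f")
    return parsed.replace(tzinfo=timezone.utc)


not_before = parse_openssl_time("Feb 16 13:36:39 2025 GMT")
not_after = parse_openssl_time("Feb 16 13:36:40 2027 GMT")
observed = parse_cri_time("2025-02-16T13:37:00.111433139Z")

detection_delay = observed - not_before   # time until the change was observed
validity = not_after - not_before         # certificate lifetime

print(f"detected after {detection_delay.total_seconds():.3f} s")
print(f"certificate valid for {validity.days} days")
```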
      

      A must-gather report is also being uploaded.

       

              Unassigned
              rhn-support-dpateriy Divyam Pateriya
              Wei Duan