OpenShift Bugs / OCPBUGS-27378

[vSphere-CSI-Driver-Operator] does not update the VSphereCSIDriverOperatorCRAvailable status in a timely manner


Details

    • Important
    • No
    • Proposed
    • False

    Description

      This is a clone of issue OCPBUGS-24421. The following is the description of the original issue:

      Description of problem:

      [vSphere-CSI-Driver-Operator] does not update the VSphereCSIDriverOperatorCRAvailable status in a timely manner

      Version-Release number of selected component (if applicable):

      4.15.0-0.nightly-2023-12-04-162702

      How reproducible:

      Always    

      Steps to Reproduce:

      1. Set up a vSphere cluster with a 4.15 nightly build.
      2. Back up secret/vmware-vsphere-cloud-credentials to "vmware-cc.yaml".
      3. In ns/openshift-cluster-csi-drivers, use oc edit to change the password in secret/vmware-vsphere-cloud-credentials to an invalid value.
      4. Wait for the cluster storage operator to become Degraded and the driver controller pods to enter CrashLoopBackOff, then restore the backed-up secret by applying "vmware-cc.yaml".
      5. Observe that the driver controller pods return to Running; the cluster storage operator should then return to a healthy state.
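      Steps 2 to 5 can be scripted roughly as follows. This is a minimal sketch that is not part of the original report: it assumes bash and an oc client logged in with cluster-admin rights, the helper function names are hypothetical, and the Deployment name vmware-vsphere-csi-driver-controller is inferred from the controller pod names in the output under "Actual results". Step 3 stays manual (oc edit), as in the report.

```shell
#!/usr/bin/env bash
# Sketch of reproduction steps 2-5 (hypothetical helper names).
# Assumes bash and an `oc` client logged in with cluster-admin rights.
# OC may be overridden, e.g. with a stub for a dry run.
OC=${OC:-oc}
NS=openshift-cluster-csi-drivers
SECRET=vmware-vsphere-cloud-credentials

# Step 2: save the current credentials secret to a file.
backup_secret() {
  "$OC" get "secret/$SECRET" -n "$NS" -o yaml > "${1:-vmware-cc.yaml}"
}

# Step 3: kept manual, as in the report; set an invalid password in the editor.
break_secret() {
  "$OC" edit "secret/$SECRET" -n "$NS"
}

# Step 4a: block until co/storage reports Degraded=True.
wait_for_degraded() {
  "$OC" wait co/storage --for=condition=Degraded=True --timeout="${1:-30m}"
}

# Step 4b: re-apply the saved secret. If apply rejects the file because of a
# stale metadata.resourceVersion, delete that field from the file and retry.
restore_secret() {
  "$OC" apply -f "${1:-vmware-cc.yaml}"
}

# Step 5: wait for the controller Deployment to become available again.
# The Deployment name is inferred from the pod names in the report.
wait_for_controller() {
  "$OC" rollout status deployment/vmware-vsphere-csi-driver-controller \
    -n "$NS" --timeout="${1:-10m}"
}
```

      A typical sequence would be backup_secret, break_secret, wait_for_degraded, restore_secret, then wait_for_controller, after which the storage ClusterOperator's Degraded condition can be watched with oc get co/storage -w as in the report.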
           

      Actual results:

      In step 5: the driver controller pods return to Running, but the cluster storage operator stays at Degraded: True for almost 1 hour.
      $ oc get po
      NAME                                                    READY   STATUS    RESTARTS        AGE
      vmware-vsphere-csi-driver-controller-664db7d497-b98vt   13/13   Running   0               16s
      vmware-vsphere-csi-driver-controller-664db7d497-rtj49   13/13   Running   0               23s
      vmware-vsphere-csi-driver-node-2krg6                    3/3     Running   1 (3h4m ago)    3h5m
      vmware-vsphere-csi-driver-node-2t928                    3/3     Running   2 (3h16m ago)   3h16m
      vmware-vsphere-csi-driver-node-45kb8                    3/3     Running   2 (3h16m ago)   3h16m
      vmware-vsphere-csi-driver-node-8vhg9                    3/3     Running   1 (3h16m ago)   3h16m
      vmware-vsphere-csi-driver-node-9fh9l                    3/3     Running   1 (3h4m ago)    3h5m
      vmware-vsphere-csi-driver-operator-5954476ddc-rkpqq     1/1     Running   2 (3h10m ago)   3h17m
      vmware-vsphere-csi-driver-webhook-7b6b5d99f6-rxdt8      1/1     Running   0               3h16m
      vmware-vsphere-csi-driver-webhook-7b6b5d99f6-skcbd      1/1     Running   0               3h16m
      $ oc get co/storage -w
      NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      storage   4.15.0-0.nightly-2023-12-04-162702   False       False         True       8m39s   VSphereCSIDriverOperatorCRAvailable: VMwareVSphereControllerAvailable: error logging into vcenter: ServerFaultCode: Cannot complete login due to an incorrect user name or password.
      storage   4.15.0-0.nightly-2023-12-04-162702   True        False         False      0s
      $  oc get co/storage
      NAME      VERSION                              AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
      storage   4.15.0-0.nightly-2023-12-04-162702   True        False         False      3m41s
       

      Expected results:

      In step 5: after the driver controller pods return to Running, the cluster storage operator should recover to a healthy status immediately.
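      To put a number on "immediately", the time between the controller pods returning to Running and co/storage clearing its Degraded condition can be measured by polling that condition. This is a minimal sketch, not from the original report: it assumes bash and a logged-in oc client, the function names are hypothetical, and the elapsed time is approximate because it counts polling intervals only.

```shell
#!/usr/bin/env bash
# Sketch: measure how long co/storage stays Degraded after recovery starts.
# Assumes a logged-in `oc`; degraded_status can be overridden for testing.

# Prints the status ("True"/"False") of the Degraded condition on co/storage.
degraded_status() {
  oc get co/storage \
    -o jsonpath='{.status.conditions[?(@.type=="Degraded")].status}'
}

# seconds_until_not_degraded [TIMEOUT] [INTERVAL]
# Polls every INTERVAL seconds (must be > 0). Prints the approximate elapsed
# seconds once Degraded is "False", or returns 1 after TIMEOUT seconds.
seconds_until_not_degraded() {
  local timeout=${1:-3600} interval=${2:-10} elapsed=0
  while [ "$elapsed" -lt "$timeout" ]; do
    if [ "$(degraded_status)" = "False" ]; then
      echo "$elapsed"
      return 0
    fi
    sleep "$interval"
    elapsed=$((elapsed + interval))
  done
  return 1
}
```

      For example, running seconds_until_not_degraded 3600 10 right after the controller pods report Running should print a value close to 0 on a healthy build; on an affected build, the actual results above suggest the wait can approach an hour.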

      Additional info:

      Comparing with previous CI results, this issue appears to have started after 4.15.0-0.nightly-2023-11-25-110147.

            People

              Hemant Kumar (hekumar@redhat.com)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Penghao Wang
              Votes: 0
              Watchers: 7