OCPBUGS-43381

Possible regression with storage deleting vmware-vsphere-csi-driver-node-xxxx pods multiple times

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • None
    • 4.18
    • Storage
    • Important
    • Yes
    • Proposed
    • False

      Component Readiness has found a potential regression in the following test:

      [sig-arch] events should not repeat pathologically for ns/openshift-cluster-csi-drivers

      Probability of significant regression: 99.45%

      Sample (being evaluated) Release: 4.18
      Start Time: 2024-10-08T00:00:00Z
      End Time: 2024-10-15T23:59:59Z
      Success Rate: 85.71%
      Successes: 18
      Failures: 3
      Flakes: 0

      Base (historical) Release: 4.17
      Start Time: 2024-09-01T00:00:00Z
      End Time: 2024-10-01T23:59:59Z
      Success Rate: 100.00%
      Successes: 93
      Failures: 0
      Flakes: 0
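
As a sanity check on the figures above, the sketch below assumes the regression probability comes from a one-sided Fisher's exact test on the pass/fail counts. That is an assumption, not something this report states, and `regression_probability` is a hypothetical helper name. Under that assumption, the sample (3 failures, 18 successes) and base (0 failures, 93 successes) counts reproduce the reported 99.45%.

```python
from math import comb


def regression_probability(sample_fail, sample_pass, base_fail, base_pass):
    """Return 1 - p for a one-sided Fisher's exact test on a 2x2 pass/fail table."""
    total_fail = sample_fail + base_fail
    sample_n = sample_fail + sample_pass
    total_n = sample_n + base_fail + base_pass

    # Hypergeometric tail: chance that the sample receives at least
    # `sample_fail` of the pooled failures if both releases share one failure rate.
    tail = sum(
        comb(sample_n, k) * comb(total_n - sample_n, total_fail - k)
        for k in range(sample_fail, min(sample_n, total_fail) + 1)
    )
    p_value = tail / comb(total_n, total_fail)
    return 1.0 - p_value


# Counts taken from the Sample (4.18) and Base (4.17) blocks above.
prob = regression_probability(sample_fail=3, sample_pass=18, base_fail=0, base_pass=93)
print(f"{prob:.2%}")  # 99.45%
```

With zero failures in the base release, the tail collapses to a single term, C(21,3) / C(114,3) = 1330 / 240464, which is why only three failures are enough to produce such a high probability.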

      View the test details report at https://sippy.dptools.openshift.org/sippy-ng/component_readiness/test_details?Aggregation=none&Architecture=amd64&Architecture=amd64&FeatureSet=default&FeatureSet=default&Installer=upi&Installer=upi&Network=ovn&Network=ovn&NetworkAccess=default&Platform=vsphere&Platform=vsphere&Scheduler=default&SecurityMode=default&Suite=serial&Suite=serial&Topology=ha&Topology=ha&Upgrade=none&Upgrade=none&baseEndTime=2024-10-01%2023%3A59%3A59&baseRelease=4.17&baseStartTime=2024-09-01%2000%3A00%3A00&capability=Other&columnGroupBy=Architecture%2CNetwork%2CPlatform&component=Storage&confidence=95&dbGroupBy=Platform%2CArchitecture%2CNetwork%2CTopology%2CFeatureSet%2CUpgrade%2CSuite%2CInstaller&environment=amd64%20default%20upi%20ovn%20vsphere%20serial%20ha%20none&ignoreDisruption=true&ignoreMissing=false&includeVariant=Architecture%3Aamd64&includeVariant=CGroupMode%3Av2&includeVariant=ContainerRuntime%3Arunc&includeVariant=FeatureSet%3Adefault&includeVariant=Installer%3Aipi&includeVariant=Installer%3Aupi&includeVariant=Owner%3Aeng&includeVariant=Platform%3Aaws&includeVariant=Platform%3Aazure&includeVariant=Platform%3Agcp&includeVariant=Platform%3Ametal&includeVariant=Platform%3Avsphere&includeVariant=Topology%3Aha&minFail=3&pity=5&sampleEndTime=2024-10-15%2023%3A59%3A59&sampleRelease=4.18&sampleStartTime=2024-10-08%2000%3A00%3A00&testId=openshift-tests%3A30ac23dcd7037e581ed41a85fad97ecf&testName=%5Bsig-arch%5D%20events%20should%20not%20repeat%20pathologically%20for%20ns%2Fopenshift-cluster-csi-drivers

       

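      In all three failures, the repeated event is the deletion of a vmware-vsphere-csi-driver-node-xxxx pod. One example is run 1845647277222268928 of periodic-ci-openshift-release-master-nightly-4.18-e2e-vsphere-ovn-upi-serial, whose audit log is examined below.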

      Error message from the test:

      {  1 events happened too frequently
      
      event happened 22 times, something is wrong: namespace/openshift-cluster-csi-drivers daemonset/vmware-vsphere-csi-driver-node hmsg/6fee4a4ed8 - reason/SuccessfulDelete (combined from similar events): Deleted pod: vmware-vsphere-csi-driver-node-prkn5 (04:29:41Z) result=reject } 
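
To compare the three failing runs, it can help to pull the count, namespace, owning object, and reason out of each failure message. The following is a minimal sketch that assumes the message keeps the shape shown above; the regular expression and field names are illustrative and not taken from the test's source.

```python
import re

# Hypothetical pattern, written against the single message quoted in this report.
PATTERN = re.compile(
    r"event happened (?P<count>\d+) times, something is wrong: "
    r"namespace/(?P<namespace>\S+) (?P<kind>\w+)/(?P<name>\S+) .*?"
    r"reason/(?P<reason>\w+)"
)

message = (
    "event happened 22 times, something is wrong: "
    "namespace/openshift-cluster-csi-drivers daemonset/vmware-vsphere-csi-driver-node "
    "hmsg/6fee4a4ed8 - reason/SuccessfulDelete (combined from similar events): "
    "Deleted pod: vmware-vsphere-csi-driver-node-prkn5 (04:29:41Z) result=reject"
)

match = PATTERN.search(message)
fields = match.groupdict()
fields["count"] = int(fields["count"])  # convert the repeat count to an integer
print(fields)
```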

       

      Event filter shows that this affects pod vmware-vsphere-csi-driver-node-prkn5 22 times:

      04:29:41 (x22)  openshift-cluster-csi-drivers  daemonset-controller  vmware-vsphere-csi-driver-node  SuccessfulDelete  (combined from similar events): Deleted pod: vmware-vsphere-csi-driver-node-prkn5

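      Note, however, that the audit log does not show the delete being called that many times for vmware-vsphere-csi-driver-node-prkn5. The control-plane-2 kube-apiserver audit log contains only two delete requests for this pod: one from the daemon-set-controller service account and one from the kubelet on compute-1.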

      curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-master-nightly-4.18-e2e-vsphere-ovn-upi-serial/1845647277222268928/artifacts/e2e-vsphere-ovn-upi-serial/gather-extra/artifacts/audit_logs/kube-apiserver/ci-op-jkp8srsx-af8f5-4m5vh-control-plane-2-audit.log |grep vmware-vsphere-csi-driver-node-prkn5 |grep delete
      
      
      {"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"37c82d40-c6d3-4cf5-a133-ad354d92b940","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/openshift-cluster-csi-drivers/pods/vmware-vsphere-csi-driver-node-prkn5","verb":"delete","user":{"username":"system:serviceaccount:kube-system:daemon-set-controller","uid":"b0c845d3-64b9-48ec-b9cc-bd918443a0d0","groups":["system:serviceaccounts","system:serviceaccounts:kube-system","system:authenticated"],"extra":{"authentication.kubernetes.io/credential-id":["JTI=725a6dc6-e457-4049-b872-adc94c8c80c3"]}},"sourceIPs":["10.94.182.2"],"userAgent":"kube-controller-manager/v1.31.1 (linux/amd64) kubernetes/4190e72/system:serviceaccount:kube-system:daemon-set-controller","objectRef":{"resource":"pods","namespace":"openshift-cluster-csi-drivers","name":"vmware-vsphere-csi-driver-node-prkn5","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2024-10-14T04:29:40.999621Z","stageTimestamp":"2024-10-14T04:29:41.023664Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":"RBAC: allowed by ClusterRoleBinding \"system:controller:daemon-set-controller\" of ClusterRole \"system:controller:daemon-set-controller\" to ServiceAccount \"daemon-set-controller/kube-system\""}}
      {"kind":"Event","apiVersion":"audit.k8s.io/v1","level":"Metadata","auditID":"729e2d49-50a0-4b77-a4f2-2dec6d53613e","stage":"ResponseComplete","requestURI":"/api/v1/namespaces/openshift-cluster-csi-drivers/pods/vmware-vsphere-csi-driver-node-prkn5","verb":"delete","user":{"username":"system:node:ci-op-jkp8srsx-af8f5-4m5vh-compute-1","groups":["system:nodes","system:authenticated"]},"sourceIPs":["10.94.182.2"],"userAgent":"kubelet/v1.31.1 (linux/amd64) kubernetes/4190e72","objectRef":{"resource":"pods","namespace":"openshift-cluster-csi-drivers","name":"vmware-vsphere-csi-driver-node-prkn5","apiVersion":"v1"},"responseStatus":{"metadata":{},"code":200},"requestReceivedTimestamp":"2024-10-14T04:29:41.782866Z","stageTimestamp":"2024-10-14T04:29:41.796583Z","annotations":{"authorization.k8s.io/decision":"allow","authorization.k8s.io/reason":""}} 

       

      Feel free to reassign this if you think it belongs to another component.

              rbednar@redhat.com Roman Bednar
              kenzhang@redhat.com Ken Zhang
              Wei Duan