OpenShift Bugs / OCPBUGS-56192

4.18: CSI certification fails for RWX volumes


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Normal
    • 4.18.0
    • 4.18
    • Storage
    • Quality / Stability / Reliability
    • Moderate
    • Done
    • Release Note Not Required

      This is a clone of issue OCPBUGS-54318. The following is the description of the original issue:

      A CSI certification test failed for the spectrumscale.csi.ibm.com driver, and I think the fault lies in either the test or OCP.

      Test:   External Storage [Driver: spectrumscale.csi.ibm.com] [Testpattern: Dynamic PV (filesystem volmode)] multiVolume [Slow] should concurrently access the single volume from pods on different node [Slow] 

      The test creates a single PVC and runs two Pods with that PVC, each on a different node.

      The first pod starts in a few seconds:

      STEP: Creating pod1 with a volume on {Name: Selector:map[kubernetes.io/os:linux] Affinity:nil} @ 03/27/25 15:22:33.143
      ...
      At 2025-03-27 15:22:43 +0000 UTC - event for pod-013c3f38-ad71-4e28-92c4-88e53408936c: {kubelet worker-0-2} Started: Started container write-pod 

       

      However, the second pod takes more than 5 minutes to start:

      STEP: Creating pod2 with a volume on [..snip...] @ 03/27/25 15:22:45.255 
      ...
      At 2025-03-27 15:28:16 +0000 UTC - event for pod-79ec8cdf-e57b-4c07-94a4-0ae1811b3dc5: {kubelet worker-0-1} Started: Started container write-pod
      
      

      The reason for such a late start is that the volume could not be attached to the second node. The attach failed with:

       At 2025-03-27 15:22:45 +0000 UTC - event for pod-79ec8cdf-e57b-4c07-94a4-0ae1811b3dc5: {attachdetach-controller } FailedAttachVolume: Multi-Attach error for volume "pvc-c0ad917f-5994-48e0-a923-8ffecdc07708" Volume is already used by pod(s) pod-013c3f38-ad71-4e28-92c4-88e53408936c 

      The first pod gets deleted at 15:27:45, which allows the attach to the second node to succeed, and the second pod then starts. But by then it is already too late.
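
The timestamps quoted above can be turned into durations with a short script. This is only a sketch for illustration: the variable names are made up here, and the values are copied from the log lines and events in this report (all on 2025-03-27, UTC). Event timestamps have one-second resolution, so the results are approximate.

```python
from datetime import datetime


def ts(clock: str) -> datetime:
    """Parse an HH:MM:SS[.fff] clock time on the day of the test run."""
    fmt = "%Y-%m-%d %H:%M:%S.%f" if "." in clock else "%Y-%m-%d %H:%M:%S"
    return datetime.strptime(f"2025-03-27 {clock}", fmt)


# Values copied from the report; names are illustrative only.
pod1_created = ts("15:22:33.143")
pod1_started = ts("15:22:43")
pod2_created = ts("15:22:45.255")
multi_attach_error = ts("15:22:45")
pod1_deleted = ts("15:27:45")
pod2_started = ts("15:28:16")

# Time from the "Creating pod" step to the "Started container" event.
pod1_startup = (pod1_started - pod1_created).total_seconds()
pod2_startup = (pod2_started - pod2_created).total_seconds()

# Time between the Multi-Attach error and the deletion of the first pod.
attach_blocked = (pod1_deleted - multi_attach_error).total_seconds()

# Time from the deletion of the first pod to the start of the second.
start_after_delete = (pod2_started - pod1_deleted).total_seconds()

print(f"pod1 startup:          {pod1_startup:.1f} s")
print(f"pod2 startup:          {pod2_startup:.1f} s")
print(f"attach blocked for:    {attach_blocked:.0f} s")
print(f"pod2 start after del.: {start_after_delete:.0f} s")
```

With these inputs the first pod needs about 10 seconds to start and the second about 331 seconds. The second pod starts roughly 31 seconds after the first pod is deleted, which is consistent with the attach having been blocked by the first pod.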

      That message should be emitted only for RWO volumes, but this volume should be RWX (that's the point of this test!). Either the test created the wrong PVC or we have a bug in the A/D controller.
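
To make the two hypotheses concrete, here is a simplified Python model of the rule described above: a Multi-Attach error is only warranted when the volume's access modes do not allow use from several nodes. It is a sketch of the expected behaviour, not the attach/detach controller's actual code. The function name `multi_attach_forbidden`, the treatment of ReadOnlyMany and of an empty access-mode list, and the sample volumes are assumptions made for illustration; the access-mode strings are the standard Kubernetes values.

```python
# Simplified model, NOT the attach/detach controller's real code.
# Assumption: a volume may be attached to several nodes when its access
# modes include ReadWriteMany (RWX) or ReadOnlyMany (ROX).
MULTI_NODE_MODES = {"ReadWriteMany", "ReadOnlyMany"}


def multi_attach_forbidden(access_modes: list[str]) -> bool:
    """Return True when attaching the volume to a second node should be refused."""
    if not access_modes:
        # Assumption: with no access modes declared, do not block the attach.
        return False
    return not any(mode in MULTI_NODE_MODES for mode in access_modes)


# Hypothetical access-mode lists, as they would appear in a PV spec.
rwo_volume = ["ReadWriteOnce"]
rwx_volume = ["ReadWriteMany"]

print(multi_attach_forbidden(rwo_volume))  # True: Multi-Attach error expected
print(multi_attach_forbidden(rwx_volume))  # False: second attach should proceed
```

Under this model, if the PV behind the test's PVC lists only ReadWriteOnce, the error is expected and the test created the wrong PVC. If it lists ReadWriteMany, the error points at the controller instead.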

              Jan Safranek (rhn-engineering-jsafrane)
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Wei Duan