OpenShift Bugs / OCPBUGS-3840

Volume attachment is failing for PV provisioned with thin storageclass


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Major
    • Affects Version/s: 4.11.z
    • Component/s: Storage
    • Severity: Critical

      Description of problem:

      Pod is stuck in the ContainerCreating state because volume attachment of the PV (provisioned with the thin storageclass) is not happening.

      ~~~
      NAME                               READY   STATUS              RESTARTS   AGE   IP            NODE                                   NOMINATED NODE   READINESS GATES
      qs-keycloak-fei-5f548c7c67-l2ftc   0/1     ContainerCreating   0          2m    10.131.3.bb   workernode4.abcd.nonprod.hclpnp.com   <none>           <none>
      ~~~

      ~~~
      Events:
        Type     Reason              Age                From                     Message
        ----     ------              ----               ----                     -------
        Normal   Scheduled           31m                default-scheduler        Successfully assigned incubator/qs-keycloak-fei-5f548c7c67-l2ftc to workernode4.abcd.nonprod.hclpnp.com by ocpdsmaster3.abcd.nonprod.hclpnp.com
        Warning  FailedAttachVolume  31m                attachdetach-controller  Multi-Attach error for volume "pvc-51f92d53-5ae8-4a4b-9074-df07803e46ea" Volume is already used by pod(s) qs-keycloak-fei-5f548c7c67-nrxvb
        Warning  FailedMount         11m (x2 over 20m)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[qs-keycloak-fei], unattached volumes=[kube-api-access-v9zc2 qs-keycloak-fei]: timed out waiting for the condition
        Warning  FailedMount         4s (x12 over 29m)  kubelet                  Unable to attach or mount volumes: unmounted volumes=[qs-keycloak-fei], unattached volumes=[qs-keycloak-fei kube-api-access-v9zc2]: timed out waiting for the condition
      ~~~
      
      As per the Multi-Attach error, we verified that the old pod qs-keycloak-fei-5f548c7c67-nrxvb had been deleted gracefully, and backend volume detachment and unmounting completed successfully.
        In the worker02 node's YAML, the PV does not appear in either the volumesAttached or volumesInUse section.
        On worker04, we observed that the PV is populated in the volumesInUse section:
      ~~~
        volumesInUse:
        - kubernetes.io/vsphere-volume/[NETAPP_DS01] kubevols/abcd-nnzv9-dynamic-pvc-51f92d53-5ae8-4a4b-9074-df07803e46ea.vmdk
      ~~~
        However, it is not populated in the volumesAttached section of the worker04 node.
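
      For reference, the node-side state described above can be read directly from the Node objects. The field names are standard core/v1 Node status fields; the node name is taken from the events above (repeat with the worker02 node name as needed):
      ~~~
      # Attach/detach controller's view of volumes attached to the node
      oc get node workernode4.abcd.nonprod.hclpnp.com -o jsonpath='{.status.volumesAttached}{"\n"}'

      # Kubelet's view of volumes still in use on the node
      oc get node workernode4.abcd.nonprod.hclpnp.com -o jsonpath='{.status.volumesInUse}{"\n"}'
      ~~~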
      
      After checking the kube-controller-manager pod logs, we found the following goroutine stack trace from a panic:
      ~~~
      2022-11-17T15:17:08.611154688Z E1117 15:17:08.611083       1 nestedpendingoperations.go:335] Operation for "{volumeName:kubernetes.io/vsphere-volume/[NETAPP_DS01] kubevols/abcd-nnzv9-dynamic-pvc-51f92d53-5ae8-4a4b-9074-df07803e46ea.vmdk podName: nodeName:}" failed. No retries permitted until 2022-11-17 15:17:09.11104555 +0000 UTC m=+888261.158159112 (durationBeforeRetry 500ms). Error: recovered from panic "runtime error: index out of range [16] with length 16". (err=<nil>) Call stack:
      2022-11-17T15:17:08.611154688Z goroutine 19411574 [running]:
      2022-11-17T15:17:08.611154688Z k8s.io/kubernetes/vendor/k8s.io/apimachinery/pkg/util/runtime.RecoverFromPanic(0xc008d93e78)
      2022-11-17T15:17:08.611154688Z  vendor/k8s.io/apimachinery/pkg/util/runtime/runtime.go:159 +0x6b
      2022-11-17T15:17:08.611154688Z panic({0x4526ee0, 0xc010442cd8})
      2022-11-17T15:17:08.611154688Z  /usr/lib/golang/src/runtime/panic.go:844 +0x258
      2022-11-17T15:17:08.611154688Z k8s.io/kubernetes/vendor/k8s.io/legacy-cloud-providers/vsphere/vclib.getNextUnitNumber({0xc0050d4800, 0xf, 0xc0050d4800?}, {0x508cf80?, 0xc00a613b90?})
      2022-11-17T15:17:08.611154688Z  vendor/k8s.io/legacy-cloud-providers/vsphere/vclib/utils.go:98 +0x15a
      2022-11-17T15:17:08.611154688Z k8s.io/kubernetes/vendor/k8s.io/legacy-cloud-providers/vsphere/vclib.(*VirtualMachine).CreateDiskSpec(0xc0059bf910, {0x50b88d0, 0xc010158a80}, {0xc004f16850, 0x64}, 0xc01300fc10, 0xc008d93898)
      2022-11-17T15:17:08.611154688Z  vendor/k8s.io/legacy-cloud-providers/vsphere/vclib/virtualmachine.go:338 +0x65f
      2022-11-17T15:17:08.611154688Z k8s.io/kubernetes/vendor/k8s.io/legacy-cloud-providers/vsphere/vclib.(*VirtualMachine).AttachDisk(0xc0059bf910, {0x50b88d0, 0xc010158a80}, {0xc004f16850, 0x64}, 0xc008d93898)
      2022-11-17T15:17:08.611154688Z  vendor/k8s.io/legacy-cloud-providers/vsphere/vclib/virtualmachine.go:101 +0x453
      2022-11-17T15:17:08.611154688Z k8s.io/kubernetes/vendor/k8s.io/legacy-cloud-providers/vsphere.(*VSphere).AttachDisk.func1({0xc004f16850, 0x64}, {0x0, 0x0}, {0xc00066a9f0?, 0x0?})
      2022-11-17T15:17:08.611154688Z  vendor/k8s.io/legacy-cloud-providers/vsphere/vsphere.go:969 +0x396
      2022-11-17T15:17:08.611154688Z k8s.io/kubernetes/vendor/k8s.io/legacy-cloud-providers/vsphere.(*VSphere).AttachDisk(0xc0016b2600, {0xc004f16850, 0x64}, {0x0, 0x0}, {0xc00066a9f0, 0x24})
      2022-11-17T15:17:08.611154688Z  vendor/k8s.io/legacy-cloud-providers/vsphere/vsphere.go:977 +0xe4
      2022-11-17T15:17:08.611154688Z k8s.io/kubernetes/pkg/volume/vsphere_volume.(*vsphereVMDKAttacher).Attach(0xc011f9a4e0, 0xc00ecfde48?, {0xc00066a9f0, 0x24})
      2022-11-17T15:17:08.611154688Z  pkg/volume/vsphere_volume/attacher.go:92 +0x20a
      2022-11-17T15:17:08.611154688Z k8s.io/kubernetes/pkg/volume/util/operationexecutor.(*operationGenerator).GenerateAttachVolumeFunc.func1()
      2022-11-17T15:17:08.611154688Z  pkg/volume/util/operationexecutor/operation_generator.go:372 +0x1ae
      2022-11-17T15:17:08.611154688Z k8s.io/kubernetes/pkg/volume/util/types.(*GeneratedOperations).Run(0xc00d312a80)
      2022-11-17T15:17:08.611154688Z  pkg/volume/util/types/types.go:79 +0x1c2
      2022-11-17T15:17:08.611154688Z k8s.io/kubernetes/pkg/volume/util/nestedpendingoperations.(*nestedPendingOperations).Run.func1()
      2022-11-17T15:17:08.611154688Z  pkg/volume/util/nestedpendingoperations/nestedpendingoperations.go:190 +0xf4
      2022-11-17T15:17:08.611154688Z created by k8s.io/kubernetes/pkg/volume/util/nestedpendingoperations.(*nestedPendingOperations).Run
      2022-11-17T15:17:08.611154688Z  pkg/volume/util/nestedpendingoperations/nestedpendingoperations.go:185 +0x8ca
       ~~~
      
      This panic is observed only when the attach operation targets the worker04 node; attachment to the other worker nodes succeeds.
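
      The trace points at vclib.getNextUnitNumber (vendor/k8s.io/legacy-cloud-providers/vsphere/vclib/utils.go:98) indexing past a 16-element array, which suggests the function marks occupied SCSI unit numbers in a fixed 16-slot table and trips over a device on worker04's VM that reports a unit number of 16 or higher. The following is only a minimal standalone sketch of that suspected pattern, not the vendored code; the names and the bounds check shown are illustrative:
      ~~~
      // panic_sketch.go - illustrative reconstruction of the failure mode, not vendored code.
      package main

      import "fmt"

      const scsiDeviceSlots = 16 // SCSI unit numbers 0-15 per controller

      // nextUnitNumber marks occupied unit numbers in a fixed-size table and
      // returns the first free one.
      func nextUnitNumber(occupied []int) (int, error) {
              var taken [scsiDeviceSlots]bool
              for _, unit := range occupied {
                      // Without this bounds check, a device reporting a unit number >= 16
                      // (or a negative one) triggers exactly the panic seen above:
                      // "index out of range [16] with length 16".
                      if unit < 0 || unit >= scsiDeviceSlots {
                              continue
                      }
                      taken[unit] = true
              }
              for unit, used := range taken {
                      if !used {
                              return unit, nil
                      }
              }
              return -1, fmt.Errorf("no free SCSI unit number left on controller")
      }

      func main() {
              // Hypothetical device layout: a device at unit 16 is what the
              // unguarded version cannot handle.
              fmt.Println(nextUnitNumber([]int{0, 1, 2, 16}))
      }
      ~~~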

      Version-Release number of selected component (if applicable):

      OCP version 4.11.9

      How reproducible:

      Always, for the worker04 node.

      Steps to Reproduce:

      1. Schedule the pod on another worker node.
      2. Delete the pod so that it gets rescheduled onto the worker04 node.
      3. The pod gets stuck in the ContainerCreating state with the volume attach operation failing (example commands below).
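
      A rough command sequence for the steps above (pod and namespace names are taken from the events; how the replacement pod ends up on worker04 depends on the scheduler, so this is only one hypothetical way to hit it):
      ~~~
      # 1. Pod running on another worker with the thin-provisioned PV attached
      oc -n incubator get pod -o wide

      # 2. Delete the pod; the replacement gets scheduled onto workernode4
      oc -n incubator delete pod qs-keycloak-fei-5f548c7c67-nrxvb

      # 3. The replacement pod stays in ContainerCreating; check its events
      oc -n incubator get pod -o wide
      oc -n incubator describe pod qs-keycloak-fei-5f548c7c67-l2ftc
      ~~~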
      

      Actual results:

      The pod is stuck in the ContainerCreating state because the attach operation for the PV provisioned by the thin storageclass fails.

      Expected results:

      The pod should reach the Running state once the PV provisioned by the thin storageclass is attached successfully.

      Additional info:

      Volume attachment of a PV provisioned by the thin-csi storageclass (the vSphere CSI driver, rather than the in-tree kubernetes.io/vsphere-volume path that appears in the stack trace above) is successful on the same worker04 node.
      
      Must-gather and sosreport details are attached.

       

       

            Assignee: Jonathan Dobson (jdobson@redhat.com)
            Reporter: Aditya Deshpande (rhn-support-adeshpan)
            QA Contact: Wei Duan