OpenShift Virtualization / CNV-71540

kubevirt-csi-node is stuck in a loop trying to unpublish volume from node


    • Incidents & Support
    • CNV Storage Sprint 280
    • Critical
    • Customer Escalated, Customer Reported

      Description of problem:

      We performed an HA test in a cluster hosted on OpenShift Virtualization. The test simply powered off the bare metal machine where a worker VM was running.

      The worker VM is restarted correctly on a different node; however, a pod controlled by a StatefulSet is stuck in deletion, and the kubevirt-csi-node pod on the affected node is stuck in a loop trying to unpublish the volume for this pod and for other pods that have already been deleted and no longer exist.

      Version-Release number of selected component (if applicable):

      BareMetal cluster:
        OCP 4.18.23
        kubevirt-hyperconverged-operator.v4.18.8
        advanced-cluster-management.v2.13.3
        multicluster-engine.v2.8.3
      
      Hosted cluster:
        OCP 4.18.19

      How reproducible:

      Always in the customer environment. I was not able to reproduce it locally.

      Steps to Reproduce:

      1. In a hosted cluster, run a pod with a PVC (Filesystem, RWO) controlled by a StatefulSet. Take note of the worker node the pod is scheduled on.
      2. Power off the bare metal node where that worker VM is running.

      Actual results:

      - The worker VM is restarted correctly on another node. The associated hp-volume pod is also started, and the PVC used by the pod is hot-plugged into the VM.
      
      - In the hosted cluster, kube-controller-manager deletes the pod because of a taint eviction:
      
      ~~~
      2025-10-27T12:03:26.494945078Z I1027 12:03:26.494907       1 taint_eviction.go:111] "Deleting pod" logger="taint-eviction-controller" controller="taint-eviction-controller" pod="namespace/pod-name-1"
      ~~~
      
      - The pod has a deletionTimestamp, but it is stuck in the Failed phase.
      
      - The csi-driver container of the kubevirt-csi-node pod running on the affected node is stuck in a loop, trying to unpublish the volume from the node every ~2 minutes. Our pod has UID '08631964-27f9-4a18-aa71-a899adf32a44'; however, the driver is also trying to unmount the volume from other deleted pods that no longer exist:
      
      ~~~
      2025-10-27T16:14:10.918707610+04:00 I1027 12:14:10.918620       1 server.go:121] /csi.v1.Node/NodeUnpublishVolume called with request: {"target_path":"/var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount","volume_id":"pvc-52a2c31a-0f30-4eb4-8292-51468483b477"}
      2025-10-27T16:14:10.918857242+04:00 I1027 12:14:10.918797       1 node.go:300] Node Unpublish Request: volume_id:"pvc-52a2c31a-0f30-4eb4-8292-51468483b477" target_path:"/var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount"
      2025-10-27T16:14:10.918857242+04:00 I1027 12:14:10.918812       1 node.go:308] Unmounting /var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
      2025-10-27T16:14:10.918857242+04:00 I1027 12:14:10.918818       1 mount_linux.go:239] Unmounting /var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
      2025-10-27T16:14:10.918901021+04:00 I1027 12:14:10.918633       1 server.go:121] /csi.v1.Node/NodeUnpublishVolume called with request: {"target_path":"/var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount","volume_id":"pvc-52a2c31a-0f30-4eb4-8292-51468483b477"}
      2025-10-27T16:14:10.918950422+04:00 I1027 12:14:10.918918       1 node.go:300] Node Unpublish Request: volume_id:"pvc-52a2c31a-0f30-4eb4-8292-51468483b477" target_path:"/var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount"
      2025-10-27T16:14:10.918950422+04:00 I1027 12:14:10.918698       1 server.go:121] /csi.v1.Node/NodeUnpublishVolume called with request: {"target_path":"/var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount","volume_id":"pvc-52a2c31a-0f30-4eb4-8292-51468483b477"}
      2025-10-27T16:14:10.919030532+04:00 I1027 12:14:10.918976       1 node.go:300] Node Unpublish Request: volume_id:"pvc-52a2c31a-0f30-4eb4-8292-51468483b477" target_path:"/var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount"
      2025-10-27T16:14:10.919030532+04:00 I1027 12:14:10.918990       1 node.go:308] Unmounting /var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
      2025-10-27T16:14:10.919030532+04:00 I1027 12:14:10.918998       1 mount_linux.go:239] Unmounting /var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
      2025-10-27T16:14:10.919077953+04:00 I1027 12:14:10.918938       1 node.go:308] Unmounting /var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
      2025-10-27T16:14:10.919077953+04:00 I1027 12:14:10.919065       1 mount_linux.go:239] Unmounting /var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
      2025-10-27T16:14:10.920716001+04:00 E1027 12:14:10.920675       1 node.go:311] failed to unmount unmount failed: exit status 32
      2025-10-27T16:14:10.920716001+04:00 Unmounting arguments: /var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
      2025-10-27T16:14:10.920716001+04:00 Output: umount: /var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount: not mounted.
      2025-10-27T16:14:10.920716001+04:00 E1027 12:14:10.920702       1 server.go:124] /csi.v1.Node/NodeUnpublishVolume returned with error: unmount failed: exit status 32
      2025-10-27T16:14:10.920716001+04:00 Unmounting arguments: /var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
      2025-10-27T16:14:10.920716001+04:00 Output: umount: /var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount: not mounted.
      2025-10-27T16:14:10.921101973+04:00 E1027 12:14:10.921070       1 node.go:311] failed to unmount unmount failed: exit status 32
      2025-10-27T16:14:10.921101973+04:00 Unmounting arguments: /var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
      2025-10-27T16:14:10.921101973+04:00 Output: umount: /var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount: not mounted.
      2025-10-27T16:14:10.921110363+04:00 E1027 12:14:10.921098       1 server.go:124] /csi.v1.Node/NodeUnpublishVolume returned with error: unmount failed: exit status 32
      2025-10-27T16:14:10.921110363+04:00 Unmounting arguments: /var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
      2025-10-27T16:14:10.921110363+04:00 Output: umount: /var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount: not mounted.
      2025-10-27T16:14:10.921535354+04:00 E1027 12:14:10.921511       1 node.go:311] failed to unmount unmount failed: exit status 32
      2025-10-27T16:14:10.921535354+04:00 Unmounting arguments: /var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
      2025-10-27T16:14:10.921535354+04:00 Output: umount: /var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount: not mounted.
      2025-10-27T16:14:10.921549415+04:00 E1027 12:14:10.921524       1 server.go:124] /csi.v1.Node/NodeUnpublishVolume returned with error: unmount failed: exit status 32
      2025-10-27T16:14:10.921549415+04:00 Unmounting arguments: /var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
      2025-10-27T16:14:10.921549415+04:00 Output: umount: /var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount: not mounted.
      ~~~
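
      For reference, the "exit status 32" in the log above is just umount(8) reporting that the target path is not mounted. A minimal standalone snippet (an illustration, not code taken from the driver) reproduces the same error string through k8s.io/mount-utils, which is the helper behind the mount_linux.go lines in the log:

      ~~~
      // Illustration only: calling Unmount on a plain directory that was never
      // mounted yields the "exit status 32 ... not mounted." error that the
      // csi-driver container keeps logging.
      package main

      import (
      	"fmt"
      	"os"

      	mount "k8s.io/mount-utils"
      )

      func main() {
      	dir, err := os.MkdirTemp("", "not-a-mountpoint-") // ordinary directory, not a mount point
      	if err != nil {
      		panic(err)
      	}
      	defer os.RemoveAll(dir)

      	mounter := mount.New("") // shells out to the system umount binary, like mount_linux.go
      	if err := mounter.Unmount(dir); err != nil {
      		// Prints something like:
      		//   unmount failed: exit status 32
      		//   Unmounting arguments: /tmp/not-a-mountpoint-...
      		//   Output: umount: /tmp/not-a-mountpoint-...: not mounted.
      		fmt.Println(err)
      	}
      }
      ~~~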
      

      Expected results:

      If the target path is no longer mounted, the CSI driver should treat the NodeUnpublishVolume call as already satisfied (the operation is meant to be idempotent) and return success so kubelet can finish cleaning up, instead of failing and retrying forever.
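
      A minimal sketch of what that could look like, assuming a node service built on the CSI Go bindings and k8s.io/mount-utils; the names below are illustrative and not the actual kubevirt-csi-driver implementation:

      ~~~
      // Hypothetical sketch: an idempotent NodeUnpublishVolume that treats an
      // already-unmounted or missing target path as success.
      package driver

      import (
      	"context"
      	"os"

      	"github.com/container-storage-interface/spec/lib/go/csi"
      	"google.golang.org/grpc/codes"
      	"google.golang.org/grpc/status"
      	mount "k8s.io/mount-utils"
      )

      type nodeService struct {
      	mounter mount.Interface // e.g. mount.New("")
      }

      func (n *nodeService) NodeUnpublishVolume(ctx context.Context, req *csi.NodeUnpublishVolumeRequest) (*csi.NodeUnpublishVolumeResponse, error) {
      	target := req.GetTargetPath()
      	if target == "" {
      		return nil, status.Error(codes.InvalidArgument, "target_path is required")
      	}

      	// Stale pod directory already gone: nothing to do, report success.
      	if _, err := os.Stat(target); os.IsNotExist(err) {
      		return &csi.NodeUnpublishVolumeResponse{}, nil
      	}

      	// CleanupMountPoint unmounts only if the path is really a mount point,
      	// then removes the directory; an already-unmounted path is not an error.
      	if err := mount.CleanupMountPoint(target, n.mounter, true); err != nil {
      		return nil, status.Errorf(codes.Internal, "failed to unpublish %s: %v", target, err)
      	}
      	return &csi.NodeUnpublishVolumeResponse{}, nil
      }
      ~~~

      With behavior along these lines, kubelet's retries for the stale pod UIDs would succeed once, the leftover /var/lib/kubelet/pods/<uid>/volumes/... directories would be removed, and the loop would stop.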

      Additional info:

       

              Alex Kalenyuk (akalenyu)
              Juan Orti (rhn-support-jortialc)
              Yan Du