Closed Loop
Resolution: Unresolved
Blocker
Description of problem:
We performed an HA test on a hosted cluster running in OpenShift Virtualization. The test powered off a bare metal machine on which a worker VM was running. The worker VM is correctly restarted on a different node; however, a pod controlled by a StatefulSet is stuck in deletion, and the kubevirt-csi-node pod on the affected node is stuck in a loop trying to unmount the volume from this pod and from other pods that have already been deleted.
Version-Release number of selected component (if applicable):
BareMetal cluster:
- OCP 4.18.23
- kubevirt-hyperconverged-operator.v4.18.8
- advanced-cluster-management.v2.13.3
- multicluster-engine.v2.8.3
Hosted cluster:
- OCP 4.18.19
How reproducible:
Always in the customer environment. I was unable to reproduce it locally.
Steps to Reproduce:
1. In a hosted cluster, run a pod with a PVC (Filesystem, RWO) controlled by a StatefulSet. Take note of the worker node used by the pod.
2. Power off the bare metal node where the worker VM is running.
Actual results:
- The worker VM is restarted correctly on another node. The associated hp-volume pod is also started, and the PVC used by the pod is hot-plugged into the VM.
- In the hosted cluster, kube-controller-manager deletes the pod because of a taint eviction:
~~~
2025-10-27T12:03:26.494945078Z I1027 12:03:26.494907 1 taint_eviction.go:111] "Deleting pod" logger="taint-eviction-controller" controller="taint-eviction-controller" pod="namespace/pod-name-1"
~~~
- The pod has a deletionTimestamp, but it is stuck in the Failed phase.
- The csi-driver container of the kubevirt-csi-node pod running on the affected node is in a loop, trying to unpublish the volume from the node every ~2 minutes. Our pod has UID '08631964-27f9-4a18-aa71-a899adf32a44'; however, the driver is also trying to unmount the volume from other pods that have already been deleted:
~~~
2025-10-27T16:14:10.918707610+04:00 I1027 12:14:10.918620 1 server.go:121] /csi.v1.Node/NodeUnpublishVolume called with request: {"target_path":"/var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount","volume_id":"pvc-52a2c31a-0f30-4eb4-8292-51468483b477"}
2025-10-27T16:14:10.918857242+04:00 I1027 12:14:10.918797 1 node.go:300] Node Unpublish Request: volume_id:"pvc-52a2c31a-0f30-4eb4-8292-51468483b477" target_path:"/var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount"
2025-10-27T16:14:10.918857242+04:00 I1027 12:14:10.918812 1 node.go:308] Unmounting /var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
2025-10-27T16:14:10.918857242+04:00 I1027 12:14:10.918818 1 mount_linux.go:239] Unmounting /var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
2025-10-27T16:14:10.918901021+04:00 I1027 12:14:10.918633 1 server.go:121] /csi.v1.Node/NodeUnpublishVolume called with request: {"target_path":"/var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount","volume_id":"pvc-52a2c31a-0f30-4eb4-8292-51468483b477"}
2025-10-27T16:14:10.918950422+04:00 I1027 12:14:10.918918 1 node.go:300] Node Unpublish Request: volume_id:"pvc-52a2c31a-0f30-4eb4-8292-51468483b477" target_path:"/var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount"
2025-10-27T16:14:10.918950422+04:00 I1027 12:14:10.918698 1 server.go:121] /csi.v1.Node/NodeUnpublishVolume called with request: {"target_path":"/var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount","volume_id":"pvc-52a2c31a-0f30-4eb4-8292-51468483b477"}
2025-10-27T16:14:10.919030532+04:00 I1027 12:14:10.918976 1 node.go:300] Node Unpublish Request: volume_id:"pvc-52a2c31a-0f30-4eb4-8292-51468483b477" target_path:"/var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount"
2025-10-27T16:14:10.919030532+04:00 I1027 12:14:10.918990 1 node.go:308] Unmounting /var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
2025-10-27T16:14:10.919030532+04:00 I1027 12:14:10.918998 1 mount_linux.go:239] Unmounting /var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
2025-10-27T16:14:10.919077953+04:00 I1027 12:14:10.918938 1 node.go:308] Unmounting /var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
2025-10-27T16:14:10.919077953+04:00 I1027 12:14:10.919065 1 mount_linux.go:239] Unmounting /var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
2025-10-27T16:14:10.920716001+04:00 E1027 12:14:10.920675 1 node.go:311] failed to unmount unmount failed: exit status 32
2025-10-27T16:14:10.920716001+04:00 Unmounting arguments: /var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
2025-10-27T16:14:10.920716001+04:00 Output: umount: /var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount: not mounted.
2025-10-27T16:14:10.920716001+04:00 E1027 12:14:10.920702 1 server.go:124] /csi.v1.Node/NodeUnpublishVolume returned with error: unmount failed: exit status 32
2025-10-27T16:14:10.920716001+04:00 Unmounting arguments: /var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
2025-10-27T16:14:10.920716001+04:00 Output: umount: /var/lib/kubelet/pods/a8dbd5a3-0920-4982-9796-78b03672bd27/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount: not mounted.
2025-10-27T16:14:10.921101973+04:00 E1027 12:14:10.921070 1 node.go:311] failed to unmount unmount failed: exit status 32
2025-10-27T16:14:10.921101973+04:00 Unmounting arguments: /var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
2025-10-27T16:14:10.921101973+04:00 Output: umount: /var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount: not mounted.
2025-10-27T16:14:10.921110363+04:00 E1027 12:14:10.921098 1 server.go:124] /csi.v1.Node/NodeUnpublishVolume returned with error: unmount failed: exit status 32
2025-10-27T16:14:10.921110363+04:00 Unmounting arguments: /var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
2025-10-27T16:14:10.921110363+04:00 Output: umount: /var/lib/kubelet/pods/305b40be-5f8b-422c-9fe5-00d4789a8ab9/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount: not mounted.
2025-10-27T16:14:10.921535354+04:00 E1027 12:14:10.921511 1 node.go:311] failed to unmount unmount failed: exit status 32
2025-10-27T16:14:10.921535354+04:00 Unmounting arguments: /var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
2025-10-27T16:14:10.921535354+04:00 Output: umount: /var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount: not mounted.
2025-10-27T16:14:10.921549415+04:00 E1027 12:14:10.921524 1 server.go:124] /csi.v1.Node/NodeUnpublishVolume returned with error: unmount failed: exit status 32
2025-10-27T16:14:10.921549415+04:00 Unmounting arguments: /var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount
2025-10-27T16:14:10.921549415+04:00 Output: umount: /var/lib/kubelet/pods/faee5c8c-011c-4ca1-aabc-e64dfe68fde5/volumes/kubernetes.io~csi/pvc-52a2c31a-0f30-4eb4-8292-51468483b477/mount: not mounted.
~~~
Expected results:
If the PVC is no longer mounted, the CSI driver should move on and not retry forever.
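To illustrate the expected behaviour, here is a minimal sketch in Go of an unpublish path that treats "not mounted" as success, in line with the CSI requirement that NodeUnpublishVolume be idempotent. This is not the kubevirt-csi-driver code: the `mounter` interface, the `unpublish` function and the `fakeMounter` type are hypothetical names introduced only for this example, and a real driver would plug in its own mount helper.

```go
package main

import (
	"errors"
	"fmt"
	"os"
)

// mounter is a hypothetical abstraction over the node's mount helper,
// introduced so the sketch can run without root privileges.
type mounter interface {
	IsMounted(target string) (bool, error)
	Unmount(target string) error
}

// unpublish is a hypothetical, idempotent take on NodeUnpublishVolume:
// a target path that is missing or not mounted counts as success.
func unpublish(m mounter, target string) error {
	mounted, err := m.IsMounted(target)
	if err != nil {
		if errors.Is(err, os.ErrNotExist) {
			return nil // target path already removed: nothing to do
		}
		return fmt.Errorf("checking mount state of %s: %w", target, err)
	}
	if !mounted {
		return nil // "not mounted" is the desired end state, not an error
	}
	if err := m.Unmount(target); err != nil {
		return fmt.Errorf("unmounting %s: %w", target, err)
	}
	return nil
}

// fakeMounter is an in-memory stand-in used only for illustration.
type fakeMounter struct {
	mounted map[string]bool
	calls   int // number of Unmount invocations
}

func (f *fakeMounter) IsMounted(target string) (bool, error) {
	return f.mounted[target], nil
}

func (f *fakeMounter) Unmount(target string) error {
	f.calls++
	if !f.mounted[target] {
		// Mirrors the failure seen in the logs above.
		return errors.New("unmount failed: exit status 32")
	}
	delete(f.mounted, target)
	return nil
}
```

With this shape, a repeated unpublish request for a stale pod directory returns success on the first call, so the caller has no reason to retry it.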
Additional info:
- relates to CNV-71540 kubevirt-csi-node is stuck in a loop trying to unpublish volume from node (ASSIGNED)