-
Bug
-
Resolution: Done-Errata
-
Major
-
4.15
-
Important
-
No
-
Rejected
-
False
-
Description of problem:
Deleting a pod that mounts a Nutanix CSI volume leaves the pod stuck in the "Terminating" state.
Version-Release number of selected component (if applicable):
4.15.0-0.nightly-2023-10-17-065657
How reproducible:
Always
Steps to Reproduce:
1. Install an OpenShift cluster on Nutanix.
2. Install the Nutanix CSI driver operator.
3. Create a PVC using the Nutanix CSI storageClass.
4. Create a pod that consumes the PVC.
5. After the pod becomes Running, delete the pod.
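Steps 3 to 5 can be sketched as a small manifest generator. This is only an illustration, not the exact manifests used in the report: the object names, the storage class name `nutanix-volume`, the container image, and the mount path are all placeholders. Only the pod volume name `data` is taken from the kubelet log in the additional info.

```python
import json


def build_pvc(name, storage_class, size="1Gi"):
    """PVC bound to the given storage class (the class name is site-specific)."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def build_pod(name, pvc_name, image):
    """Pod that mounts the PVC as a volume named "data" and then idles."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "app",
                "image": image,
                "command": ["sleep", "infinity"],
                # Hypothetical mount path; any path inside the container works.
                "volumeMounts": [{"name": "data", "mountPath": "/mnt/data"}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": pvc_name},
            }],
        },
    }


def as_list(*objects):
    """Wrap objects in a v1 List so they can be applied in one call."""
    return json.dumps(
        {"apiVersion": "v1", "kind": "List", "items": list(objects)}, indent=2
    )


if __name__ == "__main__":
    # All names below are placeholders, not values from the report.
    pvc = build_pvc("repro-pvc", "nutanix-volume")
    pod = build_pod("repro-pod", "repro-pvc",
                    "registry.access.redhat.com/ubi9/ubi-minimal")
    print(as_list(pvc, pod))
```

Assuming the script is saved as `repro.py`, the output can be piped to `oc apply -f -`. Once the pod reports Running, `oc delete pod repro-pod` corresponds to step 5.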
Actual results:
At step 5, the pod is stuck in the "Terminating" state and is never deleted.
Expected results:
At step 5, the pod should be deleted successfully.
Additional info:
$ oc adm node-logs --role worker -u kubelet | grep -i "pvc-5899ceb1-7ba5-4f35-82d0-7c977f05fbf7"
...
Oct 19 08:48:32.375798 pewang-1019ns-fnx86-worker-jmwzz kubenswrapper[3673]: E1019 08:48:32.375765 3673 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/csi.nutanix.com^NutanixVolumes-5343b228-acc1-4655-bedc-dfacc4c05b80 podName:ca43201e-9268-4a22-8640-d71d30b28d78 nodeName:}" failed. No retries permitted until 2023-10-19 08:50:34.375725251 +0000 UTC m=+16804.283777473 (durationBeforeRetry 2m2s). Error: UnmountVolume.TearDown failed for volume "data" (UniqueName: "kubernetes.io/csi/csi.nutanix.com^NutanixVolumes-5343b228-acc1-4655-bedc-dfacc4c05b80") pod "ca43201e-9268-4a22-8640-d71d30b28d78" (UID: "ca43201e-9268-4a22-8640-d71d30b28d78") : kubernetes.io/csi: Unmounter.TearDownAt failed to clean mount dir [/var/lib/kubelet/pods/ca43201e-9268-4a22-8640-d71d30b28d78/volumes/kubernetes.io~csi/pvc-5899ceb1-7ba5-4f35-82d0-7c977f05fbf7/mount]: kubernetes.io/csi: failed to remove dir [/var/lib/kubelet/pods/ca43201e-9268-4a22-8640-d71d30b28d78/volumes/kubernetes.io~csi/pvc-5899ceb1-7ba5-4f35-82d0-7c977f05fbf7/mount]: remove /var/lib/kubelet/pods/ca43201e-9268-4a22-8640-d71d30b28d78/volumes/kubernetes.io~csi/pvc-5899ceb1-7ba5-4f35-82d0-7c977f05fbf7/mount: directory not empty
...
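For triage, the relevant fields of such a kubelet entry (volume name, pod UID, mount directory, final error) can be pulled out with a regular expression. The following is a rough sketch that assumes the message layout seen in the log entry above; the function name `parse_teardown_failure` is made up for this example, and other kubelet versions may word the message differently.

```python
import re

# Pattern modeled on the single log entry in this report (an assumption,
# not a stable kubelet format).
TEARDOWN_RE = re.compile(
    r'UnmountVolume\.TearDown failed for volume "(?P<volume>[^"]+)"'
    r'.*?pod "(?P<pod_uid>[0-9a-f-]+)"'
    r'.*?failed to remove dir \[(?P<mount_dir>[^\]]+)\]: (?P<reason>.*)$'
)


def parse_teardown_failure(line):
    """Return the parsed fields as a dict, or None if the line does not match."""
    match = TEARDOWN_RE.search(line.strip())
    if match is None:
        return None
    return {key: value.strip() for key, value in match.groupdict().items()}


# Error portion of the log entry from this report, rebuilt from its parts.
POD_UID = "ca43201e-9268-4a22-8640-d71d30b28d78"
MOUNT_DIR = (
    f"/var/lib/kubelet/pods/{POD_UID}/volumes/kubernetes.io~csi/"
    "pvc-5899ceb1-7ba5-4f35-82d0-7c977f05fbf7/mount"
)
SAMPLE = (
    'Error: UnmountVolume.TearDown failed for volume "data" (UniqueName: '
    '"kubernetes.io/csi/csi.nutanix.com^NutanixVolumes-'
    '5343b228-acc1-4655-bedc-dfacc4c05b80") '
    f'pod "{POD_UID}" (UID: "{POD_UID}") : kubernetes.io/csi: '
    f"Unmounter.TearDownAt failed to clean mount dir [{MOUNT_DIR}]: "
    f"kubernetes.io/csi: failed to remove dir [{MOUNT_DIR}]: "
    f"remove {MOUNT_DIR}: directory not empty"
)

if __name__ == "__main__":
    fields = parse_teardown_failure(SAMPLE)
    for key, value in fields.items():
        print(f"{key}: {value}")
```

The extracted `mount_dir` is the directory kubelet could not remove; listing its contents on the affected node (for example through `oc debug node/<node>`) is one way to check what was left behind.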
This looks like the same root cause as
https://github.com/cri-o/cri-o/issues/7308
https://github.com/cri-o/cri-o/pull/7408 may fix it.
- links to
-
RHSA-2023:7198 OpenShift Container Platform 4.15 security update