Bug
Resolution: Duplicate
4.12.z
Description of problem:
As part of a CI run, the command oc delete namespace psapuser95 hung indefinitely (more than 4 hours) because one Pod was stuck in the Terminating state:
> psapuser95 psapuser95-0 0/2 Terminating 0 4h4m 10.130.14.11 ip-10-0-137-26.us-west-2.compute.internal <none>
In the journal of this node, the following error is printed once:
May 23 01:49:14.085337 ip-10-0-137-26 kubenswrapper[1450]: E0523 01:49:14.085317 1450 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/csi/ebs.csi.aws.com^vol-0bfe06e4e80bff5f6 podName:755344ed-a210-4f52-89b9-df22edf30fbd nodeName:}" failed. No retries permitted until 2023-05-23 01:49:14.585294249 +0000 UTC m=+4675.366414088 (durationBeforeRetry 500ms). Error: UnmountVolume.TearDown failed for volume "psapuser95" (UniqueName: "kubernetes.io/csi/ebs.csi.aws.com^vol-0bfe06e4e80bff5f6") pod "755344ed-a210-4f52-89b9-df22edf30fbd" (UID: "755344ed-a210-4f52-89b9-df22edf30fbd") : kubernetes.io/csi: Unmounter.TearDownAt failed: rpc error: code = Internal desc = Could not unmount "/var/lib/kubelet/pods/755344ed-a210-4f52-89b9-df22edf30fbd/volumes/kubernetes.io~csi/pvc-e59644eb-9c41-4294-9d19-b366b0eb7e4e/mount": unmount failed: exit status 32
and the following message is printed hundreds of thousands of times:
May 23 03:28:27.202330 ip-10-0-137-26 kubenswrapper[1450]: E0523 03:28:27.202294 1450 reconciler.go:208] "operationExecutor.UnmountVolume failed (controllerAttachDetachEnabled true) for volume \"psapuser95\" (UniqueName: \"kubernetes.io/csi/ebs.csi.aws.com^vol-0bfe06e4e80bff5f6\") pod \"755344ed-a210-4f52-89b9-df22edf30fbd\" (UID: \"755344ed-a210-4f52-89b9-df22edf30fbd\") : UnmountVolume.NewUnmounter failed for volume \"psapuser95\" (UniqueName: \"kubernetes.io/csi/ebs.csi.aws.com^vol-0bfe06e4e80bff5f6\") pod \"755344ed-a210-4f52-89b9-df22edf30fbd\" (UID: \"755344ed-a210-4f52-89b9-df22edf30fbd\") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/755344ed-a210-4f52-89b9-df22edf30fbd/volumes/kubernetes.io~csi/pvc-e59644eb-9c41-4294-9d19-b366b0eb7e4e/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/755344ed-a210-4f52-89b9-df22edf30fbd/volumes/kubernetes.io~csi/pvc-e59644eb-9c41-4294-9d19-b366b0eb7e4e/vol_data.json]: open /var/lib/kubelet/pods/755344ed-a210-4f52-89b9-df22edf30fbd/volumes/kubernetes.io~csi/pvc-e59644eb-9c41-4294-9d19-b366b0eb7e4e/vol_data.json: no such file or directory" err="UnmountVolume.NewUnmounter failed for volume \"psapuser95\" (UniqueName: \"kubernetes.io/csi/ebs.csi.aws.com^vol-0bfe06e4e80bff5f6\") pod \"755344ed-a210-4f52-89b9-df22edf30fbd\" (UID: \"755344ed-a210-4f52-89b9-df22edf30fbd\") : kubernetes.io/csi: unmounter failed to load volume data file [/var/lib/kubelet/pods/755344ed-a210-4f52-89b9-df22edf30fbd/volumes/kubernetes.io~csi/pvc-e59644eb-9c41-4294-9d19-b366b0eb7e4e/mount]: kubernetes.io/csi: failed to open volume data file [/var/lib/kubelet/pods/755344ed-a210-4f52-89b9-df22edf30fbd/volumes/kubernetes.io~csi/pvc-e59644eb-9c41-4294-9d19-b366b0eb7e4e/vol_data.json]: open /var/lib/kubelet/pods/755344ed-a210-4f52-89b9-df22edf30fbd/volumes/kubernetes.io~csi/pvc-e59644eb-9c41-4294-9d19-b366b0eb7e4e/vol_data.json: no such file or directory"
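The repeated message means the kubelet cannot build an unmounter because vol_data.json is missing from the per-volume CSI directory. A minimal diagnostic sketch, assuming only the directory layout visible in the log paths above (<pods root>/<pod UID>/volumes/kubernetes.io~csi/<PV name>/vol_data.json), lists the volume directories on a node that are in this state. The function name is hypothetical, and the script has to run with read access to the kubelet directory on the node (for example from an oc debug node session after chroot /host).

```python
from pathlib import Path

# Assumed layout, taken from the paths printed in the kubelet log:
#   <pods_root>/<pod-uid>/volumes/kubernetes.io~csi/<pv-name>/vol_data.json
CSI_PLUGIN_DIR = "kubernetes.io~csi"


def find_csi_volumes_missing_vol_data(pods_root="/var/lib/kubelet/pods"):
    """Return (pod_uid, pv_name) pairs whose CSI volume dir has no vol_data.json."""
    missing = []
    root = Path(pods_root)
    if not root.is_dir():
        # Nothing to inspect (e.g. not running on a node).
        return missing
    for pod_dir in sorted(root.iterdir()):
        csi_dir = pod_dir / "volumes" / CSI_PLUGIN_DIR
        if not csi_dir.is_dir():
            continue  # this pod has no CSI volumes
        for vol_dir in sorted(csi_dir.iterdir()):
            # A per-volume directory without vol_data.json is the condition
            # reported by "failed to open volume data file".
            if vol_dir.is_dir() and not (vol_dir / "vol_data.json").exists():
                missing.append((pod_dir.name, vol_dir.name))
    return missing


if __name__ == "__main__":
    for pod_uid, pv_name in find_csi_volumes_missing_vol_data():
        print(f"pod {pod_uid}: volume dir {pv_name} has no vol_data.json")
```

On the node from this report, a hit would be expected for pod 755344ed-a210-4f52-89b9-df22edf30fbd and pvc-e59644eb-9c41-4294-9d19-b366b0eb7e4e, although that has not been verified against the collected data.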
Version-Release number of selected component (if applicable):
4.12.12
How reproducible:
Intermittent
Steps to Reproduce:
1. Create Pods with PVCs on AWS (using the default storage classes).
2. Delete the Pods.
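The reproduction steps can be sketched as a small manifest generator. This is a sketch under stated assumptions, not the CI workload that hit the bug: the object names, the 1Gi size, the container image, and the sleep command are all hypothetical. The only part taken from the report is the shape: each Pod mounts a PVC that omits storageClassName, so the cluster's default StorageClass provisions the volume.

```python
import json

# Hypothetical defaults; none of these values come from the original report.
DEFAULT_IMAGE = "registry.access.redhat.com/ubi9/ubi-minimal"


def pvc_manifest(name, size="1Gi"):
    """PVC with no storageClassName, so the default StorageClass is used."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


def pod_manifest(name, claim_name, image=DEFAULT_IMAGE):
    """Pod that mounts the given PVC at /data and idles."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "containers": [{
                "name": "main",
                "image": image,
                "command": ["sleep", "infinity"],
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }],
            "volumes": [{
                "name": "data",
                "persistentVolumeClaim": {"claimName": claim_name},
            }],
        },
    }


def manifest_list(count, prefix="repro"):
    """A v1 List holding `count` PVC/Pod pairs."""
    items = []
    for i in range(count):
        claim = f"{prefix}-pvc-{i}"
        items.append(pvc_manifest(claim))
        items.append(pod_manifest(f"{prefix}-pod-{i}", claim))
    return {"apiVersion": "v1", "kind": "List", "items": items}


if __name__ == "__main__":
    print(json.dumps(manifest_list(3), indent=2))
```

The output can be piped to oc apply -n <namespace> -f -, and step 2 is then an oc delete of the Pods (or of the namespace, as in the CI run). Since the problem is intermittent, this likely needs to be repeated many times before a Pod stays in Terminating.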
Actual results:
The Pod stays stuck in the Terminating state indefinitely.
Expected results:
The Pod is deleted within a few seconds.
Additional info:
The directory below contains various information about the state of the cluster and about the nodes. A must-gather was *not* collected.
Duplicates: OCPBUGS-14281 Volume unmount repeats after successful unmount, preventing pod delete (Closed)