-
Bug
-
Resolution: Cannot Reproduce
-
Normal
-
None
-
4.17.z, 4.16.z
Moderate
Pod termination failed due to container storage unmount error (device or resource busy).
The upgrade is stuck because pods of the lower revision remain in the Terminating state, which blocks the new revision from rolling out:
~~~
Dec 15 14:00:27 xyz-master-0 kubenswrapper[1963730]: E1215 14:00:27.327483 1963730 pod_workers.go:1298] "Error syncing pod, skipping" err="failed to \"KillPodSandbox\" for \"45abd18873ece2fa0c1a9b927a3b679b\" with KillPodSandboxError: \"rpc error: code = Unknown desc = failed to stop infra container for pod sandbox 851172f8507471421c218857d6e42508b8a8bcc09bae68940f8c275da2befa1f: failed to unmount container 851172f8507471421c218857d6e42508b8a8bcc09bae68940f8c275da2befa1f: removing mount point \\\"/var/lib/containers/storage/overlay/7d826cca019cced93bc33b97a5dc7a46240f0f3ffc75631c88df7508bfcabf3b/merged\\\": device or resource busy\"" pod="openshift-kube-scheduler/openshift-kube-scheduler-xyz-master-0" podUID="45abd18873ece2fa0c1a9b927a3b679b"
~~~
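When many kubelet lines like the one above need triage, the affected sandbox ID and overlay mount point can be pulled out programmatically. The following is a minimal sketch, not part of any OpenShift tooling: the function name `parse_unmount_failure` and the regular expressions are assumptions based only on the shape of the log line quoted in this report, and other kubelet versions may format the message differently.

```python
import re
from typing import NamedTuple, Optional


class UnmountFailure(NamedTuple):
    sandbox_id: str   # pod sandbox ID reported by the kubelet
    mount_point: str  # overlay "merged" path that could not be removed


# Patterns are assumptions derived from the single log line in this report.
_SANDBOX_RE = re.compile(r'pod sandbox ([0-9a-f]+)')
# The path is wrapped in escaped quotes (backslashes plus a double quote),
# so skip any run of those characters before capturing the path itself.
_MOUNT_RE = re.compile(r'removing mount point [\\"]*(/[^\\"]+)')


def parse_unmount_failure(line: str) -> Optional[UnmountFailure]:
    """Return sandbox ID and mount point if the line is a busy-unmount error."""
    if "device or resource busy" not in line:
        return None
    sandbox = _SANDBOX_RE.search(line)
    mount = _MOUNT_RE.search(line)
    if sandbox is None or mount is None:
        return None
    return UnmountFailure(sandbox.group(1), mount.group(1))
```

Feeding each journal line through `parse_unmount_failure` and collecting the non-`None` results gives a list of mount points to inspect on the node.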
The container storage filesystem was mounted successfully; however, the unmount system call failed during container teardown.
As a result, the underlying overlay filesystem resources could not be released, which prevented container cleanup and left the affected containers in the Terminating state. The reason the unmount failed is not yet known.
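One possibility worth checking when gathering data for an RCA, offered here as an assumption and not as an established cause, is that the overlay mount is still visible in another process's mount namespace, which can keep the mount point busy. The sketch below scans `/proc/<pid>/mountinfo` for a given mount point. The helper names `mount_points` and `holders` are hypothetical, and the script needs enough privileges on the node to read other processes' `mountinfo` files.

```python
import os
from typing import List


def mount_points(mountinfo_text: str) -> List[str]:
    """Extract the mount point (fifth field) from each mountinfo line."""
    points = []
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:
            points.append(fields[4])
    return points


def holders(target: str, proc_root: str = "/proc") -> List[int]:
    """Return PIDs whose mount namespace still contains the target mount."""
    pids = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "mountinfo")) as f:
                if target in mount_points(f.read()):
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or its mountinfo is not readable
    return sorted(pids)
```

Calling `holders()` with the `merged` path from the kubelet error lists the processes to look at more closely. An empty result would suggest that a leaked mount is not the explanation and that the cause lies elsewhere.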
Workaround: Because the underlying CRI-O storage is affected, the only solution found so far is to clean the CRI-O storage.
Several customers have hit this issue, and one of them expects an RCA because they encounter it on every cluster they upgrade from 4.16.x to 4.17.x.
- is related to
-
OCPBUGS-74694 Pods stuck terminating on cri-o unmounting overlayfs /merged path
-
- New
-