Description of problem:
- StatefulSet pod stuck in a Terminating state.
- As per the doc https://kubernetes.io/docs/tasks/run-application/force-delete-stateful-set-pod/#statefulset-considerations , this is expected behavior, as a StatefulSet pod needs stable storage.
- Why can't cri-o tell that the pod has in fact been terminated (the process no longer exists) and finish the pod termination flow?
'Reload the cri-o systemd unit so that it can recognise the pod has terminated'

Logs:

Event logs:
~~~
Warning Unhealthy 3m4s (x811 over 132m) kubelet Readiness probe errored: rpc error: code = NotFound desc = container is not created or running: checking if PID of e05838f64303680b4d2fd5d81788555650cf84c196d67086ca44e873dca12221 is running failed: container process not found
~~~

Kubelet logs:
~~~
Apr 08 18:05:42 dell-xyz crio[5637]: time="2025-04-08 18:05:42.926691442Z" level=warning msg="Stopping container e05838f64303680b4d2fd5d81788555650cf84c196d67086ca44e873dca12221 with stop signal timed out. Killing..."
Apr 08 18:07:43 dell-xyz crio[5637]: time="2025-04-08 18:07:43.218580890Z" level=info msg="Stopping container: e05838f64303680b4d2fd5d81788555650cf84c196d67086ca44e873dca12221 (timeout: 30s)" id=13d1e782-9694-4142-bc44-005f5e9326b3 name=/runtime.v1.RuntimeService/StopContainer
~~~

Grabbing the pid for the container's process:
~~~
crictl inspect e05838f64303680b4d2fd5d81788555650cf84c196d67086ca44e873dca12221 | jq .info.pid
1502294
~~~

But the process is not running:
~~~
[root@dell-xyz ~]# ps axo pid=,stat= | grep 1502294
[root@dell-xyz ~]# ps | grep 1502294
[root@dell-xyz ~]# ps aux | grep 1502294
[root@dell-xyz ~]# stat /proc/1502294
stat: cannot statx '/proc/1502294': No such file or directory
~~~
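
For anyone re-checking this on an affected node, the two manual steps above (read the pid that cri-o has recorded, then confirm that no such process exists) could be combined roughly as follows. This is only a sketch: the helper names container_pid and pid_is_running are hypothetical and not part of crictl or cri-o, and it assumes a Linux node with /proc mounted and jq installed.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of crictl or cri-o.

# Print the pid that cri-o has recorded for a container ID,
# using the same crictl/jq pipeline shown in the report.
container_pid() {
    crictl inspect "$1" | jq .info.pid
}

# Return 0 if a process with the given pid exists, 1 if it does not,
# and 2 if the argument is not a plain number (for example "null" or empty).
pid_is_running() {
    case "$1" in
        ''|*[!0-9]*) return 2 ;;
    esac
    # Assumes Linux: every live process has a /proc/<pid> directory.
    [ -d "/proc/$1" ]
}
```

Usage on the node would look like `pid_is_running "$(container_pid <container-id>)" || echo "recorded pid is gone"`. A non-zero result for a container that cri-o is still trying to stop matches the state described in this report.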
Version-Release number of selected component (if applicable):
How reproducible:
Always
Steps to Reproduce:
Occurs when the backend storage goes down for all StatefulSet pods on RHOCP 4.16 in the customer environment.
Actual results:
Pod stuck in the Terminating state without any running process
Expected results:
Handle pod termination properly
Additional info:
- clones: OCPBUGS-55485 [4.20] Statefull set pod stuck in a Terminating State when backend storage went down (Verified)
- is depended on by: OCPBUGS-55485 [4.20] Statefull set pod stuck in a Terminating State when backend storage went down (Verified)