Bug
Resolution: Unresolved
Normal
4.18.z
Quality / Stability / Reliability
Description of problem:
revision-pruner pods remain stuck in the ContainerStatusUnknown state indefinitely when a node is rebooted while the pod is still Running.
Version-Release number of selected component (if applicable):
OpenShift: 4.18.22
How reproducible:
Consistently reproducible by forcing a node restart while a revision-pruner pod is in the Running phase.
Steps to Reproduce:
1. Delete existing completed revision-pruner pods to force recreation:
oc delete pod -n openshift-kube-apiserver \
  revision-pruner-15-mno1-ctlplane-0 \
  revision-pruner-15-mno1-ctlplane-1 \
  revision-pruner-15-mno1-ctlplane-2
2. Force operator reconciliation to recreate the pod:
oc get pod -n openshift-kube-apiserver-operator
oc delete pod -n openshift-kube-apiserver-operator kube-apiserver-operator-85b695c66b-8s8w9
3. Wait for the new revision-pruner pod to be created
4. Immediately reboot the node while the pod is running:
ssh core@mno1-ctlplane-2 "sudo reboot"
5. After node reboot completes, check pod status:
oc get pods -n openshift-kube-apiserver
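For convenience, the steps above can be run as a single script. This is only a sketch based on the exact pod, node, and operator names captured in this report; adjust them (and the revision number, which may advance after recreation) for other clusters:

#!/usr/bin/env bash
set -euo pipefail

# 1. Delete the completed pruner pods so the operator recreates them
oc delete pod -n openshift-kube-apiserver \
  revision-pruner-15-mno1-ctlplane-0 \
  revision-pruner-15-mno1-ctlplane-1 \
  revision-pruner-15-mno1-ctlplane-2

# 2. Restart the operator pod to force reconciliation
oc delete pod -n openshift-kube-apiserver-operator \
  kube-apiserver-operator-85b695c66b-8s8w9

# 3. Wait for the recreated pruner pod on the target node to reach Running
until oc get pod -n openshift-kube-apiserver revision-pruner-15-mno1-ctlplane-2 \
    -o jsonpath='{.status.phase}' 2>/dev/null | grep -q Running; do
  sleep 2
done

# 4. Reboot the node while the pruner is still Running
ssh core@mno1-ctlplane-2 "sudo reboot"

# 5. After the node comes back, check for pods stuck in ContainerStatusUnknown
oc get pods -n openshift-kube-apiserver | grep revision-pruner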
Actual results:
Pod remains stuck in ContainerStatusUnknown state indefinitely
$ oc get pod revision-pruner-15-mno1-ctlplane-2 -n openshift-kube-apiserver
NAME READY STATUS RESTARTS AGE
revision-pruner-15-mno1-ctlplane-2 0/1 ContainerStatusUnknown 1 7m35s
Pod status shows:
lastState:
  terminated:
    exitCode: 137
    finishedAt: null
    message: The container could not be located when the pod was deleted. The
      container used to be Running
    reason: ContainerStatusUnknown
    startedAt: null
name: pruner
ready: false
restartCount: 1
started: false
state:
  terminated:
    exitCode: 137
    finishedAt: null
    message: The container could not be located when the pod was terminated
    reason: ContainerStatusUnknown
    startedAt: null
From the logs, it looks like the kubelet reconciles pods immediately after a node reboot, before CNI/node networking is ready. During that NotReady window, sandbox setup and volume mounts fail, leaving the pod stuck in ContainerStatusUnknown until it is manually cleaned up.
[root@mno1-ctlplane-2 ~]# journalctl -f | grep revision-pruner
Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.048799 3871 kubelet.go:2421] "SyncLoop ADD" source="api" pods=["openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2" ...]
Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.067379 3871 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
Nov 04 14:49:09 mno1-ctlplane-2 bash[3800]: time="2025-11-04 14:49:09.073113655Z" level=error msg="Failed to cleanup (probably retrying): failed to destroy network for pod sandbox k8s_revision-pruner-15-mno1-ctlplane-2_openshift-kube-apiserver_e3b0b201-e143-4a49-9770-d68528d32230_0(b407b68e306745e09d590d70ae93b5c3228e54dedf7cb8973d1238e75d42a35a): no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?: stat netns path \"\": stat : no such file or directory"
Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.183249 3871 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kubelet-dir\" (UniqueName: \"kubernetes.io/host-path/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kubelet-dir\") pod \"revision-pruner-15-mno1-ctlplane-2\" (UID: \"ab6fecf1-fa11-40e4-83ed-448b5b26ac57\") " pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.187805 3871 reconciler_common.go:218] "operationExecutor.MountVolume started for volume \"kube-api-access\" (UniqueName: \"kubernetes.io/projected/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kube-api-access\") pod \"revision-pruner-15-mno1-ctlplane-2\" (UID: \"ab6fecf1-fa11-40e4-83ed-448b5b26ac57\") " pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.201538 3871 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: E1104 14:49:09.223751 3871 projected.go:194] Error preparing data for projected volume kube-api-access for pod openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2: object "openshift-kube-apiserver"/"kube-root-ca.crt" not registered
Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: E1104 14:49:09.223854 3871 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/projected/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kube-api-access podName:ab6fecf1-fa11-40e4-83ed-448b5b26ac57 nodeName:}" failed. No retries permitted until 2025-11-04 14:49:09.723834687 +0000 UTC m=+1.846776519 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "kube-api-access" (UniqueName: "kubernetes.io/projected/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kube-api-access") pod "revision-pruner-15-mno1-ctlplane-2" (UID: "ab6fecf1-fa11-40e4-83ed-448b5b26ac57") : object "openshift-kube-apiserver"/"kube-root-ca.crt" not registered
Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.292872 3871 reconciler_common.go:218] "operationExecutor.MountVolume started for volume \"kubelet-dir\" (UniqueName: \"kubernetes.io/host-path/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kubelet-dir\") pod \"revision-pruner-15-mno1-ctlplane-2\" (UID: \"ab6fecf1-fa11-40e4-83ed-448b5b26ac57\") " pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.293077 3871 operation_generator.go:637] "MountVolume.SetUp succeeded for volume \"kubelet-dir\" (UniqueName: \"kubernetes.io/host-path/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kubelet-dir\") pod \"revision-pruner-15-mno1-ctlplane-2\" (UID: \"ab6fecf1-fa11-40e4-83ed-448b5b26ac57\") " pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
Nov 04 14:49:10 mno1-ctlplane-2 bash[3800]: time="2025-11-04 14:49:10.199767594Z" level=error msg="Failed to cleanup (probably retrying): failed to destroy network for pod sandbox k8s_revision-pruner-15-mno1-ctlplane-2_openshift-kube-apiserver_e3b0b201-e143-4a49-9770-d68528d32230_0(b407b68e306745e09d590d70ae93b5c3228e54dedf7cb8973d1238e75d42a35a): no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?: stat netns path \"\": stat : no such file or directory"
Nov 04 14:49:10 mno1-ctlplane-2 bash[3871]: I1104 14:49:10.376966 3871 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
Nov 04 14:49:24 mno1-ctlplane-2 bash[3800]: time="2025-11-04 14:49:24.302631882Z" level=info msg="Got pod network &{Name:revision-pruner-15-mno1-ctlplane-2 Namespace:openshift-kube-apiserver ID:0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216 UID:ab6fecf1-fa11-40e4-83ed-448b5b26ac57 NetNS: Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[] CgroupPath:kubepods-podab6fecf1_fa11_40e4_83ed_448b5b26ac57.slice PodAnnotations:0xc0008e6428}] Aliases:map[]}"
Nov 04 14:49:24 mno1-ctlplane-2 bash[3800]: time="2025-11-04 14:49:24.302873415Z" level=info msg="Deleting pod openshift-kube-apiserver_revision-pruner-15-mno1-ctlplane-2 from CNI network \"multus-cni-network\" (type=multus-shim)"
Nov 04 14:49:24 mno1-ctlplane-2 bash[3800]: 2025-11-04T14:49:24Z [error] CmdDel (shim): CNI request failed with status 400: 'ContainerID:"0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216" Netns:"" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-kube-apiserver;K8S_POD_NAME=revision-pruner-15-mno1-ctlplane-2;K8S_POD_INFRA_CONTAINER_ID=0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216;K8S_POD_UID=ab6fecf1-fa11-40e4-83ed-448b5b26ac57" Path:"" ERRORED: DelegateDel: error invoking DelegateDel - "ovn-k8s-cni-overlay": error in getting result from DelNetwork: CNI request failed with status 400: '[openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2 0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216 network default NAD default] [openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2 0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216 network default NAD default] failed to get container namespace for pod openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2 NAD default: failed to Statfs "": no such file or directory
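The "manually cleaned up" step mentioned above can be done by locating pruner pods whose terminated-state reason matches the status shown earlier and deleting them so the operator recreates them on its next sync. A minimal sketch, assuming jq is available on the workstation:

oc get pods -n openshift-kube-apiserver -o json \
  | jq -r '.items[]
      | select(.metadata.name | startswith("revision-pruner-"))
      | select(.status.containerStatuses[]?.state.terminated.reason == "ContainerStatusUnknown")
      | .metadata.name' \
  | xargs -r oc delete pod -n openshift-kube-apiserver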
Expected results:
After the node becomes Ready, the pods should be queued for reconciliation again, and the kube-apiserver operator should monitor pruner pod status and clean up or recreate failed instances instead of leaving them stuck in ContainerStatusUnknown indefinitely.
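A hedged sketch of the expected sequencing, expressed with standard oc commands rather than operator code: nothing should be retried or cleaned up until the node reports Ready, and only then should the stuck pruner pods be removed so they can be recreated:

# Wait until the rebooted node is Ready again
oc wait node/mno1-ctlplane-2 --for=condition=Ready --timeout=15m

# Pods left in ContainerStatusUnknown typically report phase=Failed; verify that
# before relying on this selector, since it matches every Failed pod in the namespace
oc delete pods -n openshift-kube-apiserver --field-selector=status.phase=Failed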
Attached Files: