OpenShift Bugs / OCPBUGS-64642

revision-pruner pod stuck in ContainerStatusUnknown


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Version: 4.18.z
    • Component: Node / Kubelet
    • Quality / Stability / Reliability

      Description of problem:

      revision-pruner pods remain stuck in the ContainerStatusUnknown state indefinitely after a node reboot that occurs while the revision-pruner pod is in Running status.
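
      The stuck pods can be spotted with, for example (pod names depend on the current revision number):

      oc get pods -A --no-headers | grep ContainerStatusUnknown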
      

      Version-Release number of selected component (if applicable):

      OpenShift: 4.18.22
      

      How reproducible:

      Reproducible by forcing a node reboot while a revision-pruner pod is in Running status.
      

      Steps to Reproduce:

      1. Delete existing completed revision-pruner pods to force recreation:

      oc delete pod -n openshift-kube-apiserver \
        revision-pruner-15-mno1-ctlplane-0 \
        revision-pruner-15-mno1-ctlplane-1 \
        revision-pruner-15-mno1-ctlplane-2
      

      2. Force operator reconciliation to recreate the pod:

      oc get pod -n openshift-kube-apiserver-operator
      oc delete pod -n openshift-kube-apiserver-operator kube-apiserver-operator-85b695c66b-8s8w9
      

      3. Wait for the new revision-pruner pod to be created
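
      For example, the new pod can be watched until it reaches Running (the name pattern below assumes the same revision and node as above):

      oc get pods -n openshift-kube-apiserver -w | grep revision-pruner-15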

      4. Immediately reboot the node while the pod is running:

      ssh core@mno1-ctlplane-2 "sudo reboot"
      

      5. After node reboot completes, check pod status:

      oc get pods -n openshift-kube-apiserver
      

      Actual results:

      The pod remains stuck in the ContainerStatusUnknown state indefinitely:

      $ oc get pod revision-pruner-15-mno1-ctlplane-2 -n openshift-kube-apiserver
      NAME                                   READY   STATUS                   RESTARTS   AGE
      revision-pruner-15-mno1-ctlplane-2     0/1     ContainerStatusUnknown   1          7m35s
      
      Pod status shows:
          lastState:
            terminated:
              exitCode: 137
              finishedAt: null
              message: The container could not be located when the pod was deleted.  The
                container used to be Running
              reason: ContainerStatusUnknown
              startedAt: null
          name: pruner
          ready: false
          restartCount: 1
          started: false
          state:
            terminated:
              exitCode: 137
              finishedAt: null
              message: The container could not be located when the pod was terminated
              reason: ContainerStatusUnknown
              startedAt: null
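
      The container status above can be pulled directly with, for example:

      oc get pod revision-pruner-15-mno1-ctlplane-2 -n openshift-kube-apiserver \
        -o jsonpath='{.status.containerStatuses[0]}'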
      
      

      From the logs, it looks like the kubelet may reconcile pods immediately after a node reboot, before CNI/node networking is ready. During that NotReady window, sandbox creation and volume mounts fail, leaving the pod stuck in ContainerStatusUnknown until it is cleaned up manually.

      [root@mno1-ctlplane-2 ~]# journalctl -f | grep revision-pruner
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.048799    3871 kubelet.go:2421] "SyncLoop ADD" source="api" pods=["openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2" ...]
      
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.067379    3871 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3800]: time="2025-11-04 14:49:09.073113655Z" level=error msg="Failed to cleanup (probably retrying): failed to destroy network for pod sandbox k8s_revision-pruner-15-mno1-ctlplane-2_openshift-kube-apiserver_e3b0b201-e143-4a49-9770-d68528d32230_0(b407b68e306745e09d590d70ae93b5c3228e54dedf7cb8973d1238e75d42a35a): no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?: stat netns path \"\": stat : no such file or directory"
      
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.183249    3871 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kubelet-dir\" (UniqueName: \"kubernetes.io/host-path/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kubelet-dir\") pod \"revision-pruner-15-mno1-ctlplane-2\" (UID: \"ab6fecf1-fa11-40e4-83ed-448b5b26ac57\") " pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.187805    3871 reconciler_common.go:218] "operationExecutor.MountVolume started for volume \"kube-api-access\" (UniqueName: \"kubernetes.io/projected/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kube-api-access\") pod \"revision-pruner-15-mno1-ctlplane-2\" (UID: \"ab6fecf1-fa11-40e4-83ed-448b5b26ac57\") " pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.201538    3871 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: E1104 14:49:09.223751    3871 projected.go:194] Error preparing data for projected volume kube-api-access for pod openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2: object "openshift-kube-apiserver"/"kube-root-ca.crt" not registered
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: E1104 14:49:09.223854    3871 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/projected/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kube-api-access podName:ab6fecf1-fa11-40e4-83ed-448b5b26ac57 nodeName:}" failed. No retries permitted until 2025-11-04 14:49:09.723834687 +0000 UTC m=+1.846776519 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "kube-api-access" (UniqueName: "kubernetes.io/projected/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kube-api-access") pod "revision-pruner-15-mno1-ctlplane-2" (UID: "ab6fecf1-fa11-40e4-83ed-448b5b26ac57") : object "openshift-kube-apiserver"/"kube-root-ca.crt" not registered
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.292872    3871 reconciler_common.go:218] "operationExecutor.MountVolume started for volume \"kubelet-dir\" (UniqueName: \"kubernetes.io/host-path/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kubelet-dir\") pod \"revision-pruner-15-mno1-ctlplane-2\" (UID: \"ab6fecf1-fa11-40e4-83ed-448b5b26ac57\") " pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.293077    3871 operation_generator.go:637] "MountVolume.SetUp succeeded for volume \"kubelet-dir\" (UniqueName: \"kubernetes.io/host-path/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kubelet-dir\") pod \"revision-pruner-15-mno1-ctlplane-2\" (UID: \"ab6fecf1-fa11-40e4-83ed-448b5b26ac57\") " pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      
      Nov 04 14:49:10 mno1-ctlplane-2 bash[3800]: time="2025-11-04 14:49:10.199767594Z" level=error msg="Failed to cleanup (probably retrying): failed to destroy network for pod sandbox k8s_revision-pruner-15-mno1-ctlplane-2_openshift-kube-apiserver_e3b0b201-e143-4a49-9770-d68528d32230_0(b407b68e306745e09d590d70ae93b5c3228e54dedf7cb8973d1238e75d42a35a): no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?: stat netns path \"\": stat : no such file or directory"
      Nov 04 14:49:10 mno1-ctlplane-2 bash[3871]: I1104 14:49:10.376966    3871 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      
      Nov 04 14:49:24 mno1-ctlplane-2 bash[3800]: time="2025-11-04 14:49:24.302631882Z" level=info msg="Got pod network &{Name:revision-pruner-15-mno1-ctlplane-2 Namespace:openshift-kube-apiserver ID:0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216 UID:ab6fecf1-fa11-40e4-83ed-448b5b26ac57 NetNS: Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[] CgroupPath:kubepods-podab6fecf1_fa11_40e4_83ed_448b5b26ac57.slice PodAnnotations:0xc0008e6428}] Aliases:map[]}"
      
      Nov 04 14:49:24 mno1-ctlplane-2 bash[3800]: time="2025-11-04 14:49:24.302873415Z" level=info msg="Deleting pod openshift-kube-apiserver_revision-pruner-15-mno1-ctlplane-2 from CNI network \"multus-cni-network\" (type=multus-shim)"
      
      Nov 04 14:49:24 mno1-ctlplane-2 bash[3800]: 2025-11-04T14:49:24Z [error] CmdDel (shim): CNI request failed with status 400: 'ContainerID:"0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216" Netns:"" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-kube-apiserver;K8S_POD_NAME=revision-pruner-15-mno1-ctlplane-2;K8S_POD_INFRA_CONTAINER_ID=0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216;K8S_POD_UID=ab6fecf1-fa11-40e4-83ed-448b5b26ac57" Path:"" ERRORED: DelegateDel: error invoking DelegateDel - "ovn-k8s-cni-overlay": error in getting result from DelNetwork: CNI request failed with status 400: '[openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2 0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216 network default NAD default] [openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2 0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216 network default NAD default] failed to get container namespace for pod openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2 NAD default: failed to Statfs "": no such file or directory
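
      To correlate the failures with node readiness, the node Ready condition and the CNI config directory referenced in the errors above can be checked right after the reboot (a sketch; assumes SSH access to the node as the core user):

      oc get node mno1-ctlplane-2 -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
      ssh core@mno1-ctlplane-2 "ls -l /etc/kubernetes/cni/net.d/"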
      

      Expected results:

      Pods should be queued for reconciliation only after the node becomes Ready, and the operator should monitor pruner pod status and clean up or recreate failed instances instead of leaving them stuck indefinitely.
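
      As a manual workaround (a sketch; assumes it is acceptable to delete the stuck pod and let the kube-apiserver operator recreate it on its next sync):

      oc delete pod revision-pruner-15-mno1-ctlplane-2 -n openshift-kube-apiserver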
      

      Attached Files:

        People: rphillip@redhat.com (Ryan Phillips), rhn-support-jclaretm (Jorge Claret Membrado), Min Li