OpenShift Bugs / OCPBUGS-64642

revision-pruner pod stuck in ContainerStatusUnknown


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Version: 4.18.z
    • Component: Node / Kubelet
    • Quality / Stability / Reliability

      Description of problem:

      revision-pruner pods remain stuck in the ContainerStatusUnknown state indefinitely after a node reboot that occurs while the revision-pruner pod is in Running status.
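
      The stuck pods can be spotted with, for example (pod names depend on the current revision number):

      oc get pods -A --no-headers | grep ContainerStatusUnknown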
      

      Version-Release number of selected component (if applicable):

      OpenShift: 4.18.22
      

      How reproducible:

      Reproducible by forcing a node reboot while a revision-pruner pod is in Running status.
      

      Steps to Reproduce:

      1. Delete existing completed revision-pruner pods to force recreation:

      oc delete pod -n openshift-kube-apiserver \
        revision-pruner-15-mno1-ctlplane-0 \
        revision-pruner-15-mno1-ctlplane-1 \
        revision-pruner-15-mno1-ctlplane-2
      

      2. Force operator reconciliation to recreate the pod:

      oc get pod -n openshift-kube-apiserver-operator
      oc delete pod -n openshift-kube-apiserver-operator kube-apiserver-operator-85b695c66b-8s8w9
      

      3. Wait for the new revision-pruner pod to be created
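
      For example, the new pod can be watched until it reaches Running (the name pattern below assumes the same revision and node as above):

      oc get pods -n openshift-kube-apiserver -w | grep revision-pruner-15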

      4. Immediately reboot the node while the pod is running:

      ssh core@mno1-ctlplane-2 "sudo reboot"
      

      5. After node reboot completes, check pod status:

      oc get pods -n openshift-kube-apiserver
      

      Actual results:

      The pod remains stuck in the ContainerStatusUnknown state indefinitely:

      $ oc get pod revision-pruner-15-mno1-ctlplane-2 -n openshift-kube-apiserver
      NAME                                   READY   STATUS                   RESTARTS   AGE
      revision-pruner-15-mno1-ctlplane-2     0/1     ContainerStatusUnknown   1          7m35s
      
      Pod status shows:
          lastState:
            terminated:
              exitCode: 137
              finishedAt: null
              message: The container could not be located when the pod was deleted.  The
                container used to be Running
              reason: ContainerStatusUnknown
              startedAt: null
          name: pruner
          ready: false
          restartCount: 1
          started: false
          state:
            terminated:
              exitCode: 137
              finishedAt: null
              message: The container could not be located when the pod was terminated
              reason: ContainerStatusUnknown
              startedAt: null
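
      The container status above can be pulled directly with, for example:

      oc get pod revision-pruner-15-mno1-ctlplane-2 -n openshift-kube-apiserver \
        -o jsonpath='{.status.containerStatuses[0]}'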
      
      

      From the logs, it looks like the kubelet may reconcile pods immediately after a node reboot, before CNI/node networking is ready. During that NotReady window, sandbox creation and volume mounts fail, leaving the pod stuck in ContainerStatusUnknown until it is cleaned up manually.

      [root@mno1-ctlplane-2 ~]# journalctl -f | grep revision-pruner
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.048799    3871 kubelet.go:2421] "SyncLoop ADD" source="api" pods=["openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2" ...]
      
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.067379    3871 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3800]: time="2025-11-04 14:49:09.073113655Z" level=error msg="Failed to cleanup (probably retrying): failed to destroy network for pod sandbox k8s_revision-pruner-15-mno1-ctlplane-2_openshift-kube-apiserver_e3b0b201-e143-4a49-9770-d68528d32230_0(b407b68e306745e09d590d70ae93b5c3228e54dedf7cb8973d1238e75d42a35a): no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?: stat netns path \"\": stat : no such file or directory"
      
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.183249    3871 reconciler_common.go:245] "operationExecutor.VerifyControllerAttachedVolume started for volume \"kubelet-dir\" (UniqueName: \"kubernetes.io/host-path/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kubelet-dir\") pod \"revision-pruner-15-mno1-ctlplane-2\" (UID: \"ab6fecf1-fa11-40e4-83ed-448b5b26ac57\") " pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.187805    3871 reconciler_common.go:218] "operationExecutor.MountVolume started for volume \"kube-api-access\" (UniqueName: \"kubernetes.io/projected/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kube-api-access\") pod \"revision-pruner-15-mno1-ctlplane-2\" (UID: \"ab6fecf1-fa11-40e4-83ed-448b5b26ac57\") " pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.201538    3871 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: E1104 14:49:09.223751    3871 projected.go:194] Error preparing data for projected volume kube-api-access for pod openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2: object "openshift-kube-apiserver"/"kube-root-ca.crt" not registered
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: E1104 14:49:09.223854    3871 nestedpendingoperations.go:348] Operation for "{volumeName:kubernetes.io/projected/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kube-api-access podName:ab6fecf1-fa11-40e4-83ed-448b5b26ac57 nodeName:}" failed. No retries permitted until 2025-11-04 14:49:09.723834687 +0000 UTC m=+1.846776519 (durationBeforeRetry 500ms). Error: MountVolume.SetUp failed for volume "kube-api-access" (UniqueName: "kubernetes.io/projected/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kube-api-access") pod "revision-pruner-15-mno1-ctlplane-2" (UID: "ab6fecf1-fa11-40e4-83ed-448b5b26ac57") : object "openshift-kube-apiserver"/"kube-root-ca.crt" not registered
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.292872    3871 reconciler_common.go:218] "operationExecutor.MountVolume started for volume \"kubelet-dir\" (UniqueName: \"kubernetes.io/host-path/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kubelet-dir\") pod \"revision-pruner-15-mno1-ctlplane-2\" (UID: \"ab6fecf1-fa11-40e4-83ed-448b5b26ac57\") " pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      Nov 04 14:49:09 mno1-ctlplane-2 bash[3871]: I1104 14:49:09.293077    3871 operation_generator.go:637] "MountVolume.SetUp succeeded for volume \"kubelet-dir\" (UniqueName: \"kubernetes.io/host-path/ab6fecf1-fa11-40e4-83ed-448b5b26ac57-kubelet-dir\") pod \"revision-pruner-15-mno1-ctlplane-2\" (UID: \"ab6fecf1-fa11-40e4-83ed-448b5b26ac57\") " pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      
      Nov 04 14:49:10 mno1-ctlplane-2 bash[3800]: time="2025-11-04 14:49:10.199767594Z" level=error msg="Failed to cleanup (probably retrying): failed to destroy network for pod sandbox k8s_revision-pruner-15-mno1-ctlplane-2_openshift-kube-apiserver_e3b0b201-e143-4a49-9770-d68528d32230_0(b407b68e306745e09d590d70ae93b5c3228e54dedf7cb8973d1238e75d42a35a): no CNI configuration file in /etc/kubernetes/cni/net.d/. Has your network provider started?: stat netns path \"\": stat : no such file or directory"
      Nov 04 14:49:10 mno1-ctlplane-2 bash[3871]: I1104 14:49:10.376966    3871 util.go:30] "No sandbox for pod can be found. Need to start a new one" pod="openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2"
      
      Nov 04 14:49:24 mno1-ctlplane-2 bash[3800]: time="2025-11-04 14:49:24.302631882Z" level=info msg="Got pod network &{Name:revision-pruner-15-mno1-ctlplane-2 Namespace:openshift-kube-apiserver ID:0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216 UID:ab6fecf1-fa11-40e4-83ed-448b5b26ac57 NetNS: Networks:[{Name:multus-cni-network Ifname:eth0}] RuntimeConfig:map[multus-cni-network:{IP: MAC: PortMappings:[] Bandwidth:<nil> IpRanges:[] CgroupPath:kubepods-podab6fecf1_fa11_40e4_83ed_448b5b26ac57.slice PodAnnotations:0xc0008e6428}] Aliases:map[]}"
      
      Nov 04 14:49:24 mno1-ctlplane-2 bash[3800]: time="2025-11-04 14:49:24.302873415Z" level=info msg="Deleting pod openshift-kube-apiserver_revision-pruner-15-mno1-ctlplane-2 from CNI network \"multus-cni-network\" (type=multus-shim)"
      
      Nov 04 14:49:24 mno1-ctlplane-2 bash[3800]: 2025-11-04T14:49:24Z [error] CmdDel (shim): CNI request failed with status 400: 'ContainerID:"0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216" Netns:"" IfName:"eth0" Args:"IgnoreUnknown=1;K8S_POD_NAMESPACE=openshift-kube-apiserver;K8S_POD_NAME=revision-pruner-15-mno1-ctlplane-2;K8S_POD_INFRA_CONTAINER_ID=0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216;K8S_POD_UID=ab6fecf1-fa11-40e4-83ed-448b5b26ac57" Path:"" ERRORED: DelegateDel: error invoking DelegateDel - "ovn-k8s-cni-overlay": error in getting result from DelNetwork: CNI request failed with status 400: '[openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2 0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216 network default NAD default] [openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2 0980ddd6bc3fa1b78388af71e76ba1b3b10cf378ca308fd30ae80e120b01c216 network default NAD default] failed to get container namespace for pod openshift-kube-apiserver/revision-pruner-15-mno1-ctlplane-2 NAD default: failed to Statfs "": no such file or directory
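
      To correlate the failures with node readiness, the node Ready condition and the CNI config directory referenced in the errors above can be checked right after the reboot (a sketch; assumes SSH access to the node as the core user):

      oc get node mno1-ctlplane-2 -o jsonpath='{.status.conditions[?(@.type=="Ready")]}'
      ssh core@mno1-ctlplane-2 "ls -l /etc/kubernetes/cni/net.d/"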
      

      Expected results:

      Pods should be queued for reconciliation only after the node becomes Ready, and the operator should monitor pruner pod status and clean up or recreate failed instances instead of leaving them stuck indefinitely.
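
      As a manual workaround (a sketch; assumes it is acceptable to delete the stuck pod and let the kube-apiserver operator recreate it on its next sync):

      oc delete pod revision-pruner-15-mno1-ctlplane-2 -n openshift-kube-apiserver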
      

      Attached Files:

        People: rphillip@redhat.com (Ryan Phillips), rhn-support-jclaretm (Jorge Claret Membrado), Min Li