OpenShift Bugs / OCPBUGS-9071

pods are stuck in "ContainerCreating" status, while hyperkube shows "Timed out while waiting for systemd to remove kubepods-podxxx"


      Description of problem:

Occasionally, many pods on a worker node get stuck in 'ContainerCreating' status on OCP 4.9.1.
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      [root@bastion1 ~]# oc get po -n oam -owide
      NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
      acpf11067-cip1-8486656df9-mhzns 1/1 Running 0 6h11m fd01:0:0:6::59 worker03.ss2.host.local <none> <none>
      aupf11067-cmp1-7548d577c-qqznc 0/1 ContainerCreating 0 5h59m <none> worker02.ss2.host.local <none> <none>
      aupf11067-dmp0-856dbcbf9-d6j8d 0/1 ContainerCreating 0 5h19m <none>
      ..
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
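As a quick triage step (not part of the original report), the stuck pods can be filtered out of the wide listing with a plain grep; the sketch below embeds the sample rows in a here-doc so it is self-contained, but on a live cluster you would pipe `oc get po -n oam -o wide` instead:

```shell
# Hypothetical triage helper: filter a pod listing for pods stuck in
# ContainerCreating. The here-doc stands in for live `oc get po` output.
grep ContainerCreating <<'EOF'
acpf11067-cip1-8486656df9-mhzns 1/1 Running 0 6h11m fd01:0:0:6::59 worker03.ss2.host.local <none> <none>
aupf11067-cmp1-7548d577c-qqznc 0/1 ContainerCreating 0 5h59m <none> worker02.ss2.host.local <none> <none>
aupf11067-dmp0-856dbcbf9-d6j8d 0/1 ContainerCreating 0 5h19m <none>
EOF
```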
When this happens, the node becomes unstable, and it is sometimes not even possible to collect a sosreport.
The issue occurred on one node, and later on another node.
Rebooting the node resolves the issue.

From the journal, you see many occurrences of the following error when the issue happens:
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
      Dec 10 01:17:16 worker02.ss2.host.local hyperkube[4111]: I1210 01:17:16.011373 4111 pod_container_manager_linux.go:194] "Failed to delete cgroup paths" cgroupName=[kubepods pod39dcbeba-82ee-42b4-ae41-39dc7cdbad98] err="unable to destroy cgroup paths for cgroup [kubepods pod39dcbeba-82ee-42b4-ae41-39dc7cdbad98] : Timed out while waiting for systemd to remove kubepods-pod39dcbeba_82ee_42b4_ae41_39dc7cdbad98.slice"
      ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
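The slice name in the error is derived mechanically from the pod UID shown in the same log line: dashes in the UID become underscores, wrapped as `kubepods-pod<uid>.slice`. A minimal sketch (not from the original report) that reconstructs the slice name, so it can be checked against systemd on the affected node:

```shell
# Derive the transient slice name kubelet asks systemd to remove:
# "kubepods-pod<uid>.slice", with dashes in the pod UID replaced by
# underscores. The UID below is the one from the journal line above.
uid="39dcbeba-82ee-42b4-ae41-39dc7cdbad98"
slice="kubepods-pod$(echo "$uid" | tr '-' '_').slice"
echo "$slice"   # kubepods-pod39dcbeba_82ee_42b4_ae41_39dc7cdbad98.slice

# On the affected node, one could then check whether systemd still
# tracks the lingering slice, e.g.:
#   systemctl status "$slice" --no-pager
#   systemctl list-units --type=slice --all | grep kubepods
```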

It seems there were a large number of kubelet threads (LWPs) running at the time of occurrence:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
$ grep kubelet ./sos_commands/process/ps_-elfL | grep 4111 | wc -l
7011
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
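The `ps_-elfL` grep above counts one line per thread (LWP) of PID 4111. On a live node the same count can be read directly from /proc; a minimal sketch, demonstrated on the current shell's own PID since 4111 exists only in the sosreport:

```shell
# Count threads (LWPs) of a process via /proc/<pid>/task, which has
# one directory entry per thread. On the affected node you would use
# the kubelet/hyperkube PID (4111 in the sosreport) instead of $$.
pid=$$
ls /proc/"$pid"/task | wc -l
```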

      Version-Release number of selected component (if applicable):

      OCP Service Version: 4.9.1
      Kubernetes Version: v1.22.0-rc.0+ef241fd

      How reproducible:

      Currently we do not know the condition to reproduce.

Actual results:

Pods remain stuck in "ContainerCreating" and the node becomes unstable.

Expected results:

Pods should be created and become Ready.
      Additional info:
