Type: Bug
Resolution: Cannot Reproduce
Priority: Normal
Fix Version/s: None
Affects Version/s: 4.9
Component/s: Node / Kubelet
Labels:
- migrated_from_bz
- needs_manual_sfdc

Activity Type:
Quality / Stability / Reliability
Blocked:
None
Blocked Reason:
None
Story Points:
None
Severity:
Moderate
Regression:
None
Architecture:
Unspecified
Target Backport Versions:
None
Target Version:
None
Release Blocker:
Rejected
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
None
Release Note Type:
If docs needed, set a value
Release Note Text:
None

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

Occasionally, many pods on a single worker node get stuck in 'ContainerCreating' status on OCP 4.9.1.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[root@bastion1 ~]# oc get po -n oam -owide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
acpf11067-cip1-8486656df9-mhzns 1/1 Running 0 6h11m fd01:0:0:6::59 worker03.ss2.host.local <none> <none>
aupf11067-cmp1-7548d577c-qqznc 0/1 ContainerCreating 0 5h59m <none> worker02.ss2.host.local <none> <none>
aupf11067-dmp0-856dbcbf9-d6j8d 0/1 ContainerCreating 0 5h19m <none>
..
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
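To confirm that the stuck pods are concentrated on one node, the listing above can be summarized per node. This is a minimal sketch with a hypothetical helper name (creating_per_node); it assumes the single-namespace `oc get po -n <ns> -owide` layout shown above, where STATUS is column 3 and NODE is column 7 (with `-A` a NAMESPACE column is prepended and the column numbers shift).

```shell
#!/bin/sh
# Hypothetical helper: count pods in ContainerCreating per node.
# Assumes `oc get po -n <ns> -owide` output on stdin:
# column 3 = STATUS, column 7 = NODE, first line = header.
creating_per_node() {
  awk 'NR > 1 && $3 == "ContainerCreating" { count[$7]++ }
       END { for (node in count) print node, count[node] }'
}
```

Example usage: `oc get po -n oam -owide | creating_per_node`.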
When this happens, the node becomes unstable, and sometimes it is not even possible to collect a sosreport.
It happened first on one node and later on another node.
Rebooting the node resolves the issue.

The journal shows many errors like the following while the issue is occurring:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Dec 10 01:17:16 worker02.ss2.host.local hyperkube[4111]: I1210 01:17:16.011373 4111 pod_container_manager_linux.go:194] "Failed to delete cgroup paths" cgroupName=[kubepods pod39dcbeba-82ee-42b4-ae41-39dc7cdbad98] err="unable to destroy cgroup paths for cgroup [kubepods pod39dcbeba-82ee-42b4-ae41-39dc7cdbad98] : Timed out while waiting for systemd to remove kubepods-pod39dcbeba_82ee_42b4_ae41_39dc7cdbad98.slice"
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
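To see which pods' slices systemd failed to remove in time, the pod UIDs can be pulled out of these messages. This is a sketch with a hypothetical helper name (stuck_pod_uids); it assumes the message format shown above and that slice names follow the kubepods-[<qos>-]pod<uid>.slice pattern, with underscores standing in for the hyphens of the pod UID.

```shell
#!/bin/sh
# Hypothetical helper: list unique pod UIDs whose kubepods slice
# systemd did not remove before the kubelet timed out.
# Reads kubelet journal lines on stdin.
stuck_pod_uids() {
  grep 'Failed to delete cgroup paths' \
    | grep -oE 'kubepods-([a-z]+-)?pod[0-9a-f_]+\.slice' \
    | sed -E -e 's/^kubepods-([a-z]+-)?pod//' -e 's/\.slice$//' -e 's/_/-/g' \
    | sort -u
}
```

Example usage, assuming the kubelet runs as the kubelet systemd unit: `journalctl -u kubelet | stuck_pod_uids`. Each reported UID could then be matched against the stuck pods, and the corresponding slice inspected with `systemctl status`.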

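There appears to be an unusually large number of kubelet threads at the time of occurrence (ps -elfL prints one line per thread, and 4111 is the kubelet PID seen in the journal):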
$ grep kubelet ./sos_commands/process/ps_-elfL | grep 4111 | wc -l
7011
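Because `grep 4111` can also match lines where 4111 appears in another column, the count above is approximate. On a live node, the thread count can be read directly from /proc. This is a minimal, Linux-only sketch with a hypothetical helper name (thread_count).

```shell
#!/bin/sh
# Hypothetical helper: print how many threads a process currently has
# by counting the entries under /proc/<pid>/task (Linux only).
thread_count() {
  pid=$1
  if [ ! -d "/proc/$pid/task" ]; then
    echo "no such process: $pid" >&2
    return 1
  fi
  ls "/proc/$pid/task" | wc -l
}
```

Example usage, assuming the kubelet runs as the kubelet systemd unit: `thread_count "$(systemctl show --property=MainPID --value kubelet)"`. Sampling this periodically would show whether the thread count keeps growing before pods get stuck.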

Version-Release number of selected component (if applicable):

OCP Service Version: 4.9.1
Kubernetes Version: v1.22.0-rc.0+ef241fd

How reproducible:

Unknown; the conditions that trigger the issue have not been identified.

Actual results:
Pods on the affected node remain stuck in 'ContainerCreating' status.

Expected results:
Pods should be created and become Ready.

Additional info:

Assignee: kiran@redhat.com (Inactive)

Reporter: Hwanii Seung Hwan Jung

QA Contact: Sunil Choudhary

Contributing Groups: Red Hat Employee

Need Info From: None

Votes: 0

Watchers: 15

Created: 2022/01/14 9:01 AM

Updated: 2025/09/13 5:21 PM

Resolved: 2023/03/17 2:34 PM
