
Type: Bug
Resolution: Duplicate
Priority: Major
Fix Version/s: None
Affects Version/s: None
Component/s: crun
Labels:
None

Regression:
None
Severity:
Critical

AssignedTeam:
rhel-container-tools

Story Points:
3
Blocked:
False
Ready:
False
Blocked Reason:
None
Product Documentation Required:
None
Sprint:
None

Preliminary Testing:
None
Test Coverage:
None

Experience:
RH Private Keywords:

PX Impact Score:
PX Priority Data:
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

crun won't reparent and reap zombie process because ptm2 buffer is full

Our customer requested deletion of a pod, but the pod is stuck in Terminating. The node logs show that the container never shuts down, which triggers the known CRI-O bug https://issues.redhat.com/browse/OCPBUGS-28981 (`level=warning msg="Stopping container (...) with stop signal timed out. Killing"`). OCPBUGS-28981 only makes the issue more visible; it is most likely not its cause.

Looking at the container in CRI-O, we can see that it is not being shut down. The thread group leader (`rsyslogd` in this specific case), which is PID 1 in the container, received a SIGKILL (`kill -9`) and is trying to exit, but it is stuck in `zap_pid_ns_processes`. The apparent reason is a `bash` process in `ZO` (zombie) state that belongs to `crun`, which is responsible for reaping it. However, `crun` never reaps the zombie, because it is itself sleeping while waiting to write to the tty `ptm2`, and it cannot do so because that tty's buffer is full. The tty belongs to `crio`.
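The hang described above combines two ordinary kernel behaviours: a pty has a finite buffer, so a writer blocks once nobody drains the other side, and an exited child stays a zombie until its parent calls `wait`. The following Linux-only Python sketch (not crun's or CRI-O's code) illustrates both. It uses a non-blocking write so that it reports "would block" instead of hanging the way `crun` does; the helper names `child_state`, `fill_pty_buffer` and `demo` are hypothetical.

```python
import os
import time


def child_state(pid: int) -> str:
    """Return the one-letter state field from /proc/<pid>/stat (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        data = f.read()
    # comm (field 2) is wrapped in parentheses and may contain spaces,
    # so locate the state field relative to the last ')'.
    return data[data.rindex(")") + 2]


def fill_pty_buffer(slave_fd: int, chunk: bytes = b"x" * 1024) -> int:
    """Write to the slave side until the kernel reports the buffer is full."""
    os.set_blocking(slave_fd, False)
    written = 0
    while True:
        try:
            written += os.write(slave_fd, chunk)
        except BlockingIOError:
            # A blocking writer would sleep here until someone reads
            # the master side of the pty.
            return written


def demo() -> dict:
    master_fd, slave_fd = os.openpty()
    try:
        # Nobody reads the master side, so the buffer fills up.
        buffered = fill_pty_buffer(slave_fd)

        pid = os.fork()
        if pid == 0:
            os._exit(0)  # the child exits immediately

        # Until the parent calls waitpid, the exited child remains a zombie.
        state = ""
        for _ in range(100):
            state = child_state(pid)
            if state == "Z":
                break
            time.sleep(0.01)

        # Drain the master side; this is what would unblock a blocked writer.
        os.set_blocking(master_fd, False)
        drained = 0
        try:
            while True:
                data = os.read(master_fd, 4096)
                if not data:
                    break
                drained += len(data)
        except BlockingIOError:
            pass

        # Only now does the parent reap the child.
        reaped_pid, _ = os.waitpid(pid, 0)
        return {
            "buffered": buffered,
            "zombie_state": state,
            "drained": drained,
            "reaped": reaped_pid == pid,
        }
    finally:
        os.close(master_fd)
        os.close(slave_fd)
```

Calling `demo()` should report `zombie_state` as `"Z"`. In this sketch the parent is never actually stuck, so it can drain the pty and reap the child; in the reported case `crun` stays blocked in its write to the full tty, so it never reaches the point where it reaps the zombie.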

This is the same as upstream issue https://github.com/cri-o/cri-o/issues/6699; see https://github.com/cri-o/cri-o/issues/6699#issuecomment-1452796427.

There is also a CRI-O bug for this, https://issues.redhat.com/browse/OCPBUGS-31317, because it is not yet clear which component should handle it.

links to

cri-o/cri-o/issues#6699: Containers can get stuck in stopping state until cri-o is restarted

Assignee: Kirill Kolyshkin

Reporter: Andreas Karis

Developer: Container Runtime Eng Bot

QA Contact: Container Runtime Bugs Bot

Votes: 0

Watchers: 11

Created: 2024/03/22 8:41 AM

Updated: 2024/09/23 4:44 PM

Resolved: 2024/04/12 11:44 AM
