OpenShift Bugs / OCPBUGS-33175

[release-4.12] crun won't reap zombie process because ptm2 buffer is full [CRI-O]


    • Bug
    • Resolution: Done-Errata
    • Major
    • None
    • 4.12.z
    • Node / CRI-O
    • No
    • Rejected
    • False
    • Fix a bug in CRI-O where exec sessions could survive past a container being stopped, causing zombies and leaked processes.
    • Bug Fix
    • In Progress

      crun won't reparent and reap zombie process because ptm2 buffer is full

      Our customer requested deletion of a pod, but the pod is stuck in Terminating. The node logs show that the container never shuts down, which triggers a known CRI-O bug, `level=warning msg="Stopping container (...) with stop signal timed out. Killing"` (https://issues.redhat.com/browse/OCPBUGS-28981). OCPBUGS-28981 only makes the issue more apparent and is most likely not its cause.

      Looking at the container itself, we can see that it is not being shut down. The thread group leader with PID 1 (`rsyslogd` in this specific case) received a `kill -9` and is trying to exit, but it is stuck in `zap_pid_ns_processes`. The apparent reason is a bash process in `ZO` (zombie) state that belongs to `crun`. `crun` never reaps this zombie because it is itself stuck sleeping, waiting to write to tty `ptm2`. That write cannot complete because the tty buffer is full. The tty belongs to `crio`.
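
The mechanism described above can be illustrated with a minimal Python sketch. It is not crun or CRI-O code. It assumes Linux, uses an ordinary pipe as a stand-in for the pty, and uses a non-blocking descriptor so that the demo can observe the point at which a blocking writer would go to sleep. The names `fill_until_full` and `simulate_stuck_parent` are hypothetical.

```python
import os


def fill_until_full(fd: int, chunk: bytes = b"x" * 4096) -> int:
    """Write to a non-blocking fd until the kernel buffer refuses more data.

    Returns the number of bytes the kernel accepted.
    """
    os.set_blocking(fd, False)
    written = 0
    while True:
        try:
            written += os.write(fd, chunk)
        except BlockingIOError:
            # A blocking writer would be put to sleep at this point.
            return written


def simulate_stuck_parent() -> dict:
    """Stand-in for the reported situation: a parent with a full output buffer."""
    # Assumption: a pipe replaces the pty; both have a bounded kernel buffer.
    read_fd, write_fd = os.pipe()

    pid = os.fork()
    if pid == 0:
        os._exit(0)  # the child (bash in the report) exits right away

    # Wait until the child has exited, but leave it unreaped (a zombie).
    os.waitid(os.P_PID, pid, os.WEXITED | os.WNOWAIT)

    # Nobody reads from the other end, so the buffer fills up.
    buffered = fill_until_full(write_fd)
    try:
        os.write(write_fd, b"x")
        would_block = False
    except BlockingIOError:
        # With a blocking fd the parent would sleep here and never reach
        # waitpid(), so the zombie would stay around indefinitely.
        would_block = True

    # Only after the reader drains the buffer can the parent make progress.
    drained = 0
    while drained < buffered:
        drained += len(os.read(read_fd, 65536))
    os.write(write_fd, b"x")

    reaped_pid, status = os.waitpid(pid, 0)  # the zombie is finally reaped
    os.close(read_fd)
    os.close(write_fd)
    return {
        "buffered_bytes": buffered,
        "write_would_block": would_block,
        "reaped": reaped_pid == pid,
        "exit_status": os.WEXITSTATUS(status),
    }


if __name__ == "__main__":
    print(simulate_stuck_parent())
```

In the sketch the parent can only reap its child once the reader has drained the buffer. In the reported case nothing drains the tty, so the writer stays asleep and the zombie is never collected.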

      This is the same issue as https://github.com/cri-o/cri-o/issues/6699; see https://github.com/cri-o/cri-o/issues/6699#issuecomment-1452796427

      More in a private comment

      This is a clone of https://issues.redhat.com/browse/RHEL-30102 (the crun bug), as I don't know which component should handle this.

            rh-ee-kwilczyn Krzysztof Wilczyński
            akaris@redhat.com Andreas Karis
            Sunil Choudhary Sunil Choudhary
            Krzysztof Wilczyński, Peter Hunt
            Votes: 0
            Watchers: 6
