OpenShift Bugs: OCPBUGS-1986

After multiple soft reboots of a SNO cluster, some of the user workload containers do not start: probes report exec failed: unable to start container process: error adding pid to cgroups: failed to write: open ../cgroup.procs: no such file or directory

    Description

      Description of problem:

      After multiple soft reboots of a SNO cluster, the containers of some user workload StatefulSet pods do not start. Probes report: exec failed: unable to start container process: error adding pid 876035 to cgroups: failed to write 876035: open /sys/fs/cgroup/systemd/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod0006f2bb_8ec7_4d23_b3ee_f41f2139099a.slice/crio-3423eeabbeb52cba4c7eaa4c91fafe593a5453c48b096e154a0edacbdfa133c8.scope/cgroup.procs: no such file or directory
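When triaging this kind of failure, it can help to pull the pid, pod UID, and container ID out of the probe error and check on the node whether the cgroup scope directory is really missing. The following is a minimal sketch, not part of the original report: the regular expressions are modeled only on the error text quoted above, the function names are hypothetical, and the pod UID is derived on the assumption that the slice name uses underscores in place of the dashes in the UID.

```python
import re
from pathlib import Path

# Pattern modeled on the error text quoted in this report; other runtimes or
# cgroup layouts may word the message differently (assumption).
ERROR_RE = re.compile(
    r"error adding pid (?P<pid>\d+) to cgroups: "
    r"failed to write \d+: open (?P<procs>\S+/cgroup\.procs): "
    r"no such file or directory"
)
POD_RE = re.compile(r"-pod(?P<uid>[0-9a-f_]+)\.slice/")
CTR_RE = re.compile(r"/crio-(?P<cid>[0-9a-f]+)\.scope/")


def parse_cgroup_error(message):
    """Return pid, cgroup.procs path, pod UID and container ID, or None."""
    match = ERROR_RE.search(message)
    if match is None:
        return None
    procs = match.group("procs")
    pod = POD_RE.search(procs)
    ctr = CTR_RE.search(procs)
    return {
        "pid": int(match.group("pid")),
        "procs_path": procs,
        # The slice name carries the pod UID with "_" instead of "-".
        "pod_uid": pod.group("uid").replace("_", "-") if pod else None,
        "container_id": ctr.group("cid") if ctr else None,
    }


def scope_exists(procs_path):
    """Check, when run on the node, whether the container's scope directory exists."""
    return Path(procs_path).parent.is_dir()
```

Run against the message in this report, `parse_cgroup_error` yields the UID of the affected pod and the CRI-O container ID, which can then be matched against the attached `oc describe` output. `scope_exists` only gives a meaningful answer when executed on the affected node itself.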

      Version-Release number of selected component (if applicable):

      4.11.7

      How reproducible:

      Infrequent

      Steps to Reproduce:

      1. Deploy a SNO cluster with the Telco DU profile applied
      2. Create a user workload
      3. Trigger a soft reboot via the `reboot` command
      4. Wait for the node to recover
      5. Validate that all workload resources recovered correctly
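The validation in the last step can be automated after each reboot. The sketch below is one possible way to do it, not the procedure used by the reporter: it assumes a logged-in `oc` session, reads the standard `status.containerStatuses[].ready` field from `oc get pods -o json`, and uses hypothetical function names.

```python
import json
import subprocess


def not_ready_containers(pod_list):
    """Return (namespace, pod, container) for every container that is not ready.

    `pod_list` is the parsed JSON output of `oc get pods -o json`.
    """
    problems = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        status = pod.get("status", {})
        # Completed pods (for example finished Jobs) are not expected to be ready.
        if status.get("phase") == "Succeeded":
            continue
        for cs in status.get("containerStatuses", []):
            if not cs.get("ready", False):
                problems.append(
                    (meta.get("namespace"), meta.get("name"), cs.get("name"))
                )
    return problems


def fetch_pods(namespace):
    """Fetch the pods of one namespace with the `oc` CLI (needs cluster access)."""
    result = subprocess.run(
        ["oc", "get", "pods", "-n", namespace, "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

A reboot loop could call `not_ready_containers(fetch_pods(ns))` once the node is back and stop at the first non-empty result, which preserves the failing state for must-gather collection. Because the issue is infrequent, many iterations may be needed before it reproduces.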

      Actual results:

      A container in one of the StatefulSet's pods does not start

      Expected results:

      All workload resources recover successfully

      Additional info:

      Attaching the must-gather, the sosreport, and the output of `oc describe pods` and `oc get pods`.

      After deleting and re-creating the pod, all containers start successfully.
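The workaround can be scripted for the affected pods. This is a minimal sketch under the assumption that the pods are owned by a StatefulSet, so the controller re-creates each pod after `oc delete pod`; the helper names and the example pod name are hypothetical, and the default dry-run mode only returns the commands without executing them.

```python
import subprocess


def delete_pod_command(namespace, pod):
    """Build the `oc delete pod` command for one pod."""
    return ["oc", "delete", "pod", pod, "-n", namespace]


def recreate_pods(pods, dry_run=True):
    """Delete each (namespace, pod) pair so that its controller re-creates it.

    With dry_run=True (the default) nothing is executed; the commands are
    only returned for review.
    """
    commands = [delete_pod_command(ns, name) for ns, name in pods]
    if not dry_run:
        for cmd in commands:
            subprocess.run(cmd, check=True)
    return commands


# Hypothetical example: review the command for one affected pod.
for cmd in recreate_pods([("my-namespace", "my-statefulset-0")]):
    print(" ".join(cmd))
```

This only clears the symptom. It does not explain why the container's cgroup scope disappears after the reboot, so collecting logs before deleting the pod is advisable.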

      Attachments

        1. pod.yaml
          11 kB
        2. pod-describe.txt
          55 kB
        3. statefulset.yaml
          7 kB

          People

            Unassigned
            Marius Cornea (mcornea@redhat.com)
            Sunil Choudhary