OpenShift Bugs / OCPBUGS-11318

Pods get stuck in "ContainerCreating" status after deleting/re-creating test workloads multiple times with crun in telco scale tests

Details

    • Critical
    • No
    • Rejected
    • False
    • 4/4: telco reviewed

    Description

      Description of problem:

      Pods get stuck in "ContainerCreating" status after deleting/re-creating test workloads multiple times. 
      
      The kubelet journal continuously reports errors like:
      
      "Failed to delete cgroup paths" cgroupName=[kubepods besteffort pod09daef92-689c-4390-9790-9b327190eb61] err="unable to destroy cgroup paths for cgroup [kubepods besteffort pod09daef92-689c-4390-9790-9b327190eb61] : Timed out while waiting for systemd to remove kubepods-besteffort-pod09daef92_689c_4390_9790_9b327190eb61.slice" 

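For anyone searching the node for the leftover unit, the slice name in the error can be derived from the `cgroupName` components shown in the same log line. The sketch below is inferred only from the log line above (dashes inside a component become underscores, components are joined with dashes, and `.slice` is appended); it is not taken from the kubelet source, and the function names are hypothetical.

```python
import re

# Matches the bracketed component list, e.g. cgroupName=[kubepods besteffort pod<uid>]
CGROUP_NAME_RE = re.compile(r"cgroupName=\[([^\]]+)\]")


def slice_name(components):
    """Build the systemd slice unit name from kubelet cgroupName components.

    Assumption (inferred from the log line in this report): '-' inside a
    component is written as '_', and components are joined with '-'.
    """
    return "-".join(part.replace("-", "_") for part in components) + ".slice"


def slice_from_log(line):
    """Return the slice name for a kubelet log line, or None if no cgroupName is present."""
    match = CGROUP_NAME_RE.search(line)
    if match is None:
        return None
    return slice_name(match.group(1).split())


if __name__ == "__main__":
    sample = (
        "cgroupName=[kubepods besteffort "
        "pod09daef92-689c-4390-9790-9b327190eb61] err=..."
    )
    print(slice_from_log(sample))
```

For the pod in the error above, this yields the same unit name that the kubelet reports it timed out waiting on.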
      Version-Release number of selected component (if applicable):

      4.13.0-rc.2

      How reproducible:

      100%

      Steps to Reproduce:

      1. Deploy SNO with Telco DU profile applied
      
      2. Enable crun as the default container runtime:
      ---
      apiVersion: machineconfiguration.openshift.io/v1
      kind: MachineConfig
      metadata:
        labels:
          machineconfiguration.openshift.io/role: master
        name: zzz-pao-runtime-override
      spec:
        config:
          ignition:
            version: 3.2.0
          storage:
            files:
              - contents:
                  source: data:text/plain;charset=utf-8;base64,W2NyaW8ucnVudGltZS5ydW50aW1lcy5oaWdoLXBlcmZvcm1hbmNlXQpydW50aW1lX3BhdGggPSAiL2Jpbi9jcnVuIgpydW50aW1lX3R5cGUgPSAib2NpIgpydW50aW1lX3Jvb3QgPSAiL3J1bi9jcnVuIgphbGxvd2VkX2Fubm90YXRpb25zID0gWyJjcHUtbG9hZC1iYWxhbmNpbmcuY3Jpby5pbyIsICJjcHUtcXVvdGEuY3Jpby5pbyIsICJpcnEtbG9hZC1iYWxhbmNpbmcuY3Jpby5pbyIsICJjcHUtYy1zdGF0ZXMuY3Jpby5pbyIsICJjcHUtZnJlcS1nb3Zlcm5vci5jcmlvLmlvIl0=
                mode: 420  # decimal 420 = octal 0644
                path: /etc/crio/crio.conf.d/99-z00-runtimes.conf
                user: {}
      ---
      apiVersion: machineconfiguration.openshift.io/v1
      kind: ContainerRuntimeConfig
      metadata:
        name: enable-crun
      spec:
        machineConfigPoolSelector:
          matchLabels:
            pools.operator.machineconfiguration.openshift.io/master: ""
        containerRuntimeConfig:
          defaultRuntime: crun
      
      3. Delete and re-create the test workload multiple times
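
The MachineConfig in step 2 carries its CRI-O drop-in as a base64 data URL, which makes the actual runtime settings hard to review. A minimal sketch for decoding it is shown below; the `decode_data_url` helper is hypothetical, and the `SOURCE` string is copied from the `source:` field above.

```python
import base64

# Copied verbatim from the `source:` field of the MachineConfig in step 2.
SOURCE = (
    "data:text/plain;charset=utf-8;base64,"
    "W2NyaW8ucnVudGltZS5ydW50aW1lcy5oaWdoLXBlcmZvcm1hbmNlXQpydW50aW1lX3BhdGggPSAiL2Jpbi9jcnVuIgpydW50aW1lX3R5cGUgPSAib2NpIgpydW50aW1lX3Jvb3QgPSAiL3J1bi9jcnVuIgphbGxvd2VkX2Fubm90YXRpb25zID0gWyJjcHUtbG9hZC1iYWxhbmNpbmcuY3Jpby5pbyIsICJjcHUtcXVvdGEuY3Jpby5pbyIsICJpcnEtbG9hZC1iYWxhbmNpbmcuY3Jpby5pbyIsICJjcHUtYy1zdGF0ZXMuY3Jpby5pbyIsICJjcHUtZnJlcS1nb3Zlcm5vci5jcmlvLmlvIl0="
)


def decode_data_url(url):
    """Decode a base64 `data:` URL (as used in Ignition file contents) to text."""
    header, _, payload = url.partition(",")
    if not header.endswith(";base64"):
        raise ValueError("only base64-encoded data URLs are handled in this sketch")
    return base64.b64decode(payload).decode("utf-8")


if __name__ == "__main__":
    print(decode_data_url(SOURCE))
```

The decoded text is the content written to /etc/crio/crio.conf.d/99-z00-runtimes.conf. It defines a `high-performance` runtime handler whose `runtime_path` is `/bin/crun`, along with its list of allowed annotations.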
      

      Actual results:

      After ~7 delete/re-create iterations, the test workload pods get stuck in the ContainerCreating state.
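
To spot the failure after each iteration, the pod listing can be filtered for the stuck status. The sketch below assumes the text comes from `oc get pods -A --no-headers`, with columns in the order NAMESPACE, NAME, READY, STATUS, RESTARTS, AGE; the `stuck_pods` helper and the sample pod names are hypothetical and do not come from this report.

```python
def stuck_pods(listing, status="ContainerCreating"):
    """Return (namespace, name) pairs whose STATUS column equals `status`.

    Assumes whitespace-separated columns: NAMESPACE NAME READY STATUS ...
    """
    stuck = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[3] == status:
            stuck.append((fields[0], fields[1]))
    return stuck


if __name__ == "__main__":
    # Hypothetical sample listing, for illustration only.
    sample = (
        "test-ns  workload-a  0/1  ContainerCreating  0  12m\n"
        "test-ns  workload-b  1/1  Running            0  12m\n"
    )
    for namespace, name in stuck_pods(sample):
        print(f"{namespace}/{name}")
```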

      Expected results:

      The test workload can be re-created multiple times without issues.

      Additional info:

      Attaching must-gather and sosreport.

          People

            jmencak Jiri Mencak
            mcornea@redhat.com Marius Cornea
            Sunil Choudhary Sunil Choudhary
            Peter Hunt, Ryan Phillips