OpenShift Bugs / OCPBUGS-32498

OOMKilled Build Pods Report "Error" status, not "OOMKilled"


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Minor
    • 4.15.z
    • 4.15
    • Node / CRI-O
    • Low
    • Yes
    • False
    • None
    • Release note:
      * Previously, a build pod that failed because of a memory limitation would have its pod status changed to `Error` instead of `OOMKilled`, so these pods were not reported correctly. The issue only occurred on cgroup v2 nodes. With this release, a pod with a status of `OOMKilled` is correctly detected and reported. (link:https://issues.redhat.com/browse/OCPBUGS-32498[*OCPBUGS-32498*])
    • Bug Fix
    • Done

      Description of problem:

      
      When build pods fail because they run out of memory, the pod status is reported as "Error" instead of "OOMKilled" (or similar).
          

      Version-Release number of selected component (if applicable):

      4.15
          

      How reproducible:

      Always, for builds on 4.15. Not currently observed on 4.16 or on earlier versions (4.14).
          

      Steps to Reproduce:

          1. Launch an OpenShift 4.15.z cluster with the Build capability enabled (the default).
          2. Create a project/namespace.
          3. Create a BuildConfig that runs a build with a low memory request/limit. See https://raw.githubusercontent.com/openshift/origin/master/test/extended/testdata/builds/statusfail-oomkilled.yaml for an example.
          4. Start the build (with the example above, run `oc start-build statusfail-oomkilled`).
          

      Actual results:

          The build fails with a generic error reason, even though the pod's exit status is 137.
          

      Expected results:

          The build fails with the OOMKilled reason.
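To make the difference between the actual and expected results concrete, here is a minimal Python sketch that reads a pod's JSON (for example, the output of `oc get pod <build-pod> -o json`) and summarizes how each terminated container ended. It assumes only the standard Kubernetes pod status fields `status.containerStatuses[].state.terminated.reason` and `.exitCode`; the function names, the container name `build`, and the sample pod are hypothetical. Exit status 137 is 128 + 9, meaning the process was killed with SIGKILL. That is consistent with an OOM kill but does not prove one; only a `reason` of `OOMKilled` does, which is why a reason of `Error` hides the real cause.

```python
import json

# Exit codes above 128 conventionally mean "terminated by signal (code - 128)".
SIGKILL = 9
SIGKILL_EXIT_CODE = 128 + SIGKILL  # 137, the exit status seen in this bug


def terminated_containers(pod_json: str) -> list[tuple[str, str, int]]:
    """Return (name, reason, exitCode) for each container whose current state is terminated."""
    pod = json.loads(pod_json)
    results = []
    for cs in pod.get("status", {}).get("containerStatuses", []):
        term = cs.get("state", {}).get("terminated")
        if term is not None:
            results.append((cs["name"], term.get("reason", ""), term.get("exitCode", 0)))
    return results


def diagnose(reason: str, exit_code: int) -> str:
    """Summarize a termination; only the reason field can confirm an OOM kill."""
    if reason == "OOMKilled":
        return "OOMKilled"
    if exit_code == SIGKILL_EXIT_CODE:
        # 137 only says the process received SIGKILL; the kernel OOM killer is one
        # possible sender, but the runtime did not report the kill as an OOM.
        return f"SIGKILL with reason {reason!r} (possible unreported OOM)"
    return f"{reason or 'unknown'} (exit code {exit_code})"


# Hypothetical pod status resembling the "Actual results" above.
actual = json.dumps({
    "status": {"containerStatuses": [
        {"name": "build", "state": {"terminated": {"reason": "Error", "exitCode": 137}}}
    ]}
})

for name, reason, code in terminated_containers(actual):
    print(name, diagnose(reason, code))
    # prints "build SIGKILL with reason 'Error' (possible unreported OOM)"
```

With the expected behavior, the same container would carry `"reason": "OOMKilled"` and `diagnose` would return `OOMKilled` directly.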
          

      Additional info:

          Observed in the gcp-ovn-builds test suite for 4.15.
          https://github.com/openshift/origin/pull/28725 and its backport will temporarily unblock us; these PRs should be reverted once a root cause is identified and a fix is delivered.
          

              Peter Hunt (pehunt@redhat.com)
              Adam Kaplan (adkaplan@redhat.com)
              Sunil Choudhary
              Votes: 0
              Watchers: 7
