Bug
Resolution: Done
Normal
TaskRuns that fail due to Out of Memory (OOM) conditions will now show the termination reason in their failure message.
Bug Fix
KONFLUX-💚Green-S283, Pipelines Sprint Tekshift 25, Pipelines Sprint Tekshift 26, Pipelines Sprint Tekshift 28, Pipelines Sprint Tekshift 29, Pipelines Sprint Tekshift 30
Description of problem:
When a Step in a Task fails due to OOM, causing the Task to fail, the OOM condition is not included in the Task failure message.
We already check for this condition explicitly, but a bug appears to prevent that code from ever being reached.
Since OOMKilled containers have `"Reason": "OOMKilled"` in their Termination state, the fix is simply to include the Termination Reason when extracting the container termination message.
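A minimal sketch of the proposed change, written in Go because Tekton is a Go project. The `terminatedState` type and the `failureMessage` function are hypothetical names used only for illustration; they are not Tekton's actual code, and the real fix may be structured differently. The sketch assumes the termination reason is available alongside the exit code, as it is in the container's terminated state.

```go
package main

import "fmt"

// terminatedState is a hypothetical, trimmed-down stand-in for the
// terminated container state that Kubernetes reports for a step container.
type terminatedState struct {
	ExitCode int32
	Reason   string
}

// failureMessage builds the step failure message and appends the
// termination reason (for example "OOMKilled") when one is present.
func failureMessage(stepName string, t terminatedState) string {
	msg := fmt.Sprintf("%q exited with code %d", stepName, t.ExitCode)
	if t.Reason != "" {
		msg += ": " + t.Reason
	}
	return msg
}

func main() {
	oom := terminatedState{ExitCode: 137, Reason: "OOMKilled"}
	fmt.Println(failureMessage("step-unnamed-0", oom))
}
```

With a reason present, the message carries the cause next to the exit code; with an empty reason, it falls back to the current wording.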
As a user of OpenShift Pipelines, I may have a Task that does not have an appropriate memory request configuration. Currently, if a Task fails because one of its steps' containers is OOMKilled, the Task is marked as failed with the message "<step-name> exited with code 137". This indicates only that the step container was killed by an external SIGKILL signal. In other words, even a knowledgeable user can ascertain only that the pod was killed by the kubelet and did not fail for internal reasons. Without Kubernetes access to view the pod (before it is cleaned up) or o11y access such as Grafana, a user can only speculate about what caused the pod to be evicted, or whether the pod was even at fault.
Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce
{{apiVersion: tekton.dev/v1
kind: TaskRun
metadata:
  generateName: stress-test-
spec:
  computeResources:
    requests:
      memory: 64Mi
    limits:
      memory: 64Mi
  taskSpec:
    steps:
    - image: mirror.gcr.io/ubuntu
      script: |
        #!/usr/bin/env bash
        apt-get update
        apt-get -y install stress
        stress --vm 4 --vm-bytes 256M --timeout 300}}
- With Tekton running in a k8s cluster of any kind, create the above TaskRun using `kubectl create`.
- Wait for the TaskRun to fail.
- Get the status of the TaskRun using `kubectl tkn desc <taskrun-name>` and `kubectl tkn desc <taskrun-name> -o=yaml`.
Actual results:
The TaskRun message will be `"step-unnamed-0" exited with code 137`, whereas the YAML shows `status.steps[0].terminated.reason: "OOMKilled"`.
Expected results:
The TaskRun message should be something along the lines of `"step-unnamed-0" exited with code 137: OOMKilled`.
Reproducibility (Always/Intermittent/Only Once):
Acceptance criteria:
Definition of Done:
Build Details:
Additional info (Such as Logs, Screenshots, etc):
Is depended on by: SRVKP-8141 [TRACKER] [release testing] Bug verification (Closed)