OpenShift Pipelines / SRVKP-8861

PipelineRun ExplainFailure Endpoint


    • Type: Story
    • Resolution: Done
    • Priority: Major
    • Labels: AI
    • Sprint: Pipelines Sprint Crookshank 38

      Story (Required)

      As a DevOps engineer debugging a failed Tekton PipelineRun
      I want an API endpoint (/pipelinerun/explainFailure) that either lists failed TaskRuns for deeper inspection or directly analyzes the PipelineRun failure
      So that I can quickly determine if the failure is inside TaskRuns or at the PipelineRun level, and efficiently troubleshoot.

      Background (Required)

      Currently, investigating a failed PipelineRun requires:

      1. Checking the PipelineRun status.
      2. Inspecting associated TaskRuns.
      3. Drilling down into failing TaskRuns individually.

      This is manual and error-prone. The story streamlines the workflow:

      • If failed TaskRuns exist → return a list of them and prompt the user to diagnose each via /taskrun/explainFailure.
      • If no TaskRuns exist → analyze the PipelineRun failure directly with the LLM.

      This improves developer productivity and reduces time to resolution.
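      The branching above can be sketched as a small pure function. This is a minimal sketch, not the shipped implementation; the `FailedTaskRun` type and `explain` function are hypothetical names introduced for illustration:

      ```go
      package main

      import "fmt"

      // FailedTaskRun is a minimal, hypothetical summary of one failed TaskRun.
      type FailedTaskRun struct {
      	Name   string
      	Reason string
      }

      // explain implements the decision described above: if failed TaskRuns
      // exist, list them and redirect to /taskrun/explainFailure; otherwise
      // fall back to analyzing the PipelineRun status message directly.
      func explain(failed []FailedTaskRun, statusMessage string) string {
      	if len(failed) > 0 {
      		return fmt.Sprintf("%d failed TaskRun(s); diagnose each via /taskrun/explainFailure", len(failed))
      	}
      	return "no TaskRuns created; analyzing PipelineRun status: " + statusMessage
      }

      func main() {
      	fmt.Println(explain(nil, "pipeline validation failed: task not found"))
      	fmt.Println(explain([]FailedTaskRun{{Name: "lint", Reason: "Failed"}}, ""))
      }
      ```

      Keeping this branch as a pure function makes the two response shapes easy to unit-test without a cluster.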

      Out of scope

      • Automatic diagnosis of all failed TaskRuns in a PipelineRun (only listing is included).
      • Multi-pipeline correlation.
      • Automatic retries or self-healing.

      Approach (Required)

      • Check PipelineRun status
        • Fetch PipelineRun object from Kubernetes.
        • Inspect .status.conditions.
      • Check TaskRuns
        • Query associated TaskRuns using the tekton.dev/pipelineRun=<name> label.
        • If failed TaskRuns exist → return structured list.
        • If no TaskRuns exist → analyze the PipelineRun’s status message.
      • Expose API
        • GET /pipelinerun/explainFailure?name=<pipelinerun>&namespace=<ns>
      • API response schema
      ```json
      {
        "pipelineRun": {
          "name": "pipelinerun-go-golangci-lint",
          "namespace": "default",
          "uid": "a1b2c3d4",
          "labels": {},
          "annotations": {}
        },
        "status": {
          "phase": "Failed",
          "startTime": "2025-09-15T06:34:58Z",
          "completionTime": "2025-09-15T06:35:00Z",
          "durationSeconds": 2,
          "conditions": [
            {
              "type": "Succeeded",
              "status": "False",
              "reason": "CouldntGetTask",
              "message": "pipeline validation failed: task not found",
              "lastTransitionTime": "2025-09-15T06:35:00Z"
            }
          ]
        },
        "failedTaskRuns": [],
        "analysis": "No TaskRuns were created. PipelineRun failed during validation or scheduling."
      }
      ```
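      The endpoint's request handling can be sketched with the standard library alone. This is an illustrative sketch, assuming Tekton's standard tekton.dev/pipelineRun child label; the handler and helper names are hypothetical, and the actual TaskRun listing and LLM analysis are elided:

      ```go
      package main

      import (
      	"fmt"
      	"net/http"
      )

      // taskRunSelector builds the label selector used to list a PipelineRun's
      // child TaskRuns. Tekton labels each child TaskRun with tekton.dev/pipelineRun.
      func taskRunSelector(pipelineRun string) string {
      	return "tekton.dev/pipelineRun=" + pipelineRun
      }

      // explainFailureHandler validates the query parameters for
      // GET /pipelinerun/explainFailure before any cluster lookup.
      func explainFailureHandler(w http.ResponseWriter, r *http.Request) {
      	name := r.URL.Query().Get("name")
      	ns := r.URL.Query().Get("namespace")
      	if name == "" || ns == "" {
      		http.Error(w, "name and namespace are required", http.StatusBadRequest)
      		return
      	}
      	fmt.Fprintf(w, "listing TaskRuns with selector %q in namespace %s\n", taskRunSelector(name), ns)
      }

      func main() {
      	http.HandleFunc("/pipelinerun/explainFailure", explainFailureHandler)
      	// http.ListenAndServe(":8080", nil) // left commented so the sketch runs without binding a port
      	fmt.Println(taskRunSelector("pipelinerun-go-golangci-lint"))
      }
      ```

      Validating name and namespace up front yields a clean 400 for malformed requests instead of an opaque cluster-lookup error.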

      Dependencies

      <Describes what this story depends on. Dependent Stories and EPICs should be linked to the story.>

      Acceptance Criteria (Mandatory)

      <Describe edge cases to consider when implementing the story and defining tests>

      <Provides a required and minimum list of acceptance tests for this story. More is expected as the engineer implements this story>

      INVEST Checklist

      Dependencies identified

      Blockers noted and expected delivery timelines set

      Design is implementable

      Acceptance criteria agreed upon

      Story estimated

      Legend

      Unknown

      Verified

      Unsatisfied

      Done Checklist

      • Code is completed, reviewed, documented and checked in
      • Unit and integration test automation have been delivered and running cleanly in continuous integration/staging/canary environment
      • Continuous Delivery pipeline(s) is able to proceed with new code included
      • Customer facing documentation, API docs etc. are produced/updated, reviewed and published
      • Acceptance criteria are met

              diagrawa Divyanshu Agrawal
              jkhelil abdeljawed khelil