Story
Resolution: Done
Major
None
None
None
Story (Required)
As a DevOps engineer debugging a failed Tekton PipelineRun
I want an API endpoint (/pipelinerun/explainFailure) that either lists failed TaskRuns for deeper inspection or directly analyzes the PipelineRun failure
So that I can quickly determine whether the failure lies inside the TaskRuns or at the PipelineRun level, and troubleshoot efficiently.
Background (Required)
Currently, investigating a failed PipelineRun requires:
- Checking the PipelineRun status.
- Inspecting associated TaskRuns.
- Drilling down into failing TaskRuns individually.
This is manual and error-prone. The story streamlines the workflow:
- If TaskRuns exist → return the list of failed TaskRuns and prompt the user to diagnose each via /taskrun/explainFailure.
- If no TaskRuns exist → analyze the PipelineRun failure directly with an LLM.
This improves developer productivity and reduces time to resolution.
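The branching described above can be sketched as a small pure function. This is an illustrative sketch only, not the actual service implementation; the function name, the "next" field, and the wording of the analysis string are assumptions.

````python
def triage_pipelinerun(failed_taskruns, pipelinerun_status_message):
    """Decide how to explain a failed PipelineRun.

    failed_taskruns: names of TaskRuns that failed (may be empty).
    pipelinerun_status_message: message from the PipelineRun's Succeeded condition.
    """
    if failed_taskruns:
        # TaskRuns exist: return the list and defer per-task diagnosis
        # to the /taskrun/explainFailure endpoint.
        return {
            "failedTaskRuns": failed_taskruns,
            "analysis": None,
            "next": "Diagnose each TaskRun via /taskrun/explainFailure",
        }
    # No TaskRuns: the failure happened before any TaskRun was scheduled,
    # so analyze the PipelineRun's own status message directly.
    return {
        "failedTaskRuns": [],
        "analysis": "No TaskRuns were created. " + pipelinerun_status_message,
        "next": None,
    }
````

With no TaskRuns, the PipelineRun's condition message is surfaced directly; otherwise the caller gets the failed-TaskRun list and a pointer to the per-TaskRun endpoint.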
Out of scope
- Automatic diagnosis of all failed TaskRuns in a PipelineRun (only listing is included).
- Multi-pipeline correlation.
- Automatic retries or self-healing.
Approach (Required)
- Check PipelineRun status
  - Fetch the PipelineRun object from Kubernetes.
  - Inspect .status.conditions.
- Check TaskRuns
  - Query associated TaskRuns using the tekton.dev/pipelineRun=<name> label.
  - If failed TaskRuns exist → return a structured list.
  - If no TaskRuns exist → analyze the PipelineRun's status message.
- Expose API
  - GET /pipelinerun/explainFailure?name=<pipelinerun>&namespace=<ns>
- API response schema
````
{
  "pipelineRun": {
    "name": "pipelinerun-go-golangci-lint",
    "namespace": "default",
    "uid": "a1b2c3d4",
    "labels": {},
    "annotations": {}
  },
  "status": {
    "phase": "Failed",
    "startTime": "2025-09-15T06:34:58Z",
    "completionTime": "2025-09-15T06:35:00Z",
    "durationSeconds": 2,
    "conditions": [
      {
        "type": "Succeeded",
        "status": "False",
        "reason": "CouldntGetTask",
        "message": "pipeline validation failed: task not found",
        "lastTransitionTime": "2025-09-15T06:35:00Z"
      }
    ]
  },
  "failedTaskRuns": [],
  "analysis": "No TaskRuns were created. PipelineRun failed during validation or scheduling."
}
````
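Assembling the "status" block of this response from a PipelineRun's .status can be sketched as follows. This is a minimal sketch under stated assumptions (plain dicts rather than client objects, ISO-8601 timestamps with a trailing Z); the function name is illustrative, not part of the actual service.

````python
from datetime import datetime

def build_status_block(status: dict) -> dict:
    """Derive the response's "status" block from a PipelineRun .status dict."""
    # Kubernetes timestamps use a trailing "Z"; normalize for fromisoformat.
    start = datetime.fromisoformat(status["startTime"].replace("Z", "+00:00"))
    end = datetime.fromisoformat(status["completionTime"].replace("Z", "+00:00"))
    # The Succeeded condition carries the failure reason and message.
    succeeded = next(
        (c for c in status.get("conditions", []) if c["type"] == "Succeeded"), None
    )
    failed = succeeded is not None and succeeded["status"] == "False"
    return {
        "phase": "Failed" if failed else "Succeeded",
        "startTime": status["startTime"],
        "completionTime": status["completionTime"],
        "durationSeconds": int((end - start).total_seconds()),
        "conditions": status.get("conditions", []),
    }
````

For the example above (start 06:34:58, completion 06:35:00, Succeeded=False), this yields phase "Failed" and durationSeconds 2, matching the schema.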
Dependencies
<Describes what this story depends on. Dependent Stories and EPICs should be linked to the story.>
Acceptance Criteria (Mandatory)
<Describe edge cases to consider when implementing the story and defining tests>
<Provides a required and minimum list of acceptance tests for this story. More is expected as the engineer implements this story>
INVEST Checklist
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Legend
Unknown
Verified
Unsatisfied
Done Checklist
- Code is completed, reviewed, documented and checked in
- Unit and integration test automation has been delivered and is running cleanly in the continuous integration/staging/canary environment
- Continuous Delivery pipeline(s) is able to proceed with new code included
- Customer facing documentation, API docs etc. are produced/updated, reviewed and published
- Acceptance criteria are met