Epic
Resolution: Unresolved
Major
Pipeline Failure Analysis LLM Based
In Progress
SRVKP-8518 - Pipeline failure analysis
41% To Do, 9% In Progress, 50% Done
Epic Goal
Deliver a productized Tekton Assist service that leverages LLM-based analysis to explain failed Pipelines/TaskRuns and guide remediation. The service will be deployable via Helm, integrated with the Tekton CLI (tkn), and supported under the openshift-pipelines org.
Why is this important?
- Today, diagnosing failed Pipelines/TaskRuns requires deep Tekton expertise and manual inspection of logs.
- Tekton Assist reduces time-to-triage by providing actionable root cause analysis and remediation guidance directly to users.
- Improves developer experience and operator efficiency by integrating into existing Tekton tools (tkn, Helm).
- Aligns with OpenShift Pipelines’ goal of delivering AI-powered developer productivity features.
Scenarios
- A developer runs a Pipeline that fails due to missing secrets.
  - tkn taskrun explain my-run outputs:
    ❌ Step 'build' failed due to missing secret 'docker-creds'
    Suggested fix: Create secret 'docker-creds' in namespace 'ci-tools'
- An operator installs Tekton Assist via Helm with an OpenAI key and custom model settings.
  - The service deploys successfully and is reachable at a cluster endpoint.
- A TaskRun fails due to an invalid image reference.
  - Tekton Assist provides root cause: "Image 'quay.io/foo/bar:latest' not found"
  - Suggests checking registry permissions and image existence.
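To make the expected output of the first scenario concrete, here is a minimal sketch of how a structured analysis result could be rendered as the two-line report. The `Analysis` class, its field names, and the `render` function are hypothetical and chosen only for illustration; they are not the Tekton Assist schema or API.

```python
from dataclasses import dataclass


@dataclass
class Analysis:
    """Hypothetical shape of one analysis result (an assumption, not the real schema)."""
    step: str           # name of the failed step, e.g. "build"
    cause: str          # root cause as a short phrase
    suggested_fix: str  # remediation guidance for the user


def render(analysis: Analysis) -> str:
    """Format an analysis as the two-line report shown in the first scenario."""
    return (
        f"❌ Step '{analysis.step}' failed due to {analysis.cause}\n"
        f"Suggested fix: {analysis.suggested_fix}"
    )


# Values taken from the missing-secret scenario.
example = Analysis(
    step="build",
    cause="missing secret 'docker-creds'",
    suggested_fix="Create secret 'docker-creds' in namespace 'ci-tools'",
)
print(render(example))
```

Keeping the result as structured fields and formatting it only at the edge would let the same data feed both the human-readable CLI output and automated tests.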
Acceptance Criteria (Mandatory)
- ✅ CI MUST be running successfully with tests automated for Tekton Assist service and CLI integration.
- ✅ Helm chart available for Tekton Assist with documented configuration options.
- ✅ tkn pipelinerun explain and tkn taskrun explain produce structured, human-readable analysis.
- ✅ Release Technical Enablement: Provide necessary docs (install guide, CLI usage, troubleshooting).
- ✅ Tekton Assist service productized in openshift-pipelines org with Konflux CI integration.
Dependencies (internal and external)
- ...
Previous Work (Optional):
- …
Open questions:
- …
Done Checklist
- Acceptance criteria are met
- Non-functional properties of the Feature have been validated (such as performance, resource, UX, security or privacy aspects)
- User Journey automation is delivered
- Support and SRE teams are provided with enough skills to support the feature in a production environment
- is blocked by
  - SRVKP-8790 Testing for the epic (To Do)
- is documented by
  - RHDEVDOCS-6794 Document the new LLM Based Pipeline Failure Analysis (Open)