Story (Required)
As a Platform Engineer / DevOps Engineer
trying to debug and resolve failed Tekton PipelineRuns on OpenShift
I want to use OpenShift Lightspeed (OLS) Chat to quickly identify root causes and apply fixes
so that I can reduce pipeline downtime, improve developer productivity, and standardize troubleshooting workflows.
This story focuses on helping OpenShift users understand how and why Tekton pipelines fail and how OpenShift Lightspeed Chat can guide them interactively through debugging and resolution. It improves the customer experience by reducing mean time to resolution (MTTR), lowering the learning curve for Tekton, and providing AI-assisted, context-aware troubleshooting directly within the OpenShift console.
Background (Required)
Tekton Pipelines are a core component of OpenShift Pipelines, enabling Kubernetes-native CI/CD workflows. While powerful, debugging failed PipelineRuns can be challenging due to:
- Complex task dependencies
- Resource constraints (CPU/memory limits)
- Network and registry authentication issues
- Sparse or verbose logs
OpenShift Pipelines are installed and managed via the OpenShift Pipelines Operator, and failures typically require deep inspection of PipelineRun, TaskRun, pod logs, and cluster configuration.
OpenShift Lightspeed (OLS) Chat enhances this experience by allowing users to ask natural language questions (e.g., “Why did my pipeline fail?”) and receive contextual guidance based on cluster state, logs, and best practices.
This blog walks through three real-world PipelineRun failure scenarios and demonstrates how OLS Chat can be used to debug and fix them.
Out of scope
- Deep Tekton authoring concepts (custom controllers, Tekton internals)
- Performance benchmarking of pipelines
- Non-OpenShift Kubernetes distributions
- Security hardening beyond registry authentication basics
Approach (Required)
1. Install OpenShift Pipelines Operator
Install OpenShift Pipelines from OperatorHub:
- Navigate to OperatorHub
- Search for OpenShift Pipelines
- Install in the openshift-operators namespace
Verify installation:
oc get pods -n openshift-pipelines
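Beyond the pod check, it is worth confirming that the Tekton CRDs the scenarios below rely on are registered. A quick sanity check against the cluster (assumes a default OpenShift Pipelines installation):

```shell
# Confirm the Tekton controller pods are running
oc get pods -n openshift-pipelines

# Confirm the PipelineRun/TaskRun CRDs are registered
oc get crd pipelineruns.tekton.dev taskruns.tekton.dev
```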
2. Scenario 1: pipelinerun-go.yaml – Task Failure Due to Build Issue
Problem
A Go build task fails due to a missing go.mod file.
Symptoms
- PipelineRun status: Failed
- TaskRun logs show:
go: cannot find main module
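Before (or alongside) asking OLS, the raw failure can be pulled with the tkn CLI or oc; the run and namespace names below are placeholders:

```shell
# Stream logs from the most recent PipelineRun (tkn is the Tekton CLI)
tkn pipelinerun logs --last -n <namespace>

# Or inspect the failure condition directly on the resource
oc get pipelinerun <pipelinerun-name> -n <namespace> -o jsonpath='{.status.conditions}'
```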
Using OLS Chat
- Open the failed PipelineRun in the OpenShift Console
- Launch OpenShift Lightspeed Chat
- Ask:
“Why did my Tekton PipelineRun fail?”
OLS Chat Response (Example)
OLS identifies the failing task, analyzes logs, and suggests:
- Ensuring the repository includes go.mod
- Adding a workingDir or correct source workspace binding
Fix
Update the task to validate source structure or add a pre-check task.
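One way to implement the pre-check is a small validation step that fails fast with a clear message when go.mod is absent. A minimal sketch; the task name, workspace name, and image are illustrative, and older operators may require apiVersion tekton.dev/v1beta1:

```yaml
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: check-go-module   # hypothetical pre-check task
spec:
  workspaces:
    - name: source
  steps:
    - name: verify-go-mod
      image: registry.access.redhat.com/ubi9/ubi-minimal
      workingDir: $(workspaces.source.path)
      script: |
        #!/bin/sh
        # Fail fast with an actionable message instead of a go toolchain error
        if [ ! -f go.mod ]; then
          echo "ERROR: go.mod not found in source workspace; run 'go mod init' in the repo" >&2
          exit 1
        fi
```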
3. Scenario 2: pipelinerun-limit.yaml – Memory-Intensive Task Failure
Problem
A task fails with OOMKilled due to insufficient memory.
Symptoms
- Pod status: OOMKilled
- Event message:
Container terminated due to memory limit
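The OOM kill can be confirmed from the task pod's last state before involving OLS; pod and run names are placeholders (Tekton labels task pods with the owning PipelineRun):

```shell
# Find the task pod belonging to the failed PipelineRun
oc get pods -n <namespace> -l tekton.dev/pipelineRun=<pipelinerun-name>

# OOMKilled in the last state confirms a memory-limit kill
oc describe pod <pod-name> -n <namespace> | grep -A3 "Last State"
```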
Using OLS Chat
Ask:
“This pipeline failed with OOMKilled. How do I fix it?”
OLS Chat Guidance
- Detects memory pressure
- Recommends increasing resources.limits.memory
- Suggests profiling or splitting tasks
Fix
Update the task definition:
resources:
  limits:
    memory: "2Gi"
  requests:
    memory: "1Gi"
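For context, in a Tekton Task this resources block sits on the individual step. The sketch below uses the v1beta1 field name resources to match the snippet above (the tekton.dev/v1 API renames it computeResources); task name and image are illustrative:

```yaml
apiVersion: tekton.dev/v1beta1
kind: Task
metadata:
  name: memory-heavy-build   # hypothetical task name
spec:
  steps:
    - name: build
      image: registry.access.redhat.com/ubi9/go-toolset
      # Per-step memory request/limit; raise the limit to avoid OOMKilled
      resources:
        limits:
          memory: "2Gi"
        requests:
          memory: "1Gi"
      script: |
        go build ./...
```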
4. Scenario 3: pipelinerun-network.yaml – Registry Push Failure
Problem
Pipeline fails while pushing an image to a container registry.
Possible Causes
- Network egress blocked
- Invalid or missing registry credentials
- Incorrect image URL
Symptoms
denied: requested access to the resource is denied
Using OLS Chat
Ask:
“Why can’t my pipeline push the image to the registry?”
OLS Chat Analysis
- Detects authentication error
- Suggests checking:
  - ImagePullSecrets
  - ServiceAccount bindings
  - NetworkPolicy / proxy settings
Fix
- Attach the correct secret to the pipeline ServiceAccount:
oc secrets link pipeline registry-secret
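If the secret does not exist yet, it can be created and linked in two commands; the registry host, credentials, and namespace values are placeholders:

```shell
# Create a docker-registry secret holding the push credentials
oc create secret docker-registry registry-secret \
  --docker-server=<registry-host> \
  --docker-username=<user> \
  --docker-password=<password> \
  -n <namespace>

# Link it to the pipeline ServiceAccount so task pods can use it for pushes
oc secrets link pipeline registry-secret -n <namespace>
```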
Dependencies
- OpenShift Cluster (4.x)
- OpenShift Pipelines Operator installed
- Tekton Pipelines and Triggers APIs
- OpenShift Lightspeed enabled and configured
- Access to container registry (internal or external)
Acceptance Criteria (Mandatory)
- User can install OpenShift Pipelines Operator successfully
- User can run and observe failed PipelineRuns
- OLS Chat correctly identifies:
  - Task-level failures
  - Resource constraint issues
  - Network/authentication errors
- Suggested fixes are actionable and correct
- Screenshots or UI references clearly demonstrate OLS usage
Edge Cases
- Multiple tasks failing simultaneously
- Intermittent network failures
- Misleading log messages
INVEST Checklist
Dependencies identified
Blockers noted and expected delivery timelines set
Design is implementable
Acceptance criteria agreed upon
Story estimated
Done Checklist
- Code is completed, reviewed, documented and checked in
- Unit and integration test automation has been delivered and is running cleanly in the continuous integration/staging/canary environment
- Continuous Delivery pipeline(s) is able to proceed with new code included
- Customer facing documentation, API docs etc. are produced/updated, reviewed and published
- Acceptance criteria are met