OpenShift Pipelines / SRVKP-10420

[Blog/doc] Debugging Failed Tekton Pipelines on OpenShift Using OpenShift Lightspeed (OLS) Chat


    • Type: Story
    • Resolution: Unresolved

      Story (Required)

      As a Platform Engineer / DevOps Engineer
      trying to debug and resolve failed Tekton PipelineRuns on OpenShift
      I want to use OpenShift Lightspeed (OLS) Chat to quickly identify root causes and apply fixes
      so that I can reduce pipeline downtime, improve developer productivity, and standardize troubleshooting workflows.

      This story focuses on helping OpenShift users understand how and why Tekton pipelines fail and how OpenShift Lightspeed Chat can guide them interactively through debugging and resolution. It improves the customer experience by reducing mean time to resolution (MTTR), lowering the learning curve for Tekton, and providing AI-assisted, context-aware troubleshooting directly within the OpenShift console.

       

      Background (Required)

      Tekton Pipelines are a core component of OpenShift Pipelines, enabling Kubernetes-native CI/CD workflows. While powerful, debugging failed PipelineRuns can be challenging due to:

      • Complex task dependencies
      • Resource constraints (CPU/memory limits)
      • Network and registry authentication issues
      • Logs that are either too sparse or too verbose

      OpenShift Pipelines is installed and managed by the OpenShift Pipelines Operator. Diagnosing a failure typically means inspecting the PipelineRun, its TaskRuns, the pod logs, and the cluster configuration.

      OpenShift Lightspeed (OLS) Chat enhances this experience by allowing users to ask natural language questions (e.g., “Why did my pipeline fail?”) and receive contextual guidance based on cluster state, logs, and best practices.

      This blog walks through three real-world PipelineRun failure scenarios and demonstrates how OLS Chat can be used to debug and fix them.

      Out of scope

      • Deep Tekton authoring concepts (custom controllers, Tekton internals)
      • Performance benchmarking of pipelines
      • Non-OpenShift Kubernetes distributions
      • Security hardening beyond registry authentication basics

      Approach (Required)

      1. Install OpenShift Pipelines Operator

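      Install OpenShift Pipelines from OperatorHub:

      • Navigate to OperatorHub in the OpenShift web console
      • Search for OpenShift Pipelines
      • Install the Operator in the openshift-operators namespace

      Verify installation. The Operator is installed in openshift-operators, and it deploys the Tekton components into the openshift-pipelines namespace:
      oc get pods -n openshift-pipelines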
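
Beyond listing pods, it can help to confirm that the Operator itself and its top-level custom resource report a healthy state. The following is a minimal sketch, assuming a default installation in which the Operator's ClusterServiceVersion lives in openshift-operators and the TektonConfig resource is named config. The function name verify_pipelines_install is hypothetical.

```shell
# Sketch: three read-only checks after installing the Operator.
# Assumes a logged-in `oc` session with permission to read these resources.
verify_pipelines_install() {
  # ClusterServiceVersion of the Operator; the PHASE column should read Succeeded.
  oc get csv -n openshift-operators
  # TektonConfig is the Operator's top-level custom resource (assumed name: config).
  oc get tektonconfig config
  # Tekton component pods deployed by the Operator.
  oc get pods -n openshift-pipelines
}
```

Call verify_pipelines_install after the installation finishes; all pods should eventually reach the Running state.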

      2. Scenario 1: pipelinerun-go.yaml – Task Failure Due to Build Issue

      Problem
      A Go build task fails due to a missing go.mod file.

      Symptoms

      • PipelineRun status: Failed
      • TaskRun logs show:

        go: cannot find main module
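
The same symptoms can be reproduced from a terminal before or alongside the chat session. This is a sketch only: it assumes the tkn CLI is installed, and the function name and its two arguments (namespace and PipelineRun name) are placeholders.

```shell
# Sketch: list the TaskRuns of one PipelineRun, then print its logs.
show_pipelinerun_failure() {
  ns="$1"
  run="$2"
  # Tekton labels each TaskRun with the name of the PipelineRun that created it.
  oc get taskruns -n "$ns" -l "tekton.dev/pipelineRun=$run"
  # Combined logs from every task in the run (requires the tkn CLI).
  tkn pipelinerun logs "$run" -n "$ns"
}
```

For example, show_pipelinerun_failure my-namespace my-pipelinerun, where both values are substituted with real names from the cluster.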

      Using OLS Chat

      1. Open the failed PipelineRun in the OpenShift Console
      2. Launch OpenShift Lightspeed Chat
      3. Ask:

        “Why did my Tekton PipelineRun fail?”

      OLS Chat Response (Example)
      OLS identifies the failing task, analyzes logs, and suggests:

      • Ensuring the repository includes a go.mod file
      • Setting workingDir on the build step, or correcting the source workspace binding, so that the build runs in the directory that contains go.mod

      Fix
      Update the task to validate the source layout before building, or add a pre-check task that fails early when go.mod is not present in the build directory.
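
A pre-check of this kind can be very small. The sketch below is one possible shape for the script body of such a step; the function name check_go_module and its messages are illustrative and do not come from the attached pipeline files.

```shell
# Hypothetical pre-check: fail fast when go.mod is absent from the source directory.
check_go_module() {
  src_dir="${1:-.}"   # directory to inspect; defaults to the current directory
  if [ ! -f "$src_dir/go.mod" ]; then
    echo "go.mod not found in $src_dir; check the repository layout or the step's workingDir" >&2
    return 1
  fi
  echo "go.mod found in $src_dir"
}
```

Running check_go_module against the workspace path before the build step turns an opaque Go toolchain error into an explicit message about the missing file.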

      3. Scenario 2: pipelinerun-limit.yaml – Memory-Intensive Task Failure

      Problem
      A task fails with OOMKilled due to insufficient memory.

      Symptoms

      • Pod status: OOMKilled
      • Event message:

        Container terminated due to memory limit
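
To confirm which container was killed, the termination reason can be read directly from the pod status. A minimal sketch, assuming the pod name of the failed TaskRun is already known; the function name is hypothetical.

```shell
# Sketch: print each container name and its termination reason, one per line.
show_termination_reasons() {
  ns="$1"
  pod="$2"
  # Tekton step containers are named step-<step name>; the affected one
  # should report OOMKilled as its reason.
  oc get pod "$pod" -n "$ns" \
    -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.state.terminated.reason}{"\n"}{end}'
}
```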

      Using OLS Chat
      Ask:

      “This pipeline failed with OOMKilled. How do I fix it?”

      OLS Chat Guidance

      • Detects memory pressure
      • Recommends increasing resources.limits.memory
      • Suggests profiling or splitting tasks

      Fix
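      Raise the memory request and limit on the failing step in the task definition. The field is named resources in tekton.dev/v1beta1 steps and computeResources in tekton.dev/v1 steps:
      resources:
        limits:
          memory: "2Gi"
        requests:
          memory: "1Gi"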
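
For context, the sketch below shows where such a block sits inside a complete Task. It assumes the tekton.dev/v1 API, where the step-level field is computeResources (the equivalent v1beta1 field is resources). The task name, image, and script are placeholders and are not taken from pipelinerun-limit.yaml.

```shell
# Hypothetical Task manifest; prints YAML so it can be reviewed before applying.
print_task_with_limits() {
  cat <<'EOF'
apiVersion: tekton.dev/v1
kind: Task
metadata:
  name: memory-heavy-task        # placeholder name
spec:
  steps:
    - name: build
      image: registry.access.redhat.com/ubi9/ubi-minimal   # placeholder image
      computeResources:
        requests:
          memory: "1Gi"
        limits:
          memory: "2Gi"
      script: |
        echo "memory-intensive work goes here"
EOF
}
```

The output can be piped to oc apply -n <namespace> -f - once the placeholders are replaced.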
       
       
      4. Scenario 3: pipelinerun-network.yaml – Registry Push Failure

      Problem
      Pipeline fails while pushing an image to a container registry.

      Possible Causes

      • Network egress blocked
      • Invalid or missing registry credentials
      • Incorrect image URL

      Symptoms

        denied: requested access to the resource is denied

      Using OLS Chat
      Ask:

      “Why can’t my pipeline push the image to the registry?”

      OLS Chat Analysis

      • Detects authentication error
      • Suggests checking:
        • ImagePullSecrets
        • ServiceAccount bindings
        • NetworkPolicy / proxy settings
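
The ServiceAccount checks in that list can also be run by hand. This sketch assumes the PipelineRun uses the pipeline ServiceAccount that OpenShift Pipelines creates in each namespace; the function name is hypothetical.

```shell
# Sketch: show which secrets are attached to the pipeline ServiceAccount.
list_pipeline_sa_secrets() {
  ns="$1"
  # Secrets available to pods that run as the ServiceAccount (used for pushes).
  oc get serviceaccount pipeline -n "$ns" -o jsonpath='{.secrets[*].name}{"\n"}'
  # Secrets used only when pulling images.
  oc get serviceaccount pipeline -n "$ns" -o jsonpath='{.imagePullSecrets[*].name}{"\n"}'
}
```

If the registry secret does not appear in the first list, the push step has no credentials for that registry.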

      Fix

      • Attach the correct registry secret to the pipeline ServiceAccount in the namespace where the PipelineRun executes:

      oc secrets link pipeline registry-secret
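
If the secret does not exist yet, it has to be created before it can be linked. A minimal sketch, assuming username and password authentication; the function name and its arguments are placeholders, and registry-secret matches the name used in the command above.

```shell
# Sketch: create a registry credential secret and link it to the pipeline ServiceAccount.
link_registry_secret() {
  ns="$1"
  server="$2"     # registry host, for example quay.io
  user="$3"
  password="$4"   # prefer a token or robot account over a personal password
  oc create secret docker-registry registry-secret \
    --docker-server="$server" \
    --docker-username="$user" \
    --docker-password="$password" \
    -n "$ns"
  # Makes the secret available to pods that run as the pipeline ServiceAccount.
  oc secrets link pipeline registry-secret -n "$ns"
}
```

After linking, rerun the PipelineRun so that the new pods pick up the credentials.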
       
       
       
       

      Dependencies

      • OpenShift Cluster (4.x)
      • OpenShift Pipelines Operator installed
      • Tekton Pipelines and Triggers APIs
      • OpenShift Lightspeed enabled and configured
      • Access to container registry (internal or external)

      Acceptance Criteria (Mandatory)

      • User can install OpenShift Pipelines Operator successfully
      • User can run and observe failed PipelineRuns
      • OLS Chat correctly identifies:
        • Task-level failures
        • Resource constraint issues
        • Network/authentication errors
      • Suggested fixes are actionable and correct
      • Screenshots or UI references clearly demonstrate OLS usage

      Edge Cases

      • Multiple tasks failing simultaneously
      • Intermittent network failures
      • Misleading log messages

      INVEST Checklist

      • Dependencies identified
      • Blockers noted and expected delivery timelines set
      • Design is implementable
      • Acceptance criteria agreed upon
      • Story estimated


      Done Checklist

      • Code is completed, reviewed, documented and checked in
      • Unit and integration test automation has been delivered and runs cleanly in the continuous integration, staging, or canary environment
      • Continuous delivery pipeline(s) can proceed with the new code included
      • Customer facing documentation, API docs etc. are produced/updated, reviewed and published
      • Acceptance criteria are met

      Attachments

        1. pipelinerun-network.yaml (1.0 kB)
        2. pipelinerun-limit.yaml (0.8 kB)
        3. pipelinerun-go-golangci-lint.yaml (6 kB)
        4. pipelinerun-go-git-clone.yaml (13 kB)
        5. pipelinerun-go.yaml (0.6 kB)
        6. pipeline-go.yaml (3 kB)

      Assignee: Unassigned
      Reporter: abdeljawed khelil (jkhelil)