OpenShift Pipelines / SRVKP-8111

Enhanced Observability for Tekton

    • Type: Epic
    • Resolution: Unresolved
    • Tekton Pipelines
    • Status: To Do

      Note: This epic was created in response to the comment from rh-ee-csalinas on https://issues.redhat.com/browse/SRVKP-7692

      Epic Goal:

      As a Product Owner, I propose the addition of enhanced metrics and metadata collection to increase visibility and granularity into Pipeline Runs. This will improve the ability to support customers, debug issues faster, monitor feature adoption, and drive product decisions based on data.

      Why is this Important?

      • Improves product observability for internal and external stakeholders.
      • Enables Red Hat Support and SRE teams to better diagnose problems with detailed insights.
      • Helps Product Management track feature adoption (e.g., IPAC, TTL vs History settings).
      • Facilitates understanding of user workloads and pipeline composition for product improvements.
      • Aligns with customer feedback asking for deeper introspection into their CI/CD operations.

      Key Scenarios:

      • A pipeline fails and support needs to know what Task images were involved.
      • Product team wants to measure how many customers are using IPAC vs static YAML.
      • SRE wants to track if TTL or History-based pruning causes more cleanup failures.
      • Engineering wants to identify common upstream vs custom Task usage patterns.
      • A customer files a bug where the trigger source (manual vs webhook) is relevant to debugging.

      Proposed Metrics and Metadata to Capture:

      Category               | Metric/Metadata                            | Purpose
      -----------------------+--------------------------------------------+------------------------------------------------------
      Pruner Configuration   | TTL-based vs History-based usage           | Adoption insight and failure correlation
      Adoption Tracking      | IPAC-enabled (yes/no flag per PipelineRun) | Adoption of Pipelines as Code
      Pipeline Composition   | Task IDs used per PipelineRun              | Determine task source (upstream, Red Hat, custom)
      Step Execution Details | Step-level image references                | Identify base images in use, support diagnostics
      Run Metadata           | Triggering source                          | Understand usage patterns and potential automation gaps
      Run Results            | Execution time and status                  | Performance benchmarking and error/success insights
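
To make the categories in the table above concrete, here is a minimal sketch of one record that could carry them for a single PipelineRun. The class and field names (`PipelineRunObservation`, `pruner_mode`, `pac_enabled`, and so on) are assumptions made for illustration; the epic names only the categories, not a schema.

```python
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class StepInfo:
    """One step of a Task and the image it ran (hypothetical shape)."""
    name: str
    image: str


@dataclass
class PipelineRunObservation:
    """Illustrative container for the proposed per-PipelineRun metadata."""
    namespace: str
    pipeline_name: str
    pruner_mode: str         # assumed values: "ttl" or "history"
    pac_enabled: bool        # the yes/no IPAC flag from the table
    task_ids: List[str]      # Task IDs used by this run
    steps: List[StepInfo]    # step-level image references
    trigger_source: str      # assumed values: "manual" or "webhook"
    duration_seconds: float  # execution time
    status: str              # assumed values: "Succeeded" or "Failed"

    def images(self) -> List[str]:
        """Return the distinct step images in first-seen order."""
        seen: List[str] = []
        for step in self.steps:
            if step.image not in seen:
                seen.append(step.image)
        return seen


# Example values are invented for the sketch.
obs = PipelineRunObservation(
    namespace="ci",
    pipeline_name="build-and-test",
    pruner_mode="ttl",
    pac_enabled=True,
    task_ids=["git-clone", "buildah"],
    steps=[
        StepInfo("clone", "registry.example.com/git-init:1.0"),
        StepInfo("build", "registry.example.com/buildah:2.0"),
        StepInfo("push", "registry.example.com/buildah:2.0"),
    ],
    trigger_source="webhook",
    duration_seconds=84.5,
    status="Succeeded",
)
record = asdict(obs)  # plain dict, convenient for logging or export
```

A flat record like this would let the "which Task images were involved" scenario be answered with `obs.images()`. Whether such data belongs on the PipelineRun itself or in a separate store is left open by the epic.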

      Acceptance Criteria (Mandatory):

      • CI pipelines run successfully with new metrics collected and tests automated.
      • Metric data is exposed via Prometheus endpoints and validated.
      • Metric labels follow existing conventions (e.g., namespace, pipeline_name, task_id).
      • Documentation is updated with new metric definitions and intended usage.
      • Technical enablement provided to stakeholders (Support, SRE, Docs, etc.).
      • Adoption and error trends are reviewable via dashboards or queries.
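
As an illustration of the Prometheus and label criteria above, the following sketch renders a counter in the Prometheus text exposition format using only the standard library. The metric name `example_pipelinerun_task_total` is hypothetical and is not an existing Tekton metric. A real implementation would more likely use a Prometheus client library than hand-rendered text.

```python
from collections import Counter
from typing import Dict, Tuple

# Label names taken from the acceptance criteria; the metric name is invented.
LABELS = ("namespace", "pipeline_name", "task_id")
METRIC = "example_pipelinerun_task_total"


def escape(value: str) -> str:
    """Escape a label value as the text exposition format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render(counts: Dict[Tuple[str, str, str], int]) -> str:
    """Render the counter samples as Prometheus text exposition lines."""
    lines = [
        f"# HELP {METRIC} Task executions observed per pipeline (illustrative).",
        f"# TYPE {METRIC} counter",
    ]
    for key in sorted(counts):
        labels = ",".join(f'{name}="{escape(val)}"' for name, val in zip(LABELS, key))
        lines.append(f"{METRIC}{{{labels}}} {counts[key]}")
    return "\n".join(lines) + "\n"


# Invented sample data for the sketch.
counts: Counter = Counter()
counts[("ci", "build-and-test", "git-clone")] += 2
counts[("ci", "build-and-test", "buildah")] += 1

text = render(counts)
print(text)
```

Validation against the "existing conventions" criterion could start by checking that every exposed sample carries exactly the agreed label set, as `LABELS` does here.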

      Done Checklist:

      • Acceptance criteria are met.
      • Non-functional requirements validated (performance, security, privacy).
      • User journey automation is tested and validated.
      • Release enablement materials are created.
      • Support and SRE teams trained on interpreting and using the new data.

      Dependencies:

      • Coordination with the Pipelines team to expose metadata in the PipelineRun CRD
      • Prometheus integration and dashboarding (internal platform team)
      • Docs team for updating metric reference documentation
      • Support/SRE teams for enablement

      Previous Work (Optional):

      • Existing metrics in tektoncd_pruner_*
      • Previous enablement of Pipelines as Code metadata capture

      Open Questions:

      • Should step image tracking record the full image reference (with tag/digest) or an anonymized form?
      • Do we want to track the exact Task source (Hub, Red Hat, custom), or just the ID?
      • Is there customer sensitivity around publishing IPAC usage metrics?
      • Will adoption dashboards be internal-only or exposed to customers?
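
To ground the first open question, here is a rough sketch contrasting the two options for step images: keeping the parsed reference (repository, tag, digest) versus reporting only a short hash of the repository. The epic does not define "anonymized", so the salted, truncated SHA-256 used here is purely an assumption, as are the function names. The parser is deliberately simplified and is not a full implementation of image reference grammar.

```python
import hashlib
from typing import NamedTuple, Optional


class ImageRef(NamedTuple):
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def parse_image(ref: str) -> ImageRef:
    """Split an image reference into repository, tag and digest (simplified)."""
    digest = None
    if "@" in ref:
        ref, digest = ref.split("@", 1)
    tag = None
    # A colon only marks a tag if it comes after the last slash;
    # otherwise it belongs to a registry port such as host:5000.
    colon = ref.rfind(":")
    if colon > ref.rfind("/"):
        ref, tag = ref[:colon], ref[colon + 1:]
    return ImageRef(ref, tag, digest)


def anonymize(ref: str, salt: str = "") -> str:
    """Return a short stable token for the repository, hiding its name."""
    repo = parse_image(ref).repository
    return hashlib.sha256((salt + repo).encode("utf-8")).hexdigest()[:12]


full = parse_image("registry.example.com:5000/team/builder:1.2")
token = anonymize("registry.example.com:5000/team/builder:1.2")
```

With this approach, two tags of the same repository map to the same token, so usage can still be counted without exposing names. The cost is that support can no longer read the image directly from the metric.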

      Assignee: Unassigned
      Reporter: Anitha Natarajan (rh-ee-anataraj)