  1. OpenShift Pipelines
  2. SRVKP-7692

Observability for the new Event Based Tekton Pruner

    • Story
    • Resolution: Done
    • Major
    • Pipelines 1.20.0
    • None
    • Pruner
    • None
    • 3
    • False
    • False
    • Includes the following:
      - Fixes transient cleanup failures after ConfigMap reloads. The controller now skips missing, deleted, or already processed PipelineRun/TaskRun resources instead of aborting cleanup, which improves reliability.
      - Adds a new service, `tekton-pruner-controller`, that exposes metrics on port 9090.
      - Bumps the Knative package to the latest version, which renames the standard Knative controller metrics. Any PromQL queries based on those metrics might need an update.
      - Introduces metrics specific to tektoncd-pruner-controller across 4 categories.
      - Provides native OpenTelemetry integration with Knative. Both the Knative controller metrics and the Pruner functional metrics are exposed on the same port, 9090.
      - Uses a rich labeling strategy for detailed observability.
      - Works out of the box without additional configuration.
      - Maintains complete backward compatibility, with no breaking changes.
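Because the Knative bump renames the standard controller metrics, one way to find PromQL that needs updating is to compare the metric names in a dump taken before the upgrade with one taken after. The following is a minimal sketch, assuming both dumps use the Prometheus text exposition format. The function names are illustrative and not part of the pruner.

```python
import re

# Matches the metric name at the start of a sample line in the
# Prometheus text exposition format (comment lines start with '#').
_SAMPLE_NAME = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)")


def metric_names(exposition_text):
    """Return the set of metric names found in a text-format metrics dump."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = _SAMPLE_NAME.match(line)
        if match:
            names.add(match.group(1))
    return names


def diff_metric_names(old_text, new_text):
    """Return (removed, added) metric names, each as a sorted list."""
    old = metric_names(old_text)
    new = metric_names(new_text)
    return sorted(old - new), sorted(new - old)
```

The two inputs could be read from files such as the attached `old_knative_metrics.txt` and `new_knative_metrics.txt`, assuming those attachments are raw metrics dumps. Names in the "removed" list are candidates to search for in existing dashboards and alert rules.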
    • Feature
    • Proposed
    • Pipelines Sprint Pioneers 34, Pipelines Sprint Pioneers 35

      Story (Required)

      Add telemetry to expose the number of PipelineRuns and TaskRuns pruned by the Tekton pruner.

      Purpose:
      Provide visibility into pruning operations for monitoring, alerting, and debugging. This helps verify that the pruner is functioning correctly and cleaning up resources as expected.

      Value:

      • Enables tracking of pruner activity over time
      • Helps detect failures or misconfigurations in the pruning process
      • Supports capacity planning and system observability
      • Improves trust and transparency in automated cleanup

      Expected Metrics (example names):

      • tekton_pruner_pipelineruns_pruned_total
      • tekton_pruner_taskruns_pruned_total

      These metrics should be compatible with Prometheus and visible via standard dashboards and alerting tools.
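As a rough illustration of how such counters could be checked outside a dashboard, the sketch below sums each example counter across all label sets in a Prometheus text-format scrape. It assumes the example metric names above, which may differ from the names actually shipped. The `namespace` label in the usage comment and the `/metrics` path are also assumptions.

```python
import urllib.request

# Example names from the story; the names actually shipped may differ.
PRUNED_COUNTERS = (
    "tekton_pruner_pipelineruns_pruned_total",
    "tekton_pruner_taskruns_pruned_total",
)


def sum_counters(exposition_text, names=PRUNED_COUNTERS):
    """Sum each named counter across all label sets in a metrics dump."""
    totals = {name: 0.0 for name in names}
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # A sample line is "<name>{labels} <value> [timestamp]"
        # or "<name> <value> [timestamp]".
        if "{" in line:
            name = line[: line.index("{")]
            rest = line[line.rindex("}") + 1 :].split()
        else:
            parts = line.split()
            name, rest = parts[0], parts[1:]
        if name in totals and rest:
            totals[name] += float(rest[0])
    return totals


def fetch_metrics(url, timeout=5.0):
    """Fetch a metrics page as text; the caller supplies the URL."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Hypothetical usage, assuming the service is port-forwarded locally:
#   totals = sum_counters(fetch_metrics("http://localhost:9090/metrics"))
#   A sample line might look like:
#   tekton_pruner_pipelineruns_pruned_total{namespace="demo"} 4
```

Summing across label sets gives a single cluster-wide total per counter. A per-label breakdown would need the label string to be parsed as well, which this sketch leaves out.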

      Background (Required)

      https://redhat-internal.slack.com/archives/CG5GV6CJD/p1748446030638729?thread_ts=1748333999.376629&cid=CG5GV6CJD

      Out of scope

      <Defines what is not included in this story>

      Approach (Required)

      <Description of the general technical path on how to achieve the goal of the story. Include details like json schema, class definitions>

      Dependencies

      <Describes what this story depends on. Dependent Stories and EPICs should be linked to the story.>

      Acceptance Criteria (Mandatory)

      <Describe edge cases to consider when implementing the story and defining tests>

      <Provides a required and minimum list of acceptance tests for this story. More is expected as the engineer implements this story>

      INVEST Checklist

      Dependencies identified

      Blockers noted and expected delivery timelines set

      Design is implementable

      Acceptance criteria agreed upon

      Story estimated

      Done Checklist

      • Code is completed, reviewed, documented and checked in
      • Unit and integration test automation has been delivered and is running cleanly in the continuous integration/staging/canary environment
      • Continuous Delivery pipeline(s) are able to proceed with the new code included
      • Customer facing documentation, API docs etc. are produced/updated, reviewed and published
      • Acceptance criteria are met

        1. old_knative_metrics.txt
          64 kB
          Anitha Natarajan
        2. new_knative_metrics.txt
          69 kB
          Anitha Natarajan

              rh-ee-anataraj Anitha Natarajan
              rh-ee-anataraj Anitha Natarajan
              Sai Raju Manthina Sai Raju Manthina