Type: Story
Resolution: Done
Priority: Normal
Fix Version/s: Pipelines 5.0.5
Affects Version/s: None
Component/s: Tekton Pipelines
Labels:
- customer
- konflux

Story Points:
8
Epic Link:
SRE support: onboard AppSRE team
Blocked:
False
Blocked Reason:

None
Ready:
False
Release Note Type:
Release Note Not Required
Intelligence Requested:
Market:

Sprint:
Pipelines Sprint Pioneers 7, Pipelines Sprint Pioneers 8, Pipelines Sprint Pioneers 9, Pipelines Sprint Pioneers 10

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Story (Required)

As a <PERSONA> trying to <ACTION> I want <THIS OUTCOME>

As a cluster administrator or SRE trying to maintain a cluster with OpenShift Pipelines, I want to be able to easily visualize where stability metrics are past acceptable thresholds.

<Describes high level purpose and goal for this story. Answers the questions: Who is impacted, what is it and why do we need it? How does it improve the customer’s experience?>

Background (Required)

Per the conventions for Konflux, we currently:

- define panels that run in the cluster being monitored, displaying the Prometheus queries that serve as the basis for alerts. These are currently hosted in https://github.com/openshift-pipelines/pipeline-service and are what we would productize into OpenShift Pipelines.
- also define panels in https://github.com/redhat-appstudio/o11y per the process described at https://github.com/redhat-appstudio/o11y/?tab=readme-ov-file#grafana-dashboards . These dashboards get mapped to the Grafana system App SRE uses to monitor all the clusters under its purview.
- define the alerts, also in https://github.com/redhat-appstudio/o11y , per the process described under https://github.com/redhat-appstudio/o11y/?tab=readme-ov-file#alerting-rules . The alerts are based on the metrics delivered in https://issues.redhat.com/browse/SRVKP-4522 , which we have monitored in prod long enough to determine acceptable alert thresholds per that process. See https://github.com/redhat-appstudio/o11y/blob/main/rhobs/alerting/data_plane/prometheus.pipeline_alerts.yaml for the existing alerts.

Presumably these alerts will be a starting point for what we would deliver in OpenShift Pipelines, though what we deliver would most likely be optional and configurable with respect to precise thresholds.
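To illustrate what "optional and configurable" thresholds could mean in practice, here is a minimal Python sketch. It is not taken from the pipeline-service or o11y repositories: the `Threshold` type, the `breaches` helper, and both metric names are hypothetical, chosen only to show each alert carrying its own limit and an on/off switch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    """One configurable alert threshold (hypothetical structure)."""
    metric: str           # illustrative metric name, not a real Tekton metric
    limit: float          # a sample above this value counts as a breach
    enabled: bool = True  # alerts are optional, so each can be switched off


def breaches(samples, thresholds):
    """Return the names of enabled metrics whose sample exceeds its limit."""
    past_limit = []
    for t in thresholds:
        value = samples.get(t.metric)
        if t.enabled and value is not None and value > t.limit:
            past_limit.append(t.metric)
    return past_limit


# Hypothetical configuration: one alert enabled, one disabled by the admin.
thresholds = [
    Threshold("controller_restarts_per_hour", limit=3),
    Threshold("k8s_client_throttle_seconds", limit=10, enabled=False),
]
samples = {
    "controller_restarts_per_hour": 5,
    "k8s_client_throttle_seconds": 42,
}

print(breaches(samples, thresholds))  # ['controller_restarts_per_hour']
```

In this sketch the disabled throttle alert is skipped even though its sample is far above the limit, which is the behaviour an administrator would expect from an opt-out switch.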

<Describes the context or background related to this story>

Out of scope

<Defines what is not included in this story>

Approach (Required)

<Description of the general technical path on how to achieve the goal of the story. Include details like json schema, class definitions>

Dependencies

<Describes what this story depends on. Dependent Stories and EPICs should be linked to the story.>

Acceptance Criteria (Mandatory)

<Describe edge cases to consider when implementing the story and defining tests>

<Provides a required and minimum list of acceptance tests for this story. More is expected as the engineer implements this story>

Done Checklist

Code is completed, reviewed, documented and checked in
Unit and integration test automation have been delivered and running cleanly in continuous integration/staging/canary environment
Continuous Delivery pipeline(s) is able to proceed with new code included
Customer facing documentation, API docs etc. are produced/updated, reviewed and published
Acceptance criteria are met

blocks

SRVKP-4524 Update pipeline service SOPs in gitlab/app-interface, get tiger team/infra sign off, for core controller deadlocked metrics, rapid restarts

Closed

is blocked by

SRVKP-4522 build metric to determine if core tekton controller is not creating pods for pipelines, determine if it is deadlocked

Closed

links to

redhat-appstudio/infra-deployments#4105: SRVKP-4523: make tekton controller restart/deadlock alert metrics available to app-sre grafana

redhat-appstudio/infra-deployments#4159: SRVKP-4523: make tekton controller k8s throttle metrics available to app-sre grafana

redhat-appstudio/infra-deployments#4242: SRVKP-4523: need both old and new tekton k8s throttle metrics available to app-sre

mentioned on

Merge request - SRVKP-4523,SRVKP-4526,SRVKP-4527,SRVKP-4528,SRVKP-6103,SRVKP-6104: remove pipeline-service slo-sli-document

Merge request - SRVKP-4523,SRVKP-4526,SRVKP-4527,SRVKP-4528: bump konflux grafana for new pipeline panels


Assignee: Gabe Montero

Reporter: Gabe Montero

Votes: 0

Watchers: 3

Created: 2024/03/26 5:32 PM

Updated: 2024/09/24 12:53 PM

Resolved: 2024/08/06 6:48 PM
