Type: Story
Resolution: Done
Priority: Normal
Fix Version/s: None
Affects Version/s: None
Component/s: Pipelines as Code
Labels:
- customer
- konflux

Story Points:
3
Epic Link:
SRE support: onboard AppSRE team
Blocked:
False
Blocked Reason:
None
Ready:
False
Feature Link:
KONFLUX-123 - Konflux Availability SLO phase 1
Git Pull Request:
https://github.com/redhat-appstudio/o11y/pull/349
Intelligence Requested:
Market:

Sprint:
Pipelines Sprint Crookshank 11

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Story (Required)

As a <PERSONA> trying to <ACTION> I want <THIS OUTCOME>

As a cluster administrator or SRE trying to maintain a cluster with OpenShift Pipelines, I want to be able to easily visualize where stability metrics are past acceptable thresholds.

<Describes high level purpose and goal for this story. Answers the questions: Who is impacted, what is it and why do we need it? How does it improve the customer’s experience?>

Background (Required)

Per the conventions for Konflux, we currently

- define panels that run in the cluster being monitored and display the Prometheus queries that serve as the basis for alerts; those are currently hosted in https://github.com/openshift-pipelines/pipeline-service ; these would be what we productize into OpenShift Pipelines
- also define panels in https://github.com/redhat-appstudio/o11y per the process described at https://github.com/redhat-appstudio/o11y/?tab=readme-ov-file#grafana-dashboards ; these dashboards get mapped to the Grafana system App SRE uses to monitor all the clusters under its purview
- define the alerts, also in https://github.com/redhat-appstudio/o11y per the process described under https://github.com/redhat-appstudio/o11y/?tab=readme-ov-file#alerting-rules ; the alerts are based on the metrics delivered in https://issues.redhat.com/browse/SRVKP-4522 which we have monitored in prod sufficiently to determine acceptable alert thresholds per that process; see https://github.com/redhat-appstudio/o11y/blob/main/rhobs/alerting/data_plane/prometheus.pipeline_alerts.yaml for the existing alerts

Presumably these alerts will be a starting point for what we would deliver in OpenShift Pipelines, though what we deliver would most likely be optional and configurable with respect to precise thresholds.
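As a rough illustration of what "optional and configurable" alerts could look like, the sketch below renders a Prometheus-style alerting rule from an operator-supplied threshold. It is only a sketch under assumptions: the class, function, alert name, and metric name are all hypothetical and are not taken from the o11y or pipeline-service repositories. Only the rule fields (alert, expr, for, labels) follow the standard Prometheus alerting rule layout.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertThreshold:
    """Operator-tunable settings for one alert (hypothetical structure)."""
    alert_name: str
    metric_expr: str           # PromQL expression yielding the value to compare
    threshold: float           # the alert fires when the value exceeds this
    for_duration: str = "10m"  # how long the condition must hold before firing
    severity: str = "warning"
    enabled: bool = True       # alerts are optional, so they can be switched off


def build_rule(cfg: AlertThreshold) -> Optional[dict]:
    """Return a Prometheus-style alerting rule as a dict, or None if disabled."""
    if not cfg.enabled:
        return None
    return {
        "alert": cfg.alert_name,
        "expr": f"({cfg.metric_expr}) > {cfg.threshold}",
        "for": cfg.for_duration,
        "labels": {"severity": cfg.severity},
    }


# Hypothetical alert and metric names, used for illustration only.
example = build_rule(
    AlertThreshold(
        alert_name="HypotheticalPipelineOverheadHigh",
        metric_expr="hypothetical_pipeline_overhead_ratio",
        threshold=0.05,
    )
)
```

The resulting dict could be serialized to YAML alongside the existing rules. Keeping the threshold and the enabled flag in one small config object is one way to let an administrator tune or disable each alert without editing the query itself.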

<Describes the context or background related to this story>

Out of scope

<Defines what is not included in this story>

Approach (Required)

<Description of the general technical path on how to achieve the goal of the story. Include details like json schema, class definitions>

Dependencies

<Describes what this story depends on. Dependent Stories and EPICs should be linked to the story.>

Acceptance Criteria (Mandatory)

<Describe edge cases to consider when implementing the story and defining tests>

<Provides a required and minimum list of acceptance tests for this story. More is expected as the engineer implements this story>

Done Checklist

Code is completed, reviewed, documented and checked in
Unit and integration test automation have been delivered and running cleanly in continuous integration/staging/canary environment
Continuous Delivery pipeline(s) is able to proceed with new code included
Customer facing documentation, API docs etc. are produced/updated, reviewed and published
Acceptance criteria are met

clones

SRVKP-4527 build alert, panels for tekton chains

Closed

mentioned on

Merge request - SRVKP-4523,SRVKP-4526,SRVKP-4527,SRVKP-4528,SRVKP-6103,SRVKP-6104: remove pipeline-service slo-sli-document

Merge request - SRVKP-4523,SRVKP-4526,SRVKP-4527,SRVKP-4528: bump konflux grafana for new pipeline panels

Assignee: Divyanshu Agrawal

Reporter: Gabe Montero

Votes: 0

Watchers: 4

Created: 2024/05/11 4:12 PM

Updated: 2024/09/24 12:53 PM

Resolved: 2024/08/29 6:26 PM
