-
Task
-
Resolution: Done
-
KONFLUX-123 - Konflux Availability SLO phase 1
-
Release Note Not Required
-
Pipelines Sprint Crookshank 15
Current state of alerts/panels in Konflux SRE
1) Based on the PR https://github.com/redhat-appstudio/o11y/pull/364, it looks like the Konflux Monitoring team has promoted all the Pipeline alerts and panels defined in the o11y repository to Konflux SRE monitoring prod, per the process defined at https://github.com/redhat-appstudio/o11y?tab=readme-ov-file#updating-alert-and-recording-rules
2) Konflux Monitoring mftboy has signed off on the one remaining MR, which adds the associated SOP for Results High Error rate: https://gitlab.cee.redhat.com/konflux/docs/sop/-/merge_requests/193
3) However, we need signoff from someone on Konflux SRE before it can be merged. See for the details. I will ping them in our #wg-sre-konflux Slack thread https://redhat-internal.slack.com/archives/C04TL3Q3C4B/p1726251504848259 one more time. That will be my last attempt at getting them to review it or ask for changes.
Next steps:
1) With the merging of , any references to `https://gitlab.cee.redhat.com/konflux/docs/sop/-/blob/main/pipeline-service/slos/tekton-pipeline-related-deadlocks.md` in https://github.com/redhat-appstudio/o11y/blob/main/rhobs/alerting/data_plane/prometheus.pipeline_alerts.yaml need to be changed to `https://gitlab.cee.redhat.com/konflux/docs/sop/-/blob/main/pipeline-service/sre/tekton-pipeline-related-deadlocks.md`. diagrawa can create the https://github.com/redhat-appstudio/o11y PR and get mftboy to merge it.
2) If https://gitlab.cee.redhat.com/konflux/docs/sop/-/merge_requests/193 gets reviewed between now and Oct 15
- I will attempt to make the requisite changes and get it merged.
If https://gitlab.cee.redhat.com/konflux/docs/sop/-/merge_requests/193 does not get reviewed by Oct 15
- diagrawa and the OSP team can decide whether they want diagrawa to create a copy of my MR and take over trying to get it merged.
- If yes, manage that new MR and the status of the epic accordingly
- If no, then we can close my https://gitlab.cee.redhat.com/konflux/docs/sop/-/merge_requests/193, and diagrawa can get a PR up at https://github.com/redhat-appstudio/o11y that removes the alerts and panels related to the Tekton Results API Success Rate, as there will be no SOP available in case the alert fires. Tag mftboy for review / merge. Also, diagrawa should note the queries from that alert in https://issues.redhat.com/browse/SRVKP-6473, as they are a potential implementation detail for that epic.
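For the SOP link change described in next step 1, a minimal sketch of how the references could be rewritten is shown here. The two URLs are the ones quoted in that step; the helper names `rewrite_sop_links` and `rewrite_file` are hypothetical, and the sketch assumes a plain text substitution in a local checkout of the o11y repository is sufficient.

```python
from pathlib import Path

# Old and new SOP locations, copied from next step 1.
OLD_SOP_URL = (
    "https://gitlab.cee.redhat.com/konflux/docs/sop/-/blob/main/"
    "pipeline-service/slos/tekton-pipeline-related-deadlocks.md"
)
NEW_SOP_URL = (
    "https://gitlab.cee.redhat.com/konflux/docs/sop/-/blob/main/"
    "pipeline-service/sre/tekton-pipeline-related-deadlocks.md"
)


def rewrite_sop_links(text: str) -> tuple[str, int]:
    """Return the text with the old SOP URL replaced, plus the number of hits.

    Hypothetical helper: a plain string substitution, no YAML parsing.
    """
    count = text.count(OLD_SOP_URL)
    return text.replace(OLD_SOP_URL, NEW_SOP_URL), count


def rewrite_file(path: Path) -> int:
    """Rewrite one file in place; the file is only touched if a link changed."""
    original = path.read_text(encoding="utf-8")
    updated, count = rewrite_sop_links(original)
    if count:
        path.write_text(updated, encoding="utf-8")
    return count
```

From the root of a local o11y checkout, this could be called as `rewrite_file(Path("rhobs/alerting/data_plane/prometheus.pipeline_alerts.yaml"))`. A return value of 0 would suggest the old link is no longer referenced in that file.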