Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: RHODS_1.23.0_GA
Component/s: Monitoring
Labels:
- eng
- groomed
- mt-sre

Blocked:
False
Blocked Reason:
None
Ready:
False
Acceptance Criteria:
None
Affects Testing:
Testable
Automated:
No
CDW blocker:
CDW devel_ack:
CDW docs_ack:
CDW pm_ack:
CDW qa_ack:
CDW release:
Regression:
No
Target Release:
FUTURE_GA
Test Blocker:
No
Test Coverage:
Pending
Watchlist Impact:
None
Intelligence Requested:
Market:
PX Impact Score:

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Description of problem:

PagerDuty alerts received by the SRE team do not include the cluster_id, or any other detail identifying the cluster for which the alert was triggered.
The following Prometheus configuration needs to be updated: https://github.com/red-hat-data-services/odh-deployer/blob/main/monitoring/prometheus/prometheus-configs.yaml

The following is an alert currently received by SRE:

Labels:
 - alertname = RHODS Dashboard Probe Success Burn Rate
 - name = rhods-dashboard
 - severity = critical
Annotations:
 - message = High error budget burn for  (current value: 0.09999999999999998).
 - summary = RHODS Dashboard Probe Success Burn Rate
 - triage = https://gitlab.cee.redhat.com/service/managed-tenants-sops/-/blob/main/RHODS/Jupyter/rhods-dashboard-probe-success-burn-rate.md
Source: http://prometheus-6f855b778d-jk4sk:9090/graph?..............................................................................

Note that the SOP link in the alert above is broken. The SOP links in all alerts must be checked to make sure none of them are broken.
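The SOP link audit can be scripted. Below is a minimal Python sketch, assuming the rules file stores each SOP link in a `triage` annotation written as `triage: <url>` or `triage = <url>` (as the alert above suggests). The helper names `extract_triage_urls`, `http_status`, and `find_broken` are hypothetical and not part of odh-deployer. Only a 2xx response counts as healthy. Links on gitlab.cee.redhat.com may need VPN access or authentication, and because redirects are followed, a redirect to a login page would look healthy, so results gathered from outside the internal network should be read with care.

```python
import re
import sys
import urllib.error
import urllib.request
from typing import Callable, Iterable

# Assumption: SOP links appear as "triage: <url>" or "triage = <url>", optionally quoted.
TRIAGE_RE = re.compile(r"triage\s*[:=]\s*[\"']?(https?://[^\s\"']+)")


def extract_triage_urls(text: str) -> list[str]:
    """Return unique triage (SOP) URLs in the order they first appear."""
    seen: dict[str, None] = {}
    for match in TRIAGE_RE.finditer(text):
        seen.setdefault(match.group(1), None)
    return list(seen)


def http_status(url: str, timeout: float = 10.0) -> int:
    """Return the final status of a HEAD request (redirects followed), or 0 on network errors."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a moved or deleted SOP page
    except OSError:
        return 0  # DNS failure, refused connection, timeout, TLS error, ...


def find_broken(urls: Iterable[str], status_of: Callable[[str], int] = http_status) -> list[str]:
    """Return the URLs whose status is not 2xx."""
    return [url for url in urls if not 200 <= status_of(url) < 300]


if __name__ == "__main__":
    # Usage: python check_sop_links.py prometheus-configs.yaml [more files...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as handle:
            for url in find_broken(extract_triage_urls(handle.read())):
                print(f"{path}: broken SOP link {url}")
```

Passing `status_of` as a parameter keeps the URL-filtering logic testable without network access.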

Acceptance Criteria:

Include cluster details in the alerts (especially cluster_id) so that SRE members can uniquely identify the cluster.
No alert has a broken SOP link.
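The acceptance criteria can be expressed as an automated check. Below is a minimal Python sketch, assuming each alert is inspected in the shape of an entry in Alertmanager's webhook payload (a `labels` mapping and an `annotations` mapping). `acceptance_problems` and `REQUIRED_LABELS` are hypothetical names; `cluster_id` is the only required label taken from this ticket. The check only verifies that a triage link is present and well formed, not that it resolves.

```python
from typing import Any, Mapping

# Hypothetical: the labels every alert must carry, per the acceptance criteria.
REQUIRED_LABELS = ("cluster_id",)


def acceptance_problems(alert: Mapping[str, Any]) -> list[str]:
    """Return the acceptance-criteria failures for one alert; an empty list means it passes.

    The alert is assumed to look like an entry in Alertmanager's webhook payload,
    with a "labels" mapping and an "annotations" mapping.
    """
    labels = alert.get("labels") or {}
    annotations = alert.get("annotations") or {}
    problems = [f"missing label {name!r}" for name in REQUIRED_LABELS if not labels.get(name)]
    # Only checks that a link is present; whether it resolves needs an HTTP request.
    if not str(annotations.get("triage", "")).startswith(("http://", "https://")):
        problems.append("missing or malformed triage (SOP) link")
    return problems


if __name__ == "__main__":
    # The alert quoted in this ticket, reduced to its labels and triage annotation.
    current_alert = {
        "labels": {
            "alertname": "RHODS Dashboard Probe Success Burn Rate",
            "name": "rhods-dashboard",
            "severity": "critical",
        },
        "annotations": {
            "triage": "https://gitlab.cee.redhat.com/service/managed-tenants-sops/-/blob/main/RHODS/Jupyter/rhods-dashboard-probe-success-burn-rate.md",
        },
    }
    print(acceptance_problems(current_alert))  # ["missing label 'cluster_id'"]
```

Run against the alert quoted in this ticket, it reports only the missing `cluster_id`: the broken SOP link is syntactically well formed, so catching it still requires an HTTP check.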

Mentioned on:

Merge request - [RHODS-7489] fix: add symlink to prevent 404's when redirected from PD alerts

Assignee: Max Gautier (Inactive)

Reporter: Chamal Abeywardhana

QA Contact: Jorge Garcia Oncins

Votes: 0

Watchers: 10

Created: 2023/03/09 2:16 AM

Updated: 2025/06/11 11:42 PM
