- Bug
- Resolution: Not a Bug
- Critical
- RHODS_1.1_GA
Description of problem:
The Prometheus alerts RHODS Route Error Burn Rate and RHODS Probe Success Burn Rate can take more than 15 minutes to activate when rhods-dashboard is down.
I believe this is too long, considering that the availability target is 98%.
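For context, burn-rate alerts of this kind are normally defined as Prometheus alerting rules that average the failure ratio over a fairly long window before they can fire. The rule below is only a sketch of that shape, not the actual RHODS rule; the metric name, window, threshold and for: duration are assumptions. With a 1h averaging window and a 2% error budget, a complete outage needs roughly 15-20 minutes before the expression crosses the threshold, which would explain a delay like the one observed here:

groups:
- name: rhods-slo-burn-rate-sketch
  rules:
  - alert: RHODSProbeSuccessBurnRate
    # Failure ratio averaged over the last hour, compared against a
    # burn-rate multiple of the 2% error budget (98% availability target).
    # During a total outage the 1h average only exceeds 14.4 * 0.02 = 0.288
    # after roughly 17 minutes of downtime, plus the 2m "for" hold.
    expr: (1 - avg_over_time(probe_success{job="rhods-dashboard"}[1h])) > (14.4 * 0.02)
    for: 2m
    labels:
      severity: critical
    annotations:
      message: rhods-dashboard probe failure burn rate is too high.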
Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce:
- Go to RHODS-Dashboard and verify it's working properly
- Log in to Prometheus
- Prometheus > Alerts
- Verify that alerts RHODS Route Error Burn Rate and RHODS Probe Success Burn Rate are not active
- To provoke a disruption in the service:
- Log in to the OpenShift Console
- Workloads > Deployments:
- Scale down rhods-operator to 0 pods
- Scale down traefik-proxy to 0 pods (a CLI alternative is sketched after these steps)
- Refresh RHODS-Dashboard once and verify it's no longer available
- Refresh Prometheus > Alerts every minute to see when the alerts fire
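The scale-down can also be done from the CLI instead of the console. This is a minimal sketch; the namespaces below are assumptions and may need adjusting for the cluster under test:

oc scale deployment/rhods-operator --replicas=0 -n redhat-ods-operator
oc scale deployment/traefik-proxy --replicas=0 -n redhat-ods-applications
# restore the original replica counts once the test is done, e.g.:
oc scale deployment/rhods-operator --replicas=1 -n redhat-ods-operator
oc scale deployment/traefik-proxy --replicas=1 -n redhat-ods-applications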
Actual results:
- 13:06 Scaled rhods-operator and rhods-dashboard down to 0 pods
- 13:17 RHODS Probe Success Burn Rate (for 3h) alert PENDING
- 13:23 RHODS Probe Success Burn Rate (for 2m) alert PENDING
- 13:26 RHODS Probe Success Burn Rate (for 2m) alert FIRING
- 13:31 RHODS Route Error Burn Rate (2m) alert FIRING
See this screenshot taken at 14:28 (more than one hour after rhods-dashboard went down)
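Instead of refreshing the Alerts page by hand, the state transitions can also be watched by querying Prometheus's built-in ALERTS series; the label matcher below is an assumption based on the alert names reported above:

ALERTS{alertname=~"RHODS.*Burn Rate"}

The alertstate label on the returned series shows whether each alert is pending or firing.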
Expected results:
The alerts activate much sooner (well under 15 minutes) after rhods-dashboard becomes unavailable.
Reproducibility (Always/Intermittent/Only Once):
Build Details:
Workaround:
Additional info:
- blocks RHODS-268 Configure Prometheus alerts for RHODS SLOs (Closed)