-
Task
-
Resolution: Done
MK - Sprint 218
WHAT
The dataplane SLO dashboard currently uses the arithmetic mean to aggregate availability over all Kafka instances.
Update the dashboard to fix the skewed behaviour this causes. Related issue: https://issues.redhat.com/browse/MGDSTRM-7999
WHY
The arithmetic mean is not a meaningful aggregate here. Using it means that the Error Budget Policy (EBP) won't kick in even though one or more instances have run out of error budget, because the healthy instances mask the unhealthy ones. That can have knock-on effects for other services that depend on RHOSAK. For example, if one of those Kafka instances were being used for SmartEvents, the EBP not kicking in would mean RHOSAK could carry on consuming error budget, unaware that SmartEvents' SLO was now being broken.
HOW
To fix this, we need to change the query to use the minimum of the error budget over all the instances, i.e. we report the worst error budget across all instances over the last 28 days.
This will ensure that the Error Budget Policy kicks in once any instance has run out of Error Budget.
Also move the individual Kafka panel so that it is the first of the two graphs shown on the dashboard.
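To illustrate why the aggregation matters, here is a minimal Python sketch. It is not the dashboard query itself. The instance names, the budget values and the budget_exhausted helper are all hypothetical, and it assumes the remaining error budget is expressed as a fraction where 0 or below means exhausted.

```python
from statistics import mean

# Hypothetical remaining error budget per Kafka instance over the 28-day
# window, as a fraction of the total budget (1.0 = untouched, <= 0 = exhausted).
remaining_budget = {
    "kafka-a": 0.90,
    "kafka-b": 0.80,
    "kafka-c": -0.10,  # this instance has run out of error budget
}


def budget_exhausted(aggregate: float) -> bool:
    """Assumed EBP trigger: the aggregated budget is at or below zero."""
    return aggregate <= 0.0


# Current behaviour: the healthy instances hide the exhausted one.
mean_budget = mean(remaining_budget.values())

# Proposed behaviour: report the worst instance.
worst_budget = min(remaining_budget.values())

print("mean aggregate triggers EBP:", budget_exhausted(mean_budget))
print("min aggregate triggers EBP: ", budget_exhausted(worst_budget))
```

With the mean, the aggregate stays positive and the policy never triggers. With the minimum, a single exhausted instance is enough to trigger it.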
DONE
Include the following where applicable:
- <bulleted list of functional acceptance criteria that need to be completed>
- <call out anything on the documentation side that's needed as a result of this task being completed>
- <any metrics, monitoring dashboards and alerts that need to be created or be updated>
- <SOP creation or updates>
Guidelines
The following steps should be adhered to:
- Required tests should be put in place - unit, integration, manual test cases (if necessary)
- CI and all relevant tests passing
- Changes have been verified by one additional reviewer against:
  - each required environment
  - each supported upgrade path
- If the changes could have an impact on the clients (either UI or CLI), a JIRA should be created for making the required changes on the client side and acknowledged by one of the client side team members.
- PR has been merged