Epic
Resolution: Done
Normal
CI for Prometheus rules
NEW
To Do
MON-3159 Technical Debt
NEW
0% To Do, 0% In Progress, 100% Done
Note: this is just a placeholder for now.
There have already been cases where operators shipped Prometheus rules that aren't valid:
- The query expression uses the wrong function for the metric type (for instance, increase() applied to a gauge, as in https://bugzilla.redhat.com/show_bug.cgi?id=2097073).
- The query expression references metric names that don't exist (for example, https://bugzilla.redhat.com/show_bug.cgi?id=2015418).
- The alert definition doesn't follow the alerting consistency guidelines (https://github.com/openshift/enhancements/blob/master/enhancements/monitoring/alerting-consistency.md).
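The first two failure modes can be illustrated with a small lint pass over rule expressions. This is only a sketch under stated assumptions. The metric-type table METRIC_TYPES, the function lint_expr and its messages are all hypothetical. In practice the types might come from a running Prometheus (for example its /api/v1/metadata endpoint). The matching is regex-based and naive: it only sees metric names written as `name{...}` or `name[...]`, so a bare `up == 0` is not checked. A real check would more likely use a proper PromQL parser, such as the Go package github.com/prometheus/prometheus/promql/parser.

```python
import re

# Hypothetical metric metadata (assumption); a real CI job would load this
# from Prometheus metadata or exporter documentation.
METRIC_TYPES = {
    "node_cpu_seconds_total": "counter",
    "node_memory_MemAvailable_bytes": "gauge",
    "up": "gauge",
}

# Functions that are only meaningful on counters.
COUNTER_ONLY = ("rate", "irate", "increase")

# Naive: `func(` immediately followed by a metric name.
COUNTER_FN_RE = re.compile(
    r"\b(" + "|".join(COUNTER_ONLY) + r")\s*\(\s*([a-zA-Z_:][a-zA-Z0-9_:]*)"
)

# Naive selector matcher: an identifier followed by `{` or `[`.
SELECTOR_RE = re.compile(r"\b([a-zA-Z_:][a-zA-Z0-9_:]*)\s*[\{\[]")


def lint_expr(expr: str) -> list[str]:
    """Return human-readable problems found in a PromQL expression."""
    problems = []
    # Wrong function for the metric type, e.g. increase() on a gauge.
    for fn, metric in COUNTER_FN_RE.findall(expr):
        mtype = METRIC_TYPES.get(metric)
        if mtype is not None and mtype != "counter":
            problems.append(f"{fn}() applied to {mtype} {metric!r}")
    # Metric names that are not in the known metadata.
    for metric in SELECTOR_RE.findall(expr):
        if metric not in METRIC_TYPES:
            problems.append(f"unknown metric {metric!r}")
    return problems
```

For example, `lint_expr("sum(rate(node_cpu_secnds_total[5m]))")` reports the misspelled metric as unknown. A known counter passed to rate() produces no findings.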
While we can't catch everything, it should be feasible to catch the most common mistakes in CI.
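A CI gate for the alerting-consistency part could look like the sketch below. It assumes the rule file has already been parsed into a dict with the usual Prometheus layout (groups, then rules, each rule having alert or record, labels and annotations). It also assumes, from our reading of the guidelines, that alerts need a severity label of critical, warning or info plus summary and description annotations. check_alert_consistency and main are hypothetical names. main returns a non-zero exit code when problems are found, so a CI job can fail on it.

```python
import sys

# Assumed requirements, based on our reading of the alerting consistency guidelines.
ALLOWED_SEVERITIES = {"critical", "warning", "info"}
REQUIRED_ANNOTATIONS = ("summary", "description")


def check_alert_consistency(rule_file: dict) -> list[str]:
    """Return problems found in the alerting rules of a parsed rule file."""
    problems = []
    for group in rule_file.get("groups", []):
        for rule in group.get("rules", []):
            if "alert" not in rule:
                continue  # recording rules are not subject to these checks
            name = rule["alert"]
            severity = rule.get("labels", {}).get("severity")
            if severity not in ALLOWED_SEVERITIES:
                problems.append(f"{name}: invalid or missing severity label ({severity!r})")
            annotations = rule.get("annotations", {})
            for key in REQUIRED_ANNOTATIONS:
                if not annotations.get(key):
                    problems.append(f"{name}: missing {key!r} annotation")
    return problems


def main(rule_file: dict) -> int:
    """Print problems to stderr and return a CI-friendly exit code."""
    problems = check_alert_consistency(rule_file)
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0
```

In a CI job, a small wrapper would load each rule file and call `sys.exit(main(rule_file))`. Loading YAML would need a third-party parser such as PyYAML, which is an assumption here.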