- Bug
- Resolution: Done
- Normal
- None
- 4.12
- None
- 2
- CMP Sprint 61
- 1
- False
Description of problem:
The compliance_operator_compliance_scan_error_total metric has an "error" label whose value is the error message, so every distinct message produces a new label value. This goes against good instrumentation practices for two reasons. 1) The cardinality of the metric (i.e. how many label key/value combinations can exist at the same time for compliance_operator_compliance_scan_error_total) is impossible to predict, and unbounded metrics like this can put a lot of memory load on Prometheus. 2) The label value can be very long, which causes issues when users want to push the metric to other systems (see https://issues.redhat.com/browse/OBSDA-205). In practice, alerting on the metric is also likely to be complicated.
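The cardinality concern can be illustrated with a small simulation. This is only a sketch in plain Python, not the operator's code (the operator itself is written in Go): the `record` helper, the `name` label, the scan name, and the error messages are all made up for illustration. Each distinct label set is treated as one time series, which is how Prometheus identifies series.

```python
from collections import Counter


def record(series: Counter, **labels: str) -> None:
    """Increment the time series identified by this exact label set."""
    series[tuple(sorted(labels.items()))] += 1


# Hypothetical error messages; each embeds a detail that varies per failure.
errors = [f"pod scan-{i} failed: timeout after {30 + i}s" for i in range(1000)]

with_error_label = Counter()     # series keyed by name + error message
without_error_label = Counter()  # series keyed by name only

for msg in errors:
    record(with_error_label, name="example-scan", error=msg)
    record(without_error_label, name="example-scan")

print(len(with_error_label))     # one series per distinct message
print(len(without_error_label))  # a single series, incremented each time
```

With the message as a label value, the number of series grows with the number of distinct messages; without it, the same failures land in a single counter.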
Version-Release number of selected component (if applicable):
4.12 (but probably applies to previous versions too)
How reproducible:
Always
Steps to Reproduce:
1. Trigger a scan that will fail.
2. Go to OCP console > metrics page and query "compliance_operator_compliance_scan_error_total".
Actual results:
The compliance_operator_compliance_scan_error_total metric is exposed with an "error" label containing the error message.
Expected results:
No "error" label.
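One possible shape for this, sketched in plain Python rather than the operator's actual Go code: count failures per scan only, and send the free-form message to the logs, where its length and variability do no harm. The `report_scan_error` function, the `name` key, and the logger name are hypothetical, and a `Counter` stands in for a real metrics client.

```python
import logging
from collections import Counter

logger = logging.getLogger("compliance-scan")  # hypothetical logger name

# Stand-in for a counter metric whose only label is the scan name.
scan_errors = Counter()


def report_scan_error(scan_name: str, err: Exception) -> None:
    """Count the failure under a bounded label and log the full message."""
    scan_errors[scan_name] += 1
    logger.error("scan %s failed: %s", scan_name, err)


report_scan_error("example-scan", RuntimeError("timeout after 30s"))
report_scan_error("example-scan", RuntimeError("node unreachable"))
```

Two different error messages still increment the same series, so an alert can be written against the counter alone while the details remain available in the logs.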
Additional info:
https://prometheus.io/docs/practices/naming/#labels
https://github.com/openshift/compliance-operator/blob/master/doc/usage.md#metrics
https://issues.redhat.com/browse/OBSDA-205
- is triggered by: OBSDA-205 [FEATURE] allow to configure enforced limits on PrometheusK8sConfig (Closed)