-
Bug
-
Resolution: Cannot Reproduce
-
Quality / Stability / Reliability
-
Tracing Sprint # 270
The info-level alert ThanosRuleHighRuleEvaluationWarnings keeps firing in the RHOCP web console.
The thanos-ruler pods log the following warnings indefinitely:
===================
$ oc project openshift-user-workload-monitoring
$ oc logs -c thanos-ruler thanos-ruler-user-workload-0
$ oc logs thanos-ruler-user-workload-0 -n openshift-user-workload-monitoring | grep "metric might not be a counter"
ts=2025-02-26T11:59:23.265517526Z caller=rule.go:944 level=warn component=rules warnings="PromQL info: metric might not be a counter, name does not end in _total/_sum/_count/_bucket: \"otelcol_exporter_send_failed_log_records\"" query="increase(otelcol_exporter_send_failed_log_records{namespace=\"xxx-splunkotel-xxxx\"}[5m]) > 0"
Prometheus has a naming convention for counter metrics: their names are supposed to end with one of the suffixes _total, _sum, _count, or _bucket. The metric queried here has none of these suffixes, which triggers the alert in the RHOCP web console.
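As a rough illustration of the suffix check described in the warning, the sketch below tests metric names against the four suffixes from the log message. The function name has_counter_suffix and the second, renamed metric are hypothetical and used only for this example; this is not the check that Prometheus or Thanos actually runs.

```shell
# Hypothetical helper: succeeds if the metric name ends in one of the
# suffixes listed in the warning (_total, _sum, _count, _bucket).
has_counter_suffix() {
  case "$1" in
    *_total|*_sum|*_count|*_bucket) return 0 ;;
    *) return 1 ;;
  esac
}

# The first name is taken from the log line above; the second is an
# assumed renamed variant, shown only for comparison.
for name in otelcol_exporter_send_failed_log_records \
            otelcol_exporter_send_failed_log_records_total; do
  if has_counter_suffix "$name"; then
    echo "$name: matches the counter naming convention"
  else
    echo "$name: no counter suffix, would be flagged"
  fi
done
```

Under these assumptions, the metric from the log line is reported as flagged, while the variant with a _total suffix is not.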
This has been enforced from 4.16 onwards.
https://prometheus.io/docs/practices/naming/
Similar issues are being raised against other products for the same problem: https://issues.redhat.com/browse/THREESCALE-11692