Bug
Resolution: Unresolved
Normal
None
odf-4.16
None
Description of problem - Provide a detailed description of the issue encountered, including logs/command-output snippets and screenshots if the issue is observed in the UI:
The Info alert ThanosRuleHighRuleEvaluationWarnings keeps firing in the RHOCP web console.
The thanos-ruler pod logs the following warnings indefinitely:
===================
$ oc project openshift-user-workload-monitoring
$ oc logs -c thanos-ruler thanos-ruler-user-workload-0
2025-08-06T12:54:32.982497925+09:00 ts=2025-08-06T03:54:32.982471601Z caller=rule.go:944 level=warn component=rules warnings="PromQL info: metric might not be a counter, name does not end in _total/_sum/_count/_bucket: \"NooBaa_providers_bandwidth_read_size\", PromQL info: metric might not be a counter, name does not end in _total/_sum/_count/_bucket: \"NooBaa_providers_bandwidth_write_size\"" query="sum by (namespace, managedBy, job, service) (rate(NooBaa_providers_bandwidth_read_size{namespace=\"openshift-storage\"}[5m]) + rate(NooBaa_providers_bandwidth_write_size{namespace=\"openshift-storage\"}[5m]))"
Prometheus has a naming convention for counter metrics: their names are expected to end with one of the suffixes _total, _sum, _count, or _bucket. The NooBaa metrics above have no such suffix, so applying rate() to them produces the PromQL info message shown in the log, which in turn triggers the alert in the RHOCP web console.
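The suffix check described in the PromQL info message can be illustrated with a small shell sketch. The function name has_counter_suffix is hypothetical and is not part of Prometheus, Thanos, or NooBaa; it only mirrors the four suffixes listed in the warning text and is not the actual implementation used by the rule evaluator.

```shell
# Hypothetical helper (an assumption for illustration): succeeds if the metric
# name ends with one of the suffixes listed in the PromQL info message.
has_counter_suffix() {
  case "$1" in
    *_total|*_sum|*_count|*_bucket) return 0 ;;
    *) return 1 ;;
  esac
}

# Check two of the metric names that appear in the log excerpts.
for name in NooBaa_providers_bandwidth_read_size NooBaa_providers_ops_read_num; do
  if has_counter_suffix "$name"; then
    echo "$name: ends with a counter suffix"
  else
    echo "$name: no counter suffix"
  fi
done
```

Both names fall into the second branch, which matches the warnings reported above.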
Other NooBaa metrics produce the same warning:
$ omc logs -n openshift-user-workload-monitoring thanos-ruler-user-workload-0 thanos-ruler | grep -i noobaa | sed -e 's;^.*_bucket: ;;' | sort | uniq
"NooBaa_providers_bandwidth_read_size\"" query="sum by (namespace, managedBy, job, service) (rate(NooBaa_providers_bandwidth_read_size
[5m]))"
"NooBaa_providers_bandwidth_write_size\"" query="sum by (namespace, managedBy, job, service) (rate(NooBaa_providers_bandwidth_read_size
[5m]))"
"NooBaa_providers_ops_read_num\"" query="sum by (namespace, managedBy, job, service) (rate(NooBaa_providers_ops_read_num
[5m]))"
"NooBaa_providers_ops_write_num\"" query="sum by (namespace, managedBy, job, service) (rate(NooBaa_providers_ops_read_num
[5m]))"
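As an alternative way to list only the flagged metric names, the following sketch may help. It assumes the log lines carry the metric name in escaped quotes directly after "_bucket: ", as in the excerpt from `oc logs` above. The function name flagged_metrics is hypothetical.

```shell
# Hypothetical helper (an assumption for illustration): reads thanos-ruler log
# lines on stdin and prints each distinct metric name that follows
# '_bucket: \"' in the PromQL info messages.
flagged_metrics() {
  grep -oE '_bucket: \\"[A-Za-z_:][A-Za-z0-9_:]*' \
    | sed -e 's/^_bucket: \\"//' \
    | sort -u
}

# Possible usage, with the command taken from this report:
#   oc logs -c thanos-ruler thanos-ruler-user-workload-0 | flagged_metrics
```

Because grep -o prints every match on its own line, a log line that carries two info messages yields both metric names, which the greedy sed expression above does not.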
The OCP platform infrastructure and deployment type (AWS, Bare Metal, VMware, etc. Please clarify if it is platform agnostic deployment), (IPI/UPI):
VMware
The ODF deployment type (Internal, External, Internal-Attached (LSO), Multicluster, DR, Provider, etc):
Internal, thin-csi
The version of all relevant components (OCP, ODF, RHCS, ACM whichever is applicable):
OCP: 4.16.40
ODF: 4.16.9
Does this issue impact your ability to continue to work with the product?
No
Is there any workaround available to the best of your knowledge?
No
Can this issue be reproduced? If so, please provide the hit rate
Yes, always
Expected results:
No alert should be fired.
Additional info:
Similar issues have been reported in Jira for other products:
- 3scale: https://issues.redhat.com/browse/THREESCALE-11692
- OpenTelemetry: https://issues.redhat.com/browse/TRACING-5200