Distributed Tracing / TRACING-5200

ThanosRuleHighRuleEvaluationWarnings firing because of `otelcol_exporter_send_failed_log_records`


    • Type: Bug
    • Resolution: Cannot Reproduce
    • Priority: Undefined
    • OpenTelemetry
    • Quality / Stability / Reliability
    • Tracing Sprint # 270

      The info-level alert ThanosRuleHighRuleEvaluationWarnings keeps firing in the RHOCP web console.

      The thanos-ruler pods log the following warnings continuously:
      ===================
      $ oc project openshift-user-workload-monitoring
      $ oc logs -c thanos-ruler thanos-ruler-user-workload-0

      $ oc logs -c thanos-ruler thanos-ruler-user-workload-0 -n openshift-user-workload-monitoring | grep "metric might not be a counter"
      ts=2025-02-26T11:59:23.265517526Z caller=rule.go:944 level=warn component=rules warnings="PromQL info: metric might not be a counter, name does not end in _total/_sum/_count/_bucket: \"otelcol_exporter_send_failed_log_records\"" query="increase(otelcol_exporter_send_failed_log_records{namespace=\"xxx-splunkotel-xxxx\"}[5m]) > 0"
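      To list which metrics and rule queries trigger this warning without reading the log by eye, the warning lines can be parsed mechanically. This is a minimal sketch, assuming the thanos-ruler output is logfmt as in the line above; it relies on Python's shlex to keep the double-quoted values together and to unescape \", so it is not a complete logfmt parser. The helper names parse_logfmt and non_counter_warnings are hypothetical.

```python
import re
import shlex
import sys
from typing import Dict, Iterable, Iterator, Tuple

# The metric name is the first double-quoted string after the info message.
METRIC_RE = re.compile(r'metric might not be a counter.*?"([^"]+)"')


def parse_logfmt(line: str) -> Dict[str, str]:
    """Split one logfmt line into key/value pairs.

    shlex in POSIX mode keeps double-quoted values together and turns \\" into ".
    Tokens without '=' are ignored. A sketch, not a complete logfmt parser.
    """
    fields: Dict[str, str] = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def non_counter_warnings(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (metric, query) for each 'metric might not be a counter' warning."""
    for line in lines:
        try:
            fields = parse_logfmt(line)
        except ValueError:  # unbalanced quotes: skip the malformed line
            continue
        match = METRIC_RE.search(fields.get("warnings", ""))
        if match:
            yield match.group(1), fields.get("query", "")


if __name__ == "__main__":
    # Print each distinct (metric, query) pair once, reading log lines from stdin.
    for metric, query in sorted(set(non_counter_warnings(sys.stdin))):
        print(f"{metric}\t{query}")
```

      Piping the oc logs output into the script (saved under any name, for example a hypothetical find_non_counters.py) prints each affected metric once, next to the rule query that used it, which is easier to review than an indefinitely repeating log stream.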

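      Prometheus naming conventions expect counter metrics to end in _total, and the _sum, _count and _bucket series of histograms and summaries are counters as well. When increase() or rate() is applied to a metric whose name ends in none of these suffixes, PromQL returns an info-level annotation, which Thanos Ruler logs as a rule evaluation warning. The metric otelcol_exporter_send_failed_log_records has no such suffix, so every evaluation of this rule produces a warning, and that is what raises the alert in the RHOCP web console.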
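      The suffix check can be approximated offline, for example to audit other rule expressions for the same problem. This is a rough regex sketch written for this report, not the PromQL parser: it only inspects the metric name passed directly to rate() or increase(), and the names COUNTER_FUNC_RE and suspicious_counter_metrics are hypothetical.

```python
import re
from typing import List

# Suffixes listed in the PromQL info message from the thanos-ruler log.
COUNTER_SUFFIXES = ("_total", "_sum", "_count", "_bucket")

# Rough approximation of "rate(<metric>" / "increase(<metric>": captures the
# metric name that directly follows the function call. Not a PromQL parser.
COUNTER_FUNC_RE = re.compile(r"\b(?:rate|increase)\(\s*([a-zA-Z_:][a-zA-Z0-9_:]*)")


def suspicious_counter_metrics(expr: str) -> List[str]:
    """Return metric names used by rate()/increase() without a counter suffix."""
    return [
        name
        for name in COUNTER_FUNC_RE.findall(expr)
        if not name.endswith(COUNTER_SUFFIXES)
    ]


if __name__ == "__main__":
    expr = 'increase(otelcol_exporter_send_failed_log_records{namespace="xxx-splunkotel-xxxx"}[5m]) > 0'
    print(suspicious_counter_metrics(expr))  # prints ['otelcol_exporter_send_failed_log_records']
```

      Because it looks only at names, the sketch flags the same condition the warning does: a metric that really is a counter but lacks one of these suffixes is still reported.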

      This has been enforced since OpenShift 4.16.

      https://prometheus.io/docs/practices/naming/ 

      The same issue has also been raised against other products, for example https://issues.redhat.com/browse/THREESCALE-11692

              Pavol Loffay (ploffay@redhat.com)
              Nigel Smith (rhn-support-nigsmith)
              Votes: 0
              Watchers: 2