  1. Red Hat 3scale API Management
  2. THREESCALE-11692

ThanosRuleHighRuleEvaluationWarnings firing because of counter metric apicast_status


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • 2.15.2 GA
    • 3scale Operator, Gateway
    • False
    • False
    • Not Started
    • Important

      Description of problem:

      The Info alert ThanosRuleHighRuleEvaluationWarnings keeps firing in the RHOCP web console.
      
      The thanos-ruler pods log the following warning indefinitely:
      ===================
      $ oc project openshift-user-workload-monitoring
      $ oc logs -c thanos-ruler thanos-ruler-user-workload-0
      ...
      

      ts=2025-02-12T11:09:17.848275853Z caller=rule.go:944 level=warn component=rules warnings="PromQL info: metric might not be a counter, name does not end in _total/_sum/_count/_bucket: \"apicast_status\"" query="sum(rate(apicast_status{namespace=\"3scale\",status=~\"^4..\"}[1m])) / sum(rate(apicast_status{namespace=\"3scale\"}[1m])) * 100 > 5"

      
      ===================
      
      The metric "apicast_status" is scraped from the APIcast-related components and is a counter metric, as per the documentation below:
      [-] https://docs.redhat.com/en/documentation/red_hat_3scale_api_management/2.15/html-single/administering_the_api_gateway/index#prometheus-3scale-metrics 
      
      Prometheus has a naming convention for counter metrics. As the warning above states, a counter's name is expected to end with one of the suffixes _total, _sum, _count, or _bucket. "apicast_status" has none of these suffixes, which triggers the warning and, in turn, the alert in the RHOCP web console.
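
The suffix rule quoted in the warning can be sketched as a small shell check. This is only an illustration of the rule as the log message words it, not the actual PromQL implementation; the function name `looks_like_counter` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when a metric name ends in one of the
# suffixes listed in the thanos-ruler warning (_total/_sum/_count/_bucket).
looks_like_counter() {
  case "$1" in
    *_total|*_sum|*_count|*_bucket) return 0 ;;
    *) return 1 ;;
  esac
}

# Compare the current metric name with one of the names proposed in this report.
for name in apicast_status apicast_status_total; do
  if looks_like_counter "$name"; then
    echo "$name: suffix matches the counter convention"
  else
    echo "$name: no counter suffix, rate() on it would be flagged"
  fi
done
```

With this check, "apicast_status" is flagged while "apicast_status_total" passes.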
      
      

      Version-Release number of selected component (if applicable):

      Tested on RHOCP 4.16.z
      3Scale version 2.15.2

      How reproducible:

      100%

      Steps to Reproduce:

          1. Enable user workload monitoring
          2. Allow creation of required ServiceMonitor and PrometheusRules
          3. Set up a 3scale application and create the workload required for the "apicast_status" metric to be scraped.
          4. Wait for 15 minutes and check whether the alert ThanosRuleHighRuleEvaluationWarnings appears in the "Observe > Alerting" menu.
          5. To validate, check the logs of the thanos-ruler pods running in the openshift-user-workload-monitoring namespace.
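
To make the log check in step 5 repeatable, the warning lines can be counted rather than read by eye. The sketch below is an assumption about how one might do that: the helper name `count_counter_warnings` is hypothetical, and the sample input is an abbreviated stand-in for real thanos-ruler output.

```shell
#!/bin/sh
# Hypothetical helper: reads log text on stdin and prints how many lines
# carry the "might not be a counter" PromQL info warning.
count_counter_warnings() {
  # grep -c exits non-zero when the count is 0; "|| true" keeps the
  # helper usable in scripts that run with "set -e".
  grep -c 'metric might not be a counter' || true
}

# Abbreviated sample input (not real cluster output), used only to
# demonstrate the helper.
printf '%s\n' \
  'level=info component=rules msg="unrelated line"' \
  'level=warn component=rules warnings="PromQL info: metric might not be a counter"' \
  | count_counter_warnings
```

Against a live cluster, the same helper could be fed from the command already used in this report, for example `oc logs -c thanos-ruler thanos-ruler-user-workload-0 | count_counter_warnings` after switching to the openshift-user-workload-monitoring project.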
      
          

      Actual results:

      The apicast_status metric is a counter but does not follow the Prometheus naming convention for counters, which causes ThanosRuleHighRuleEvaluationWarnings to fire in RHOCP.

      Expected results:

      The metric apicast_status should be renamed to follow Prometheus naming standards, so that it ends with one of the suffixes _total/_sum/_count/_bucket. For example: apicast_status_total, apicast_status_sum, apicast_status_count, or apicast_status_bucket.

      Additional info:

      A similar issue was reported on GitHub and was closed citing the same reason, Prometheus's naming convention:
      [-] https://github.com/canonical/grafana-k8s-operator/issues/316

       

              Unassigned Unassigned
              rhn-support-dgautam Dhruv Gautam
              Votes: 1
              Watchers: 6

                Created:
                Updated: