  OpenShift Monitoring / MON-4129

Adjust queries and relabel_configs to account for "le"/"quantile" label normalization


    • Type: Task
    • Resolution: Unresolved
    • Priority: Undefined
    • Status: NEW
    • Sprint: MON Sprint 266

      With Prometheus v3, the values of the classic histogram's "le" label and the summary's "quantile" label are normalized to floats on ingestion (e.g. le="1" becomes le="1.0").
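
      As a rough illustration (not Prometheus code), the normalization can be modelled as parsing the label value as a float and re-rendering it in its shortest float form, so integral bounds gain a ".0" suffix. The helper `normalize_le` is hypothetical, and the assumption that Python's `repr` matches Prometheus' formatting holds only for typical bucket bounds and quantiles; very large or very small values may be rendered differently.

```python
import math


def normalize_le(value: str) -> str:
    """Hypothetical sketch: re-render an "le"/"quantile" label value as a float.

    Assumption: this mirrors Prometheus v3 only for typical bucket bounds and
    quantiles; exponent formatting of extreme values is not modelled.
    """
    f = float(value)  # accepts "1", "0.5", "+Inf", "NaN"
    if math.isnan(f):
        return "NaN"
    if math.isinf(f):
        return "+Inf" if f > 0 else "-Inf"
    # repr() gives the shortest round-trip form, so integral values keep ".0".
    return repr(f)


for raw in ["1", "3", "15.5", "0.99", "+Inf"]:
    print(f'le="{raw}" -> le="{normalize_le(raw)}"')
```

      Only integral values change; "15.5", "0.99" and "+Inf" keep their spelling, which is why only selectors written with integer values are affected.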

      All queries (in alerts, recording rules, dashboards, or interactive ones) with selectors that assume "le"/"quantile" values are integers must be adjusted.
      The same applies to relabel configs.

      Queries:

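      foo_bucket{le="1"} should be turned into foo_bucket{le=~"1(\\.0)?"}
      foo_bucket{le=~"1|3"} should be turned into foo_bucket{le=~"(1|3)(\\.0)?"}
      (the dot is escaped, with the backslash doubled inside the PromQL string, and the alternatives are grouped so that the optional ".0" applies to each of them)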
      

      (same applies to the "quantile" label)
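
      Such regexes can be sanity-checked offline. This Python sketch uses `re.fullmatch` to mimic PromQL's `=~` matcher, which is anchored at both ends; it assumes Python's `re` and the RE2 engine used by Prometheus agree on these simple patterns. The raw strings hold the regex itself, i.e. what PromQL sees after unescaping a doubled backslash. It shows why the dot needs escaping (otherwise the pre-v3 value "100" matches 1(.0)?) and why the alternatives need grouping (1|3(\.0)? means "1" or "3(\.0)?", so "1.0" is missed).

```python
import re


def promql_regex_match(pattern: str, value: str) -> bool:
    """Mimic PromQL's =~ matcher, which must match the whole label value."""
    return re.fullmatch(pattern, value) is not None


# Label values as they may appear before (v2, text format) and after (v3) normalization.
values = ["1", "1.0", "3", "3.0", "100", "100.0"]

patterns = [
    r"1(.0)?",        # unescaped dot: also matches the pre-v3 value "100"
    r"1(\.0)?",       # escaped dot: only "1" and "1.0"
    r"1|3(\.0)?",     # alternation binds loosest: "1.0" is missed
    r"(1|3)(\.0)?",   # grouped: old and new forms of both 1 and 3
]

for pattern in patterns:
    matched = [v for v in values if promql_regex_match(pattern, v)]
    print(f"{pattern:<14} matches {matched}")
```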

      Relabel configs:

          - action: foo
            regex: foo_bucket;(1|3|5|15.5)
            sourceLabels:
            - __name__
            - le
      
      should be adjusted to:
      
          - action: foo
            regex: foo_bucket;(1|3|5|15\.5)(\.0)?
            sourceLabels:
            - __name__
            - le
      

      (same applies to the "quantile" label)
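
      The relabel case can be checked the same way. In this sketch, assuming Prometheus' default ";" separator and its fully anchored relabel regexes, the values of the source labels (sourceLabels here, source_labels in a plain Prometheus config) are joined and matched against the regex. `relabel_regex_matches` is a hypothetical helper, the adjusted regex also escapes the dot in 15.5, and the action itself (keep, drop, replace, ...) is not modelled.

```python
import re

DEFAULT_SEPARATOR = ";"  # Prometheus' default relabel separator


def relabel_regex_matches(regex: str, labels: dict[str, str], source_labels: list[str]) -> bool:
    """Sketch of how a relabel rule evaluates its regex.

    Source label values are joined with the separator (missing labels count as
    empty strings) and the regex must match the whole joined string.
    """
    joined = DEFAULT_SEPARATOR.join(labels.get(name, "") for name in source_labels)
    return re.fullmatch(regex, joined) is not None


old_regex = r"foo_bucket;(1|3|5|15.5)"
new_regex = r"foo_bucket;(1|3|5|15\.5)(\.0)?"
source = ["__name__", "le"]

for le in ["1", "1.0", "15.5", "+Inf"]:
    series = {"__name__": "foo_bucket", "le": le}
    print(le, relabel_regex_matches(old_regex, series, source),
          relabel_regex_matches(new_regex, series, source))
```

      With the old regex, a normalized series such as le="1.0" no longer matches, so a keep/drop rule would silently change behaviour after the upgrade.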

      Also, from upstream Prometheus:

      Aggregation by the `le` and `quantile` labels for vectors that contain the old and
      new formatting will lead to unexpected results, and range vectors that span the
      transition between the different formatting will contain additional series.
      The most common use case for both is the quantile calculation via
      `histogram_quantile`, e.g.
      `histogram_quantile(0.95, sum by (le) (rate(histogram_bucket[10m])))`.
      The `histogram_quantile` function already tries to mitigate the effects to some
      extent, but there will be inaccuracies, in particular for shorter ranges that
      cover only a few samples.
      

      A warning about this should suffice, as adjusting the queries to compensate would be difficult, if not impossible, and would likely complicate them further.
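
      To illustrate the aggregation issue quoted from upstream, this Python sketch mimics `sum by (le)` over made-up bucket rates from two targets, one still carrying the old spelling and one the new. Grouping by the raw string splits each finite bound into two groups, while "+Inf", whose spelling does not change, is summed across both targets, so the buckets no longer line up. The second call, which groups by the parsed float, only shows what a format-agnostic grouping would yield; it is not a suggested PromQL workaround, and neither rate() windows nor the mitigation inside histogram_quantile are modelled.

```python
from collections import defaultdict

# Made-up per-series bucket rates from two targets around the upgrade:
# target "a" still has the old spelling, target "b" the normalized one.
series = [
    ({"instance": "a", "le": "1"}, 2.0),
    ({"instance": "a", "le": "5"}, 6.0),
    ({"instance": "a", "le": "+Inf"}, 8.0),
    ({"instance": "b", "le": "1.0"}, 1.0),
    ({"instance": "b", "le": "5.0"}, 3.0),
    ({"instance": "b", "le": "+Inf"}, 4.0),
]


def sum_by_le(samples, key=lambda le: le):
    """Sketch of `sum by (le)`: add up samples grouped by the (optionally transformed) le value."""
    groups = defaultdict(float)
    for labels, value in samples:
        groups[key(labels["le"])] += value
    return dict(groups)


print(sum_by_le(series))             # {'1': 2.0, '5': 6.0, '+Inf': 12.0, '1.0': 1.0, '5.0': 3.0}
print(sum_by_le(series, key=float))  # {1.0: 3.0, 5.0: 9.0, inf: 12.0}
```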

      See attached PRs for examples.

      A downstream check was added to help surface such misconfigurations: an alert fires for configs that are not enabled by default and may need to be adjusted.
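
      The downstream check itself is not reproduced here. As a rough illustration of the kind of pattern such a check might look for, this hypothetical Python heuristic flags "le"/"quantile" equality matchers with integer values in a query string. It is a plain regex scan, not a PromQL parser, so it misses =~ matchers whose alternatives are all integers and may misfire on text inside string literals.

```python
import re

# Hypothetical heuristic, not the downstream check: an equality matcher on
# "le" or "quantile" whose value is a bare integer such as le="1". After v3
# normalization the stored value is "1.0", so such a matcher stops matching
# newly ingested samples.
INTEGER_MATCHER = re.compile(r'\b(le|quantile)\s*=\s*"([+-]?\d+)"')


def find_integer_matchers(query: str) -> list[tuple[str, str]]:
    """Return (label, value) pairs for le/quantile equality matchers with integer values."""
    return INTEGER_MATCHER.findall(query)


queries = [
    'foo_bucket{le="1"}',
    'foo_bucket{le=~"(1|3)(\\.0)?"}',
    'rpc_duration_seconds{quantile="1"}',
    'foo_bucket{le="0.5"}',
]
for q in queries:
    print(q, find_integer_matchers(q))
```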

      For more details, see https://docs.google.com/document/d/11c0Pr2-Zn3u3cjn4qio8gxFnu9dp0p9bO7gM45YKcNo/edit?tab=t.0#bookmark=id.f5p0o1s8vyjf

              rh-ee-amrini Ayoub Mrini