OpenShift Monitoring / MON-4129

Adjust queries and relabel_configs to account for "le"/"quantile" label value normalization


    • Type: Task
    • Resolution: Unresolved
    • Sprint: MON Sprint 266

      The `PrometheusPossibleNarrowSelectors` alert was added to help identify misuse of label selectors after the Prometheus v3 update (more details below).

      Setting Prometheus/Thanos log level to "debug" (see
      https://docs.openshift.com/container-platform/latest/observability/monitoring/configuring-the-monitoring-stack.html#setting-log-levels-for-monitoring-components_configuring-the-monitoring-stack)
      should provide insights into the affected queries and relabeling configs.

      See attached PR for how to fix.
      If assistance is needed, please leave a comment.

      With Prometheus v3, the values of the classic histogram "le" label and the summary "quantile" label are always normalized to a float representation (for example, le="1" becomes le="1.0").

      All queries (in alerts, recording rules, dashboards, or interactive ones) whose selectors assume that "le"/"quantile" values are integers should be adjusted.
      The same applies to relabel configs.
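
As a rough aid for finding candidates, the check described above can be approximated with a text scan. This is a minimal sketch, not the downstream check itself: the function name `possibly_narrow_matchers` and the heuristic (an "le"/"quantile" matcher whose value contains a digit but no dot) are assumptions made for illustration, and the scan will miss cases such as mixed alternations.

```python
import re

# Hypothetical helper, not part of Prometheus or the monitoring stack.
# Finds matchers of the form le="...", le=~"...", quantile!="...", etc.
MATCHER = re.compile(r'\b(le|quantile)\s*(=~|!~|!=|=)\s*"([^"]*)"')


def possibly_narrow_matchers(query: str) -> list[str]:
    """Return le/quantile matchers whose value looks integer-only.

    Coarse heuristic: the value contains a digit but no dot, so it
    cannot match a float-formatted value such as "1.0".
    """
    hits = []
    for match in MATCHER.finditer(query):
        value = match.group(3)
        if re.search(r"\d", value) and "." not in value:
            hits.append(match.group(0))
    return hits


if __name__ == "__main__":
    for query in ['foo_bucket{le="1"}', 'foo_bucket{le=~"1|3"}', 'foo{quantile="0.99"}']:
        print(query, "->", possibly_narrow_matchers(query))
```

Anything the scan reports still needs a manual review; the debug logs and the alert mentioned above remain the reference for what is actually affected.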

      Queries:

      foo_bucket{le="1"} may need to be turned into foo_bucket{le=~"1(\\.0)?"}
      foo_bucket{le=~"1|3"} may need to be turned into foo_bucket{le=~"(1|3)(\\.0)?"}
      

      (same applies to the "quantile" label)
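
The effect of grouping and escaping in such selectors can be checked offline. The sketch below assumes Python's `re.fullmatch` is a fair stand-in for Prometheus regex matchers, which are anchored at both ends; Prometheus uses RE2 syntax, but these simple patterns behave the same in both. The helper name `selector_matches` is hypothetical.

```python
import re


def selector_matches(pattern: str, value: str) -> bool:
    # Prometheus anchors regex matchers at both ends; fullmatch mimics that.
    return re.fullmatch(pattern, value) is not None


if __name__ == "__main__":
    values = ["1", "1.0", "3", "3.0", "10", "100"]
    patterns = [
        r"1|3",          # integer-only selector
        r"1|3(\.0)?",    # optional suffix binds to "3" only
        r"(1|3)(\.0)?",  # grouped: suffix applies to every alternative
        r"1(.0)?",       # unescaped dot matches any character
    ]
    for pattern in patterns:
        matched = [v for v in values if selector_matches(pattern, v)]
        print(pattern, "matches", matched)
```

Two details are worth noting: without the parentheses around the alternation, the optional `.0` suffix applies only to the last alternative, and an unescaped dot also accepts values such as "100".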

      Relabel configs:

          - action: foo
            regex: foo_bucket;(1|3|5|15.5)
            sourceLabels:
            - __name__
            - le
      
      may need to be adjusted to:
      
          - action: foo
            regex: foo_bucket;(1|3|5|15.5)(\.0)?
            sourceLabels:
            - __name__
            - le
      

      (same applies to the "quantile" label)
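
The relabel case can be reproduced the same way. This is a minimal sketch that assumes the default behaviour of relabeling: the source label values are joined with ";" and the regex must match the whole joined string. The helper `relabel_regex_matches` is a hypothetical name used only for this illustration.

```python
import re


def relabel_regex_matches(regex: str, source_values: list[str], separator: str = ";") -> bool:
    # Join the source label values with the separator, then require the
    # regex to match the entire joined string (anchored match).
    joined = separator.join(source_values)
    return re.fullmatch(regex, joined) is not None


# The two regexes from the relabel configs above.
OLD_REGEX = r"foo_bucket;(1|3|5|15.5)"
NEW_REGEX = r"foo_bucket;(1|3|5|15.5)(\.0)?"

if __name__ == "__main__":
    for le in ["1", "1.0", "5.0", "15.5"]:
        source_values = ["foo_bucket", le]  # __name__, le
        print(
            le,
            "old:", relabel_regex_matches(OLD_REGEX, source_values),
            "new:", relabel_regex_matches(NEW_REGEX, source_values),
        )
```

With the old regex, a series whose "le" value is now "1.0" no longer matches, so the relabel action silently stops applying to it; the adjusted regex accepts both formats.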

      Also, from upstream Prometheus:

      Aggregation by the `le` and `quantile` labels for vectors that contain the old and
      new formatting will lead to unexpected results, and range vectors that span the
      transition between the different formatting will contain additional series.
      The most common use case for both is the quantile calculation via
      `histogram_quantile`, e.g.
      `histogram_quantile(0.95, sum by (le) (rate(histogram_bucket[10m])))`.
      The `histogram_quantile` function already tries to mitigate the effects to some
      extent, but there will be inaccuracies, in particular for shorter ranges that
      cover only a few samples.
      

      A warning about this should suffice: adjusting the affected queries would be difficult, if not impossible, and might complicate things further.
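
To make the upstream note concrete, the sketch below mimics a `sum by (le)` aggregation over a few made-up samples in which one series still carries the old formatting. The sample data and the `normalize` flag are assumptions for illustration only; the flag exists to show the contrast and does not correspond to anything PromQL offers.

```python
from collections import defaultdict

# Hypothetical samples: (labels, value). One series uses the old "1"
# formatting, the other the normalized "1.0" formatting.
samples = [
    ({"le": "1", "pod": "a"}, 4.0),
    ({"le": "1.0", "pod": "b"}, 6.0),
    ({"le": "+Inf", "pod": "a"}, 5.0),
    ({"le": "+Inf", "pod": "b"}, 7.0),
]


def sum_by_le(samples, normalize=False):
    """Sum sample values grouped by the "le" label.

    Label values are compared as strings unless normalize is set, in which
    case they are parsed as floats so that "1" and "1.0" share a group.
    """
    totals = defaultdict(float)
    for labels, value in samples:
        key = float(labels["le"]) if normalize else labels["le"]
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    print(sum_by_le(samples))                  # "1" and "1.0" stay separate
    print(sum_by_le(samples, normalize=True))  # both fall into the 1.0 bucket
```

Because labels are compared as strings, the bucket with upper bound 1 is split into two groups during the transition, which is the source of the additional series and the inaccuracies mentioned in the upstream text.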

      See attached PRs for examples.

      A downstream check to help surface such misconfigurations was added. An alert will fire for configs that aren't enabled by default and that may need to be adjusted.

      For more details, see https://docs.google.com/document/d/11c0Pr2-Zn3u3cjn4qio8gxFnu9dp0p9bO7gM45YKcNo/edit?tab=t.0#bookmark=id.f5p0o1s8vyjf

              rh-ee-amrini Ayoub Mrini