  1. Red Hat OpenShift Data Science
  2. RHODS-4358

Glitch in rhods_aggregate_availability due to probe_success metrics


    • RHODS 1.15
    • High

      Description of problem:

      The probe_success metric, which is used by the rhods_aggregate_availability metric, is failing. Depending on the exact time at which the query is executed (threshold), the measure is detected or not. This produces a glitch: the measure appears and disappears when refreshing. The problem also affects previous releases of RHODS and matters for many measures, including the SLA. This issue is related to RHODS-4229.

      Prerequisites (if any, like setup, operators/versions):

      RHODS

      Steps to Reproduce

      1. Go to Observe > Metrics
      2. Enter the following query: (min(min_over_time(probe_success[10s])) by (instance) or label_replace(min(min_over_time(probe_success[10s])), "instance", "combined", "instance", ".*")) or rhods_aggregate_availability or min(up{job="Traefik Proxy Metrics"})
      3. Run the query several times until the glitch appears
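
      The manual refresh in step 3 could also be scripted. The following is a minimal sketch, assuming a Prometheus-compatible HTTP API is reachable from your machine. The base URL and token are placeholders and the helper names are hypothetical; only the /api/v1/query endpoint and its response layout come from the Prometheus HTTP API. It compares the label sets returned by repeated runs and reports the ones that are not present every time.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql, eval_time=None):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = {"query": promql}
    if eval_time is not None:
        params["time"] = str(eval_time)
    return base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode(params)


def run_query(base_url, promql, token=None):
    """Execute one instant query and return the decoded JSON payload."""
    request = urllib.request.Request(build_query_url(base_url, promql))
    if token:
        # Bearer token is an assumption; adapt to your cluster's auth setup.
        request.add_header("Authorization", "Bearer " + token)
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def series_keys(payload):
    """Return the set of label sets present in an instant-query response."""
    return {
        tuple(sorted(item["metric"].items()))
        for item in payload["data"]["result"]
    }


def missing_between_runs(runs):
    """Label sets seen in at least one run but not in every run."""
    if not runs:
        return set()
    seen_anywhere = set().union(*runs)
    seen_everywhere = set.intersection(*runs)
    return seen_anywhere - seen_everywhere
```

      To use it, call run_query a few times with the query from step 2, pass each payload through series_keys, and hand the resulting list to missing_between_runs. A non-empty result means at least one series appeared in some runs and not in others.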

      Actual results:

      The Traefik Proxy measure appears and disappears, or does not appear at all, when refreshing. The rest of the measures behave as expected.

      Expected results:

      The measure does not disappear and is accurate.

      Reproducibility (Always/Intermittent/Only Once):

      Always

      Build Details:

      quay.io/repository/modh/rhods-operator-live-catalog:1.13.0-rhods-4229

      Workaround:

      Additional info:

      This misbehavior of the Traefik Proxy measure in the rhods_aggregate_availability metric is probably due to the probe_success metric min(up{job="Traefik Proxy Metrics"}). Depending on the exact time at which the query is executed (threshold), the downtime is detected or not, which produces the glitch that we saw.
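
      The timing dependence described above can be illustrated with a small simulation. This is only a sketch of the general mechanism, not a confirmed root cause. It assumes a hypothetical 30 s sample interval (the issue does not state the real one) and a 10 s window as in min_over_time(...[10s]), and it simplifies how window boundaries are handled. A single failed sample is either detected, missed, or not covered by any sample at all, depending on when the expression is evaluated.

```python
# Assumptions (not taken from the issue): one sample every 30 s,
# a 10 s lookback window, and one failed sample at t = 60.
SAMPLE_INTERVAL_S = 30  # hypothetical
WINDOW_S = 10           # matches the [10s] range in the reported query


def min_over_window(samples, eval_time, window):
    """Minimum of the sample values inside (eval_time - window, eval_time].

    Returns None when the window holds no sample, standing in for a
    range function that yields no series at that evaluation time.
    """
    in_window = [v for t, v in samples if eval_time - window < t <= eval_time]
    return min(in_window) if in_window else None


# Samples at 0, 30, 60, 90, 120; the target is down only at t = 60.
samples = [(t, 0 if t == 60 else 1) for t in range(0, 121, SAMPLE_INTERVAL_S)]

LABELS = {None: "no data", 0: "down detected", 1: "up"}

# Evaluate the same expression at different "refresh" moments.
results = {t: min_over_window(samples, t, WINDOW_S) for t in (65, 75, 95)}
for eval_time, value in results.items():
    print(eval_time, LABELS[value])
```

      Under these assumptions, a refresh at t = 65 reports the downtime, a refresh at t = 75 finds no sample in the window, and a refresh at t = 95 reports the target as up. A window that is shorter than the sample interval leaves gaps in which the result depends entirely on the evaluation time.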

        1. probe2.png (33 kB)
        2. probe.png (61 kB)
        3. probe2.mov (6.98 MB)
        4. probe.mov (11.63 MB)
        5. dashboard2.webm (246 kB)
        6. jupyterhub.webm (341 kB)
        7. jupyterhub2.webm (1.97 MB)
        8. Screenshot from 2022-07-06 16-57-29.png (97 kB)
        9. Screenshot from 2022-07-06 16-58-55.png (74 kB)
        10. Screenshot from 2022-07-06 16-36-29.png (110 kB)
        11. Screenshot from 2022-07-06 14-52-37.png (110 kB)
        12. Screenshot from 2022-07-06 16-55-50.png (106 kB)
        13. Screenshot from 2022-07-06 16-58-35.png (100 kB)
        14. Screenshot from 2022-07-06 16-35-04.png (105 kB)
        15. Screenshot from 2022-07-06 16-58-30.png (227 kB)

              rh-ee-atheodor Adriana Theodorakopoulou
              pablo-rhods Pablo Felix (Inactive)
              Pablo Felix Pablo Felix (Inactive)