  1. Red Hat OpenShift Data Science
  2. RHODS-3770

Telemetry metrics error in a cluster without enough resources


    • IDH Sprint 1.11
    • Medium

      Description of problem:

      Spawning a medium-size Jupyter container in a cluster without enough resources leaves the notebook pod in the Pending state for 10 minutes, but the metrics cpu_requests_runtime and cpu_limits_runtime count the pod as if it were running.

      Prerequisites (if any, like setup, operators/versions):

      RHODS version v1.10.0-6

      Steps to Reproduce

      1. Install RHODS in a cluster without enough resources, so that a pod remains in the Pending state.
      2. Go to the openshift-monitoring project.
      3. Query the metrics cpu_requests_runtime and cpu_limits_runtime in Prometheus.
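
For anyone reproducing the last step outside the console, the two metric names from the steps above can be turned into Prometheus instant-query URLs. This is only a sketch: `BASE_URL` is a hypothetical placeholder, and `instant_query_url` is an illustrative helper, not part of RHODS. It relies only on the standard Prometheus HTTP API path `/api/v1/query`.

```python
from urllib.parse import urlencode

# Metric names taken from this report.
METRICS = ["cpu_requests_runtime", "cpu_limits_runtime"]

# Hypothetical endpoint; replace with the cluster's actual Prometheus route.
BASE_URL = "https://prometheus.example.com"


def instant_query_url(base_url: str, expr: str) -> str:
    """Build a Prometheus instant-query URL (GET /api/v1/query) for expr."""
    query_string = urlencode({"query": expr})
    return f"{base_url.rstrip('/')}/api/v1/query?{query_string}"


urls = [instant_query_url(BASE_URL, metric) for metric in METRICS]
for url in urls:
    print(url)
```

On a real cluster the request would most likely also need authentication, such as a bearer token, which this sketch leaves out.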

      Actual results:

      A pod in the Pending state is counted in the metrics.

      Expected results:

      Pods should be counted only when they are in the Running state.
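
The difference between the actual and expected results can be illustrated with a small sketch. The pod records, field names, and CPU values below are hypothetical and do not reflect how the RHODS metrics are actually defined. The only point is that the total should be filtered by pod phase.

```python
# Hypothetical pod records; names, fields, and values are illustrative only.
# CPU amounts are in millicores.
pods = [
    {"name": "notebook-a", "phase": "Running", "cpu_request_m": 1000, "cpu_limit_m": 2000},
    {"name": "notebook-b", "phase": "Pending", "cpu_request_m": 3000, "cpu_limit_m": 6000},
]


def total_all(pods, field):
    """Sum a field over every pod, regardless of phase (the reported behaviour)."""
    return sum(p[field] for p in pods)


def total_running(pods, field):
    """Sum a field over pods in the Running phase only (the expected behaviour)."""
    return sum(p[field] for p in pods if p["phase"] == "Running")


actual_requests = total_all(pods, "cpu_request_m")
expected_requests = total_running(pods, "cpu_request_m")
actual_limits = total_all(pods, "cpu_limit_m")
expected_limits = total_running(pods, "cpu_limit_m")

print(f"requests: actual={actual_requests}m expected={expected_requests}m")
print(f"limits:   actual={actual_limits}m expected={expected_limits}m")
```

With these made-up numbers, the Pending pod inflates both totals, which is the discrepancy this issue describes.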

      Reproducibility (Always/Intermittent/Only Once):

      Always

      Build Details:

      Workaround:

      Additional info:

        1. cpu_limits_runtime.png (164 kB)
        2. cpu_requests_runtime.png (164 kB)
        3. notebook-pods.png (54 kB)
        4. rhods-1.11-cpu_metrics.png (73 kB)

              aasthana@redhat.com Anish Asthana
              pablo-rhods Pablo Felix (Inactive)
              Jorge Garcia Oncins