  Red Hat OpenShift Data Science / RHODS-1390

Prometheus not working. Pod status is CrashLoopBackOff

      Description of problem:

      We installed RHODS 1.0.15 on two PSI clusters (mod-qe-1 and mod-qe-4) using our installation script.

      After the installation, Prometheus is not available and the Prometheus pod status is CrashLoopBackOff (see attached images).

      This bug is a test blocker for us, as we need to test Prometheus metrics.

      We believe this is a bug in RHODS 1.0.15, but it could be a bug in our installer script. Could you check whether you see the same behavior in your clusters?

       

       

      Prerequisites (if any, like setup, operators/versions):

      Steps to Reproduce:

      1. Install RHODS 1.0.15 on mod-qe-4 by running the rhods-smoke pipeline, following the instructions in "ODS Smoke Suite for Checking Build Readiness".
      2. Once the installation is finished, go to https://prometheus-redhat-ods-monitoring.apps.modh-qe-4.dev.datahub.redhat.com/ and verify that it shows "Application is not available".
      3. Ask the QE team for the kubeadmin credentials for mod-qe-4.
      4. In the OpenShift console, go to Workloads > Pods and select the project redhat-ods-monitoring.
      5. Verify that the pod prometheus-xxxxx has status CrashLoopBackOff.
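
      Steps 4 and 5 can also be checked from the command line instead of the console. The sketch below is a minimal, hypothetical helper (it is not part of RHODS or the smoke suite) that takes the parsed JSON of `oc get pods -n redhat-ods-monitoring -o json` (a Kubernetes PodList) and returns every pod/container pair whose state is "waiting" with reason `CrashLoopBackOff`, which is the status the console shows in step 5. The function name `crashlooping_pods` is an assumption made for illustration.

```python
import json
import sys


def crashlooping_pods(pod_list):
    """Return (pod name, container name) pairs stuck in CrashLoopBackOff.

    `pod_list` is assumed to be the parsed output of
    `oc get pods -n <namespace> -o json`, i.e. a Kubernetes PodList dict.
    """
    hits = []
    for pod in pod_list.get("items", []):
        pod_name = (pod.get("metadata") or {}).get("name", "<unknown>")
        status = pod.get("status") or {}
        # Check init containers as well as regular containers; either can crash-loop.
        statuses = (status.get("initContainerStatuses") or []) + (
            status.get("containerStatuses") or []
        )
        for container in statuses:
            # A crash-looping container reports state.waiting.reason == "CrashLoopBackOff".
            waiting = (container.get("state") or {}).get("waiting") or {}
            if waiting.get("reason") == "CrashLoopBackOff":
                hits.append((pod_name, container.get("name", "<unknown>")))
    return hits


def report(stream=sys.stdin):
    """Read a PodList as JSON from `stream` and print one line per crash-looping container."""
    hits = crashlooping_pods(json.load(stream))
    for pod_name, container_name in hits:
        print(f"{pod_name}/{container_name}: CrashLoopBackOff")
    return hits
```

      For example, if the block is saved as `check_pods.py` (a hypothetical file name), running `oc get pods -n redhat-ods-monitoring -o json | python3 -c 'import check_pods; check_pods.report()'` should print the prometheus pod when it is crash-looping. Reading JSON rather than parsing the columns of `oc get pods` keeps the check independent of the table layout.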

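      Actual results:

      The Prometheus route shows "Application is not available" and the prometheus pod status is CrashLoopBackOff.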

      Expected results:

      The Prometheus application should be available.

       

      Reproducibility (Always/Intermittent/Only Once):

      It happened in at least two PSI clusters (mod-qe-1 and mod-qe-4), and I believe it also happened in an OpenShift Dedicated cluster we had last week.

      Build Details:

      Additional info:

        1. PersistentVolumeClaims.png (50 kB, Jorge Garcia Oncins)
        2. PersistentVolumes.png (77 kB, Jorge Garcia Oncins)
        3. prometheus-pod-CrashLoppBackOff.png (109 kB, Jorge Garcia Oncins)
        4. prometheus-pod-CrashLoppBackOff-events1.png (182 kB, Jorge Garcia Oncins)
        5. prometheus-pod-CrashLoppBackOff-events2.png (193 kB, Jorge Garcia Oncins)
        6. prometheus-pod-CrashLoppBackOff-events3.png (178 kB, Jorge Garcia Oncins)
        7. prometheus-pod-CrashLoppBackOff-log.png (129 kB, Jorge Garcia Oncins)

              mshah10 Maulik Shah (Inactive)
              rhn-support-jgarciao Jorge Garcia Oncins
              Jorge Garcia Oncins
              Votes: 0
              Watchers: 6