  1. Red Hat OpenShift Data Science
  2. RHODS-4797

Alerts "User notebook pvc usage above 90%" and "User notebook pvc usage above 100%" are not fired on failure


    • False
    • None
    • False
    • Release Notes
    • Yes
    • No
    • PVC usage limit alerts were not sent when usage exceeded 90% and 100%:: Alerts indicating when a PVC exceeded 90% and 100% of its capacity failed to be triggered and sent.
    • Documented as Resolved Issue
    • No
    • Yes
    • None
    • RHODS 1.15
    • Urgent

      Description of problem:

      After installing the new version of RHODS, we found that some tests fail because the alerts "User notebook pvc usage above 90%" and "User notebook pvc usage above 100%" are not triggered when the tests run.

      Looking into it, we found that the first part of the expression in those alerts

      kubelet_volume_stats_used_bytes{prometheus_replica="prometheus-k8s-0",persistentvolumeclaim=~".*jupyterhub-nb-0"}
       / 
      kubelet_volume_stats_capacity_bytes{prometheus_replica="prometheus-k8s-0",persistentvolumeclaim=~"jupyterhub-nb-0"} 

      does not return any value. When we queried the bare metric

      kubelet_volume_stats_used_bytes

      we found that the value hard-coded in the alert expression, prometheus_replica="prometheus-k8s-0", does not match the label actually present on the returned series:

      prometheus_replica="prometheus-k8s-1"
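The mismatch described above can be checked, and the ratio evaluated independently of the replica, with queries along the following lines. This is only a sketch: it reuses the metric names and persistentvolumeclaim selectors quoted in this report unchanged, and the use of `max without (prometheus_replica)` is an assumption made for illustration, not necessarily the change applied to the shipped alert rules.

```promql
# Query 1 (diagnostic): show which prometheus_replica values the metric carries.
count by (prometheus_replica) (kubelet_volume_stats_used_bytes)

# Query 2 (sketch): the usage ratio from the alert with the replica matcher removed.
# max without (...) drops the replica label and collapses duplicate series
# reported by different replicas into one series per PVC.
  max without (prometheus_replica) (
    kubelet_volume_stats_used_bytes{persistentvolumeclaim=~".*jupyterhub-nb-0"}
  )
/
  max without (prometheus_replica) (
    kubelet_volume_stats_capacity_bytes{persistentvolumeclaim=~"jupyterhub-nb-0"}
  )
```

Run the two queries separately. If the first returns only prometheus-k8s-1, any expression pinned to prometheus-k8s-0 yields an empty result, which matches the behavior reported here.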

       

      We saw this issue in add-on installations of the new RHODS version, both after an upgrade and after a fresh installation.

      Prerequisites (if any, like setup, operators/versions):

       

      Steps to Reproduce

      1. Install RHODS
      2. Execute the test 
        Verify Alert RHODS-PVC-Usage-Above-90 Is Fired When User PVC Is Above 90 Percent

        or manually run one of these scripts in a notebook:
        https://github.com/redhat-rhods-qe/ods-ci-notebooks-main/blob/main/notebooks/200__monitor_and_manage/203__alerts/notebook-pvc-usage/fill-notebook-pvc-to-complete-100.ipynb
        https://github.com/redhat-rhods-qe/ods-ci-notebooks-main/blob/main/notebooks/200__monitor_and_manage/203__alerts/notebook-pvc-usage/fill-notebook-pvc-over-90.ipynb

      3. Check the alert "User notebook pvc usage above 90%" and observe that it is not triggered after 2 minutes

      Actual results:

      Alerts are not triggered when PVC usage is above 90%

      Expected results:

      Alerts are triggered

      Reproducibility (Always/Intermittent/Only Once):

      We found this issue in 4 clusters with the add-on installed. In a cluster with the catalogsource installation, the alert was triggered.

      Build Details:

      RHODS v1140-3 in stage

      Workaround:

      Additional info:

        1. image-2022-08-03-07-45-00-826.png
          160 kB
          Vaishnavi Hire
        2. Screen Shot 2022-08-03 at 7.44.01 AM.png
          282 kB
          Vaishnavi Hire
        3. Screenshot from 2022-08-01 16-50-51.png
          220 kB
          Pablo Felix
        4. Screenshot from 2022-08-02 12-54-19.png
          72 kB
          Pablo Felix
        5. Screenshot from 2022-08-02 12-56-31.png
          26 kB
          Pablo Felix
        6. Screenshot from 2022-08-02 13-12-31.png
          130 kB
          Pablo Felix
        7. Screenshot from 2022-08-04 10-12-42.png
          100 kB
          Chris Chase
        8. Screenshot from 2022-08-04 17-41-08.png
          86 kB
          Pablo Felix

            vhire Vaishnavi Hire
            pablo-rhods Pablo Felix
            Pablo Felix Pablo Felix