  1. Red Hat OpenShift Data Science
  2. RHODS-4797

Alerts "User notebook pvc usage above 90%" and "User notebook pvc usage above 100%" are not fired on failure

Details

    • False
    • None
    • False
    • Release Notes
    • Yes
    • No
    • PVC usage limit alerts were not sent when usage exceeded 90% and 100%:: Alerts indicating when a PVC exceeded 90% and 100% of its capacity failed to be triggered and sent.
    • Documented as Resolved Issue
    • No
    • Yes
    • None
    • RHODS 1.15
    • Urgent

    Description

      Description of problem:

      After installing the new version of RHODS, we found that some tests fail because the alerts "User notebook pvc usage above 90%" and "User notebook pvc usage above 100%" are not triggered when the tests run.

      Investigating further, we found that the first part of the expression in those alerts

      kubelet_volume_stats_used_bytes{prometheus_replica="prometheus-k8s-0",persistentvolumeclaim=~".*jupyterhub-nb-0"}
       / 
      kubelet_volume_stats_capacity_bytes{prometheus_replica="prometheus-k8s-0",persistentvolumeclaim=~"jupyterhub-nb-0"} 

      does not return any value. When we then queried the bare metric

      kubelet_volume_stats_used_bytes

      we found that the label prometheus_replica="prometheus-k8s-0" used in the alert does not match the value that this query actually returns:

      label prometheus_replica="prometheus-k8s-1"
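
      The mismatch described above can be illustrated with a small offline sketch. It does not query Prometheus; it only mimics how label matchers select series. The sample series, its PVC name, and its value are hypothetical, and the `select` helper is an illustration rather than part of RHODS or Prometheus. Removing the replica label here only shows why the pinned selector returns nothing; it is not necessarily how the alert was fixed.

```python
import re

# Hypothetical sample series, shaped like the result of querying the bare
# metric kubelet_volume_stats_used_bytes. The PVC name and value are made up.
SERIES = [
    {
        "labels": {
            "__name__": "kubelet_volume_stats_used_bytes",
            "prometheus_replica": "prometheus-k8s-1",
            "persistentvolumeclaim": "jupyterhub-nb-0",
        },
        "value": 950.0,
    },
]


def select(series, equal=None, regex=None):
    """Return the series whose labels satisfy every matcher.

    equal: label -> exact value (like PromQL `=`)
    regex: label -> pattern (like PromQL `=~`, which is fully anchored)
    """
    equal = equal or {}
    regex = regex or {}
    matched = []
    for s in series:
        labels = s["labels"]
        if any(labels.get(k) != v for k, v in equal.items()):
            continue
        if any(re.fullmatch(p, labels.get(k, "")) is None for k, p in regex.items()):
            continue
        matched.append(s)
    return matched


pvc = {"persistentvolumeclaim": ".*jupyterhub-nb-0"}

# Selector as written in the alert: pinned to replica 0, so nothing matches.
pinned = select(SERIES, equal={"prometheus_replica": "prometheus-k8s-0"}, regex=pvc)

# Same selector without the replica label: the series is found.
unpinned = select(SERIES, regex=pvc)

print(len(pinned), len(unpinned))
```

      With an empty left-hand side, the division in the alert expression has nothing to compare against the threshold, which is consistent with the alerts never firing.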

       

      We observed this issue on add-on installations of the new RHODS version, both after an upgrade and after a fresh installation.

      Prerequisites (if any, like setup, operators/versions):

       

      Steps to Reproduce

      1. Install RHODS
      2. Execute the test
        Verify Alert RHODS-PVC-Usage-Above-90 Is Fired When User PVC Is Above 90 Percent

        or manually run one of the following scripts in a notebook:
        https://github.com/redhat-rhods-qe/ods-ci-notebooks-main/blob/main/notebooks/200__monitor_and_manage/203__alerts/notebook-pvc-usage/fill-notebook-pvc-to-complete-100.ipynb
        https://github.com/redhat-rhods-qe/ods-ci-notebooks-main/blob/main/notebooks/200__monitor_and_manage/203__alerts/notebook-pvc-usage/fill-notebook-pvc-over-90.ipynb

      3. Check the alert "User notebook pvc usage above 90%" and observe that it is still not triggered after 2 minutes
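
      When reproducing manually, it can help to confirm from inside the notebook that the volume really is above the threshold before waiting for the alert. The following is a minimal sketch, not taken from the linked notebooks. It assumes the current directory is on the notebook PVC and that the 100% alert corresponds to a ratio of 1.0 or more. The figures from `shutil.disk_usage` may differ slightly from what the kubelet metrics report.

```python
import shutil


def usage_ratio(path):
    """Return used/total for the filesystem that contains `path`."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def expected_alerts(ratio):
    """List the alert names from this report that the ratio should trigger.

    The thresholds (above 0.9, and 1.0 or more) are assumptions based on the
    alert names, not on the actual alert rules.
    """
    alerts = []
    if ratio > 0.9:
        alerts.append("User notebook pvc usage above 90%")
    if ratio >= 1.0:
        alerts.append("User notebook pvc usage above 100%")
    return alerts


# "." is an assumption: run this from a directory on the notebook PVC.
ratio = usage_ratio(".")
print(f"{ratio:.1%} used; expected alerts: {expected_alerts(ratio)}")
```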

      Actual results:

      Alerts are not triggered when PVC usage exceeds 90%

      Expected results:

      Alerts are triggered

      Reproducibility (Always/Intermittent/Only Once):

      We found this issue on 4 clusters with the add-on installed. On a cluster installed through the CatalogSource, the alert was triggered.

      Build Details:

      RHODS v1140-3 in stage

      Workaround:

      Additional info:

          People

            vhire Vaishnavi Hire
            pablo-rhods Pablo Felix
            Pablo Felix Pablo Felix