Bug
Resolution: Done
Major
None
False
None
False
Release Notes
Yes
No
PVC usage limit alerts were not sent when usage exceeded 90% and 100%:: Alerts indicating that a PVC had exceeded 90% or 100% of its capacity were not triggered or sent.
Documented as Resolved Issue
No
Yes
None
RHODS 1.15
Critical
Description of problem:
After installing the new version of RHODS, we found that some tests fail because the alerts "User notebook pvc usage above 90%" and "User notebook pvc usage above 100%" are not triggered when the tests run.
Looking into it, we found that the first part of the expression used by those alerts,
kubelet_volume_stats_used_bytes{prometheus_replica="prometheus-k8s-0",persistentvolumeclaim=~".*jupyterhub-nb-0"} / kubelet_volume_stats_capacity_bytes{prometheus_replica="prometheus-k8s-0",persistentvolumeclaim=~"jupyterhub-nb-0"}
returns no value. Querying only
kubelet_volume_stats_used_bytes
showed that the returned series do not carry the label prometheus_replica="prometheus-k8s-0" that the alert filters on; the actual label is:
prometheus_replica="prometheus-k8s-1"
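The effect of the mismatched replica label can be illustrated with a small simulation. The Python sketch below is not PromQL and not the RHODS alert rule: it approximates instant-vector selector matching (equality matchers, plus regex matchers, which PromQL anchors to the whole label value) and the default one-to-one division of two vectors on identical label sets. The sample series, their values, and the helper names select and usage_ratio are hypothetical. With the replica pinned to prometheus-k8s-0 while the data only carries prometheus-k8s-1, the ratio comes back empty, so the comparison against the 90% threshold has nothing to fire on.

```python
import re

# Hypothetical sample data modelled on this report: the kubelet volume metrics
# exist, but only with prometheus_replica="prometheus-k8s-1".
LABELS = {"prometheus_replica": "prometheus-k8s-1",
          "persistentvolumeclaim": "jupyterhub-nb-0"}
USED = [{"labels": dict(LABELS), "value": 9.5e9}]      # kubelet_volume_stats_used_bytes
CAPACITY = [{"labels": dict(LABELS), "value": 10e9}]   # kubelet_volume_stats_capacity_bytes


def select(series, equal=None, regex=None):
    """Approximate a PromQL selector with label="v" and label=~"re" matchers.

    PromQL regex matchers must match the whole label value, hence re.fullmatch.
    A missing label is treated as the empty string, as in PromQL.
    """
    equal = equal or {}
    regex = regex or {}
    return [
        s for s in series
        if all(s["labels"].get(k, "") == v for k, v in equal.items())
        and all(re.fullmatch(p, s["labels"].get(k, "")) for k, p in regex.items())
    ]


def usage_ratio(used, capacity):
    """used / capacity, pairing only series whose label sets are identical."""
    caps = {frozenset(c["labels"].items()): c["value"] for c in capacity}
    ratios = {}
    for u in used:
        key = frozenset(u["labels"].items())
        if key in caps:
            ratios[key] = u["value"] / caps[key]
    return ratios


PINNED = {"prometheus_replica": "prometheus-k8s-0"}  # as in the alert expression

# Selectors as quoted in the alert (the second one has no leading ".*").
pinned = usage_ratio(
    select(USED, PINNED, {"persistentvolumeclaim": ".*jupyterhub-nb-0"}),
    select(CAPACITY, PINNED, {"persistentvolumeclaim": "jupyterhub-nb-0"}),
)
unpinned = usage_ratio(
    select(USED, regex={"persistentvolumeclaim": ".*jupyterhub-nb-0"}),
    select(CAPACITY, regex={"persistentvolumeclaim": "jupyterhub-nb-0"}),
)
print(pinned)                                      # {}
print([r for r in unpinned.values() if r > 0.9])   # [0.95]
```

Dropping the replica matcher in the sketch makes the sample series visible again. Whether that is how the real alert rule should be changed is not stated in this report.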
We found this issue in add-on installations of the new RHODS version, both after an upgrade and after a fresh installation.
Prerequisites (if any, like setup, operators/versions):
Steps to Reproduce
- Install RHODS.
- Execute the test "Verify Alert RHODS-PVC-Usage-Above-90 Is Fired When User PVC Is Above 90 Percent", or manually run one of the following notebooks:
  https://github.com/redhat-rhods-qe/ods-ci-notebooks-main/blob/main/notebooks/200__monitor_and_manage/203__alerts/notebook-pvc-usage/fill-notebook-pvc-to-complete-100.ipynb
  https://github.com/redhat-rhods-qe/ods-ci-notebooks-main/blob/main/notebooks/200__monitor_and_manage/203__alerts/notebook-pvc-usage/fill-notebook-pvc-over-90.ipynb
- Check the alert "User notebook pvc usage above 90%" and observe that it has not been triggered after 2 minutes.
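For manual reproduction, a PVC can be pushed past a threshold by writing filler data to its mount point. The Python sketch below is a hypothetical stand-in, not the contents of the linked notebooks: the mount path, the file name pvc-filler.bin, and the helper names bytes_to_reach and fill_to are assumptions. It measures usage with shutil.disk_usage, whose figures are assumed to be close to, but not necessarily identical to, the values behind kubelet_volume_stats_used_bytes.

```python
import errno
import os
import shutil


def bytes_to_reach(used, total, target_fraction):
    """Bytes still to write so that used / total strictly exceeds target_fraction."""
    return max(int(total * target_fraction) - used + 1, 0)


def fill_to(path, target_fraction, chunk=64 * 1024 * 1024):
    """Append zeros to a filler file under `path` until usage passes target_fraction.

    Hypothetical helper: `path` is assumed to be the notebook PVC mount point.
    Stops quietly if the filesystem runs out of space, which is the expected
    outcome when target_fraction is 1.0.
    """
    usage = shutil.disk_usage(path)
    remaining = bytes_to_reach(usage.used, usage.total, target_fraction)
    zeros = memoryview(bytes(chunk))  # one zero-filled buffer, reused for every write
    filler = os.path.join(path, "pvc-filler.bin")
    # Unbuffered, so nothing is left pending in a buffer when the file is closed.
    with open(filler, "ab", buffering=0) as f:
        try:
            while remaining > 0:
                # Raw writes may be partial; subtract what was actually written.
                remaining -= f.write(zeros[: min(chunk, remaining)])
            os.fsync(f.fileno())
        except OSError as exc:
            if exc.errno != errno.ENOSPC:  # a full disk is acceptable, anything else is not
                raise
    return filler


# Example (assumed mount point, adjust to the notebook's PVC):
# fill_to("/opt/app-root/src", 0.9)
```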
Actual results:
Alerts are not triggered when PVC usage exceeds 90%.
Expected results:
The alerts are triggered.
Reproducibility (Always/Intermittent/Only Once):
We found this issue on 4 clusters with the add-on installed. On a cluster installed from the CatalogSource, the alert was triggered.
Build Details:
RHODS v1140-3 in stage
Workaround:
Additional info: