Type: Bug
Resolution: Done
Priority: Normal
Affects Version: 4.14.z
Impact: Quality / Stability / Reliability
Description of problem:
KubePersistentVolumeFillingUp (which fires when less than 3% of the PV size is left) alerts are firing for several clusters that have a configured PV size of 100 Gi and a maximum retention size of 90 GB. Our CMO config (https://github.com/openshift/managed-cluster-config/blob/master/resources/cluster-monitoring-config/config.yaml#L29):

    retention: 11d
    retentionSize: 90GB
    volumeClaimTemplate:
      metadata:
        name: prometheus-data
      spec:
        resources:
          requests:
            storage: 100Gi

We also saw major desyncs between the two reported PV sizes (screenshot attached), possibly pointing at https://access.redhat.com/solutions/7024829. We are not sure the desync is the cause, though, as the difference is up to 80 GB.
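For context, the settings above live in the cluster-monitoring-config ConfigMap. A minimal sketch of the full object, assuming the standard CMO layout with the values under prometheusK8s as in the linked managed-cluster-config file:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: cluster-monitoring-config
      namespace: openshift-monitoring
    data:
      config.yaml: |
        prometheusK8s:
          retention: 11d          # time-based retention: drop blocks older than 11 days
          retentionSize: 90GB     # size-based retention: drop oldest blocks past this size
          volumeClaimTemplate:
            metadata:
              name: prometheus-data
            spec:
              resources:
                requests:
                  storage: 100Gi  # PV sized above retentionSize to leave headroom

As a caveat, retentionSize governs when old blocks are deleted; per the Prometheus storage docs linked under "Expected results" below, it is not a hard cap on instantaneous disk usage, since the WAL and in-progress compaction consume space on top of completed blocks.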
Version-Release number of selected component (if applicable):
4.14.z
How reproducible:
A repeated pattern: alerts flap for 4 days, then no alerts for 4 days.
Steps to Reproduce:
1.
2.
3.
Actual results:
KubePersistentVolumeFillingUp firing
Expected results:
KubePersistentVolumeFillingUp should not fire, as PV usage should never come close to the PV capacity with the above CMO settings (https://prometheus.io/docs/prometheus/latest/storage/#operational-aspects).
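For reference, a sketch of the alert's shape, approximating the upstream kubernetes-mixin definition (the exact label matchers and duration in the shipped rule may differ):

    - alert: KubePersistentVolumeFillingUp
      expr: |
        kubelet_volume_stats_available_bytes{job="kubelet"}
          /
        kubelet_volume_stats_capacity_bytes{job="kubelet"}
          < 0.03
      for: 1m
      labels:
        severity: critical

With a 100 Gi (~107 GB) PV, the 3% threshold means the alert fires once used space exceeds roughly 104 GB. Depending on whether Prometheus interprets the configured 90GB as decimal or binary, that leaves roughly 8-14 GB between the retention limit and the alert threshold, which the WAL and compaction overhead would have to consume for the alert to fire.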
Additional info:
Attaching:
- partial must-gather (no logs)
- container logs for the openshift-monitoring namespace
- screenshot of the PV size difference
is cloned by: OBSDOCS-765 "Add clarification about exceeded maximum retention size to docs" (Closed)