-
Bug
-
Resolution: Done
-
Undefined
-
None
-
None
-
None
-
None
-
False
-
None
-
False
-
No
-
---
-
---
-
-
-
MK - Sprint 234
For that KafkaBrokerStorageQuotaExceeded critical alert that fired in the last 7 days, here's the PV usage dashboard for the affected instance over the past 7 days: https://grafana.app-sre.devshift.net/d/Q2NXRP8nk/usage-and-limits-2-persistent-volumes?orgId=1&var-cluster_id=7694399d-2170-4139-ae6c-79c637b28669&var-namespace=kafka-caorpfsgebb0ffrjq940&var-pvc=data-0-internal-data-pipeline-kafka-0&var-pvc=data-0-internal-data-pipeline-kafka-1&var-pvc=data-0-internal-data-pipeline-kafka-2&var-pvc=data-internal-data-pipeline-zookeeper-0&var-pvc=data-internal-data-pipeline-zookeeper-1&var-pvc=data-internal-data-pipeline-zookeeper-2
I think this probably aligns with an OSD upgrade, where somehow the metrics were counted twice for a short period of time. As this instance uses more than 50% of the available storage, the doubling brought the "used" space above the limit. There's a corresponding doubling of the "free" space at the same times as the "used" space on each occasion for each PV.
All of the other Kafka clusters on that OSD cluster had similar doubling spikes for their used and free disk space, but they wouldn't cause the alert to fire since they all use less than 0.1% of their available storage.
- is related to
-
MGDSTRM-10902 kube_node_labels_condition prom recording rule failing at runtime
- Closed
- mentioned on