This is a clone of issue OCPBUGS-60443. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-60237. The following is the description of the original issue:
—
Description of problem:
There is a single alert bundled with cluster-etcd-operator, etcdDatabaseQuotaLowSpace, which fires when a cluster is using 95% of its etcd quota. As observed in Managed OpenShift, this alert often fires too late and does not give administrators enough time to correct issues before the API server is impacted.
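To put the 95% threshold in perspective, the sketch below computes quota usage and the headroom left at the moment the alert fires. It assumes the default 8 GiB quota from the reproduction steps and assumes usage is measured as etcd database size divided by the backend quota, which is how the upstream etcd alert combines `etcd_mvcc_db_total_size_in_bytes` and `etcd_server_quota_backend_bytes`. The function names are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2

# Assumed default etcd backend quota: 8 GiB, as in the reproduction steps.
DEFAULT_QUOTA_BYTES = 8 * GIB


def quota_used_percent(db_size_bytes: int, quota_bytes: int = DEFAULT_QUOTA_BYTES) -> float:
    """Percentage of the etcd backend quota consumed by the database file."""
    return db_size_bytes / quota_bytes * 100


def headroom_bytes_at(threshold_percent: float, quota_bytes: int = DEFAULT_QUOTA_BYTES) -> float:
    """Bytes still free when usage first reaches threshold_percent."""
    return quota_bytes * (1 - threshold_percent / 100)


if __name__ == "__main__":
    # Headroom left when a 95% alert fires on an 8 GiB quota.
    print(f"{headroom_bytes_at(95) / MIB:.1f} MiB left at 95%")  # prints: 409.6 MiB left at 95%
```

On the default quota, the only alert therefore fires with roughly 410 MiB left before etcd raises its NOSPACE alarm and stops accepting writes other than deletes.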
Version-Release number of selected component (if applicable):
How reproducible:
Very
Steps to Reproduce:
1. Create a Managed OpenShift (or OCP) cluster with the default control plane size and the default 8 GiB etcd quota.
2. Write a loop that creates many large Secrets or ConfigMaps.
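Step 2 can be sketched as a small generator that emits a Kubernetes `v1` `List` of large ConfigMaps as JSON for `oc apply -f -`. The namespace, name prefix, and payload size are hypothetical; the payload stays just under the 1 MiB ConfigMap size limit. Run this only against a disposable test cluster.

```python
import json
import sys

# Hypothetical values for illustration; adjust for your test cluster.
NAMESPACE = "etcd-quota-test"
NAME_PREFIX = "filler"
# Kubernetes caps a ConfigMap at 1 MiB, so stay a little below that.
PAYLOAD_BYTES = 900 * 1024


def big_configmap(index: int, payload_bytes: int = PAYLOAD_BYTES) -> dict:
    """Build one ConfigMap manifest carrying payload_bytes of filler data."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"{NAME_PREFIX}-{index:05d}", "namespace": NAMESPACE},
        "data": {"payload": "x" * payload_bytes},
    }


def configmap_list(count: int) -> dict:
    """Wrap count ConfigMaps in a v1 List so a single apply creates them all."""
    return {
        "apiVersion": "v1",
        "kind": "List",
        "items": [big_configmap(i) for i in range(count)],
    }


def main(argv: list) -> None:
    # Require an explicit count so the script never floods stdout by accident.
    if len(argv) != 2:
        print("usage: make_configmaps.py COUNT", file=sys.stderr)
        return
    json.dump(configmap_list(int(argv[1])), sys.stdout)


if __name__ == "__main__":
    main(sys.argv)
```

Usage, with a hypothetical file name: `oc create namespace etcd-quota-test`, then `python3 make_configmaps.py 500 | oc apply -f -`, repeating with new indices or larger counts until the etcd database approaches its quota.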
Actual results:
The API server becomes unstable. The only remedy is to resize the control plane (or, on HCP, the pods backing etcd), defragment etcd, and then regain access to the cluster to delete resources.
Expected results:
Cluster administrators are alerted at info, warning, and then critical levels for etcdDatabaseQuotaLowSpace.
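The escalation described above can be sketched as a mapping from quota usage to a severity. Only the 95% critical threshold comes from the existing etcdDatabaseQuotaLowSpace alert; the 75% info and 85% warning thresholds are hypothetical placeholders, not values taken from the fix. In a real PrometheusRule, each tier would be a separate alerting rule with its own `severity` label.

```python
from typing import Optional

# (threshold percent, severity), checked from most to least severe.
# Only the 95% critical tier comes from the existing etcdDatabaseQuotaLowSpace
# alert; the info and warning thresholds are hypothetical placeholders.
TIERS = [
    (95.0, "critical"),
    (85.0, "warning"),
    (75.0, "info"),
]


def quota_severity(db_size_bytes: int, quota_bytes: int) -> Optional[str]:
    """Return the most severe tier whose threshold usage exceeds, else None."""
    used_percent = db_size_bytes / quota_bytes * 100
    for threshold, severity in TIERS:
        # Strict comparison mirrors a PromQL "> threshold" condition.
        if used_percent > threshold:
            return severity
    return None
```

Checking tiers from most to least severe means only one severity is reported for a given usage level, while the lower tiers give administrators earlier warning before the critical one fires.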
Additional info:
- blocks: OCPBUGS-61505 [4.16] Singular etcdDatabaseQuotaLowSpace critical PrometheusRule isn't sufficient (Closed)
- clones: OCPBUGS-61235 [4.18] Singular etcdDatabaseQuotaLowSpace critical PrometheusRule isn't sufficient (Closed)
- is blocked by: OCPBUGS-61235 [4.18] Singular etcdDatabaseQuotaLowSpace critical PrometheusRule isn't sufficient (Closed)
- is cloned by: OCPBUGS-61505 [4.16] Singular etcdDatabaseQuotaLowSpace critical PrometheusRule isn't sufficient (Closed)
- links to