Type: Bug
Resolution: Done
Priority: Normal
Fix Version/s: 4.20.0
Affects Version/s: 4.16, 4.17, 4.18, 4.19, 4.20, 4.21
Component/s: Etcd
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:
None
Story Points:
None
Severity:
Low
Regression:
No

Target Backport Versions:

4.17.z, 4.16.z, 4.18.z, 4.19.z
Target Version:

4.20.0
Release Blocker:
None
Sprint:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Impact Score:

Release Note Status:
Done
Release Note Type:
Enhancement
Release Note Text:

With this update, the Cluster etcd Operator introduces alert levels for the `etcdDatabaseQuotaLowSpace` alert, offering administrators timely notifications about low etcd quota usage. This proactive alert system aims to prevent API server instability and allows for effective resource management in managed OpenShift clusters. The alert levels are `info`, `warning`, and `critical`, providing a more granular approach to monitoring etcd quota usage, resulting in dynamic etcd quota management and improved overall cluster performance.


Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

Description of problem:

cluster-etcd-operator bundles a single alert, etcdDatabaseQuotaLowSpace, which fires when a cluster is using 95% of its etcd quota. As seen on Managed OpenShift, this alert often fires too late and doesn't give administrators enough time to correct issues before the API server is impacted.

Version-Release number of selected component (if applicable):

How reproducible:

Very

Steps to Reproduce:

    1. Create a Managed OpenShift (or OCP) cluster with the default control plane size and the default 8 GB etcd quota.
    2. Run a loop that creates many large Secrets or ConfigMaps.
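
Step 2 could be scripted along these lines. This is a minimal sketch, not the reporter's actual script: the helper names `build_configmap` and `fill_etcd`, the `etcd-quota-repro` namespace, and the 900 KiB payload are all assumptions made for illustration. It assumes `oc` is logged in to a disposable test cluster and that the namespace already exists.

```python
import json
import subprocess


def build_configmap(index: int, payload_bytes: int = 900 * 1024,
                    namespace: str = "etcd-quota-repro") -> str:
    """Return a JSON ConfigMap manifest carrying payload_bytes of filler data.

    The name prefix and namespace are hypothetical; pick your own.
    """
    manifest = {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {"name": f"quota-filler-{index}", "namespace": namespace},
        # Stay below the 1 MiB size limit that applies to ConfigMap data.
        "data": {"filler": "x" * payload_bytes},
    }
    return json.dumps(manifest)


def fill_etcd(count: int, namespace: str = "etcd-quota-repro") -> None:
    """Create `count` large ConfigMaps by piping each manifest to `oc create`."""
    for i in range(count):
        subprocess.run(
            ["oc", "create", "-f", "-"],
            input=build_configmap(i, namespace=namespace),
            text=True,
            check=True,  # stop at the first failed create
        )
```

Calling `fill_etcd(1000)` would attempt to write roughly 900 MB of ConfigMap data. Only run this against a cluster you are prepared to break.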

Actual results:

The API server becomes unstable. The only remedy is to resize the control plane (or the pods backing etcd, if on HCP), defragment etcd, and then try to regain access to delete resources.

Expected results:

Cluster administrators are alerted at info, warning, and then critical levels for etcdDatabaseQuotaLowSpace.
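
The tiered behavior requested here can be illustrated with a small classifier. This is only a sketch of the idea: the function name `quota_alert_level` and the threshold values are hypothetical, and only the 95% figure appears in this report (as the threshold of the original single alert). The thresholds in the shipped PrometheusRule may differ.

```python
from typing import Optional

# Illustrative thresholds (percent of quota used), checked from most to
# least severe. These are assumptions, NOT values taken from the shipped rule.
THRESHOLDS = [(95.0, "critical"), (85.0, "warning"), (65.0, "info")]


def quota_alert_level(db_size_bytes: int, quota_bytes: int) -> Optional[str]:
    """Return the most severe level whose threshold is exceeded, or None."""
    if quota_bytes <= 0:
        raise ValueError("quota_bytes must be positive")
    used_percent = 100.0 * db_size_bytes / quota_bytes
    for threshold, level in THRESHOLDS:
        if used_percent > threshold:
            return level
    return None
```

With tiers like these, an administrator would see an `info` alert well before the database approaches the point where the API server is affected, instead of a single late `critical` alert.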

Additional info:

blocks

OCPBUGS-60443 [4.19] Singular etcdDatabaseQuotaLowSpace critical PrometheusRule isn't sufficient

Closed

incorporates

OCPBUGS-60566 Don't manually modify the generated alert manifest file for etcdHighCommitDurations

Closed

is cloned by

OCPBUGS-60443 [4.19] Singular etcdDatabaseQuotaLowSpace critical PrometheusRule isn't sufficient

Closed

links to

openshift/cluster-etcd-operator#1464: OCPBUGS-60237: Vendor latest mixin, including additional and modified alerts for `etcdDatabaseQuotaLowSpace`

openshift/cluster-etcd-operator#1469: Revert "OCPBUGS-60237: Vendor latest mixin, including additional and modified alerts for `etcdDatabaseQuotaLowSpace`"

openshift/cluster-etcd-operator#1471: OCPBUGS-60237: Vendor latest mixin, including additional and modified alerts for etcdDatabaseQuotaLowSpace


Assignee: Dean West

Reporter: Josh Branham

QA Contact: Sandeep Kundu

Need Info From: None

Votes: 0

Watchers: 10

Created: 2025/08/07 3:55 PM

Updated: 2025/10/21 4:45 AM

Resolved: 2025/10/21 4:45 AM
