Feature
Resolution: Unresolved
Normal
Proactive Architecture
OCPSTRAT-892 Optimize & Enhance Etcd's Functionality, Performance, and Manageability w/ Hosted Control Planes
100% To Do, 0% In Progress, 0% Done
Backlog Refinement
User Story:
As a (user persona), I want to be able to:
- As a customer, I don't want my cluster to fall over because etcd ran out of disk space.
- As an SRE, I don't want to have to rebuild broken etcd databases.
- As a manager writing checks for management cluster infra and SREs, I want to spend more on infra so I can save on SRE time.
so that I can achieve
- Healthy HyperShift growth.
Acceptance Criteria:
Description of criteria:
- Pegging an etcd database (filling it to its size limit) and then running a defrag should not exhaust the available disk space in etcd's PersistentVolumes.
(optional) Out of Scope:
Figuring out if gp3 PersistentVolumes are resizable.
Engineering Details:
HOSTEDCP-947 took a run at this, but an 8 GiB etcd bbolt file only just fits into an 8 GiB PersistentVolume. That leaves no room for WAL files, the database copy that defrag generates, or whatever else needs to go into /var/lib. As long as this field is immutable, it's probably worth aiming very high (50 GiB? We should probably get at least a sanity check from the etcd folks), because storage is cheap and rebuilding etcd databases is expensive.
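To make the sizing argument concrete, here is a rough back-of-the-envelope sketch in Python. It assumes the worst case, where defrag has nothing to reclaim and its temporary copy is as large as the live bbolt file. The function names and the 1 GiB WAL allowance are hypothetical placeholders for illustration, not measured values or anything taken from HyperShift or etcd.

```python
GIB = 1024 ** 3


def min_pv_bytes(db_bytes: int, wal_bytes: int, other_bytes: int = 0) -> int:
    """Estimate the bytes needed while a defrag of a full database is in flight.

    Assumption: the defrag copy can be as large as the live database file
    (worst case, nothing to reclaim).
    """
    defrag_copy_bytes = db_bytes
    return db_bytes + defrag_copy_bytes + wal_bytes + other_bytes


def defrag_fits(pv_bytes: int, db_bytes: int, wal_bytes: int, other_bytes: int = 0) -> bool:
    """Return True if the PersistentVolume can hold the worst-case footprint."""
    return pv_bytes >= min_pv_bytes(db_bytes, wal_bytes, other_bytes)


if __name__ == "__main__":
    db = 8 * GIB   # pegged etcd database, as in the ticket
    wal = 1 * GIB  # hypothetical WAL allowance, not a measured figure
    print(min_pv_bytes(db, wal) / GIB)     # estimated GiB required
    print(defrag_fits(8 * GIB, db, wal))   # current 8 GiB volume
    print(defrag_fits(50 * GIB, db, wal))  # proposed 50 GiB volume
```

Under these assumptions, a pegged 8 GiB database needs roughly 17 GiB before counting anything else in /var/lib, so an 8 GiB volume cannot absorb a defrag while a 50 GiB volume leaves ample headroom. The real allowance for WAL and other files should come from the etcd folks.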
This does not require a design proposal.
This does not require a feature gate.
- is caused by HOSTEDCP-947 Hosted etcd running out of space on PVC after scale test (Closed)
- is related to OCPBUGS-19665 KubeAPI is unable to automatically compact/defrag the etcd DB if it goes over the 8GB size limit (Closed)