-
Feature
-
Resolution: Unresolved
-
Major
-
BU Product Work
-
33% To Do, 33% In Progress, 33% Done
-
0
-
Program Call
Feature Overview (aka. Goal Summary)
Various requests, such as RFE-2799 and OCPBU-247, share the same underlying cause: clusters with a high transaction rate towards the API Server experience a certain level of disruption.
Goals (aka. expected user outcomes)
After careful discussion and analysis, we should try to address these requests in a more cloud-native way while achieving hitless defragmentation of the etcd database.
Requirements (aka. Acceptance Criteria):
The current defrag process is launched when at least 45% fragmentation is detected. On clusters with small etcd databases this process can complete within a few seconds, but on clusters with large etcd databases it can last long enough for users to experience failed transactions.
This last scenario is where customers need an experience that does not affect their main business processes.
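As a rough illustration of the 45% trigger described above, the following is a minimal sketch, not the operator's actual code. It assumes fragmentation is measured as the share of the on-disk database file that is not in use, derived from the total and in-use sizes that an etcd member reports in its status. The function names and the exact formula are assumptions made for illustration.

```python
# Hypothetical sketch: the formula and names are assumptions, not taken
# from the operator's source.

def fragmentation_percent(db_size_bytes: int, db_size_in_use_bytes: int) -> float:
    """Return the share of the database file that is allocated but unused."""
    if db_size_bytes <= 0:
        return 0.0
    unused = max(db_size_bytes - db_size_in_use_bytes, 0)
    return 100.0 * unused / db_size_bytes


def should_defrag(db_size_bytes: int, db_size_in_use_bytes: int,
                  threshold_percent: float = 45.0) -> bool:
    """Trigger a defrag once fragmentation reaches the threshold (45% per this ticket)."""
    return fragmentation_percent(db_size_bytes, db_size_in_use_bytes) >= threshold_percent
```

Note that a fixed percentage treats a small and a large database identically, even though the time the defrag takes, and therefore the disruption, grows with database size.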
Questions to Answer (Optional):
- Can we improve the algorithm that triggers defragmentation so that it is smarter about when it runs? (e.g. consider the etcd transaction rate and fragmentation vs. database size, with "lazy defragmentation" below 2GB and "force defragmentation" above 2GB)
- Can the operator be designed so that the rolling defragmentation of the etcd members does not block transactions while it runs?
For example, if the etcd defrag process forces the member to stop committing while it replays the copy of the data from memory to disk to create a new defragmented database, can we improve the design of etcd so that the member can continue operating on an alternate buffer or copy of the database during the defrag?
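One possible reading of the first question can be sketched as a decision function. This is only an illustration under stated assumptions: the `MemberStats` type, the `defrag_decision` function, the quiet transaction-rate limit of 100 per second, and the interpretation of "lazy" (wait for a quiet period) versus "force" (run regardless of load) are all hypothetical. Only the 45% threshold and the 2GB cutoff come from this ticket.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class MemberStats:
    """Hypothetical snapshot of one etcd member (names are illustrative)."""
    db_size_bytes: int
    db_size_in_use_bytes: int
    txn_per_second: float


def defrag_decision(stats: MemberStats,
                    frag_threshold_percent: float = 45.0,
                    size_cutoff_bytes: int = 2 * GIB,
                    quiet_txn_per_second: float = 100.0) -> str:
    """Return "skip", "defer", or "defrag" for a single member."""
    if stats.db_size_bytes <= 0:
        return "skip"
    unused = max(stats.db_size_bytes - stats.db_size_in_use_bytes, 0)
    fragmentation = 100.0 * unused / stats.db_size_bytes
    if fragmentation < frag_threshold_percent:
        return "skip"
    # Assumed "force" mode: above the size cutoff, defragment regardless of load.
    if stats.db_size_bytes > size_cutoff_bytes:
        return "defrag"
    # Assumed "lazy" mode: below the cutoff, wait until the member is quiet.
    if stats.txn_per_second <= quiet_txn_per_second:
        return "defrag"
    return "defer"
```

For instance, under these assumed defaults a 1 GiB database that is 75% fragmented and serving 5000 transactions per second is deferred, while the same database at 10 transactions per second is defragmented. Whether forcing a defrag on large databases is desirable is exactly what the second question challenges, since those are the clusters where the pause is longest.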
Out of Scope
High-level list of items that are out of scope. Initial completion during Refinement status.
- incorporates
-
RFE-2799 Customize automatic defrag scheduling
- Rejected
-
RFE-3094 Need set up key rotation outside of working hours for API pods.
- Rejected
-
RFE-3466 Allow flexibility in configuring etcd defragmentation through CR
- Accepted
- is blocked by
-
OCPSTRAT-714 Provide Detailed Administrative Control of all OCP Certs and Keys
- In Progress
-
OCPSTRAT-1111 Selectable etcd database size (TP)
- Closed
- is related to
-
OCPSTRAT-547 Improve Key Handling and Encryption for Kube API Server
- Closed
- relates to
-
OCPSTRAT-555 Improvements to ETCD datastore encryption [from OCPBU-81]
- Closed
- links to