Epic
Resolution: Done
Major
Etcd Tuning Parameters
0% To Do, 0% In Progress, 100% Done
Epic Goal*
Provide a way to tune the etcd latency parameters ETCD_HEARTBEAT_INTERVAL and ETCD_ELECTION_TIMEOUT.
Why is this important? (mandatory)
OCP4 does not provide a way to tune etcd parameters such as timeouts and heartbeat intervals. Adjusting these parameters indiscriminately may compromise the stability of the control plane. In scenarios where disk IOPS are not ideal (e.g. disk degradation, storage providers in cloud environments), these parameters could be adjusted to improve the stability of the control plane while raising the corresponding warning notifications.
In the past:
- There have been one-off workarounds for cloud providers (https://github.com/openshift/machine-config-operator/pull/1507) (https://github.com/openshift/cluster-etcd-operator/pull/218) to tune these parameters.
- There have been requests from the community to tune these parameters (https://github.com/openshift/cluster-etcd-operator/pull/515) (https://github.com/openshift/cluster-etcd-operator/issues/499).
The current default values on a 4.10 deployment are:
```
- name: ETCD_ELECTION_TIMEOUT
  value: "1000"
- name: ETCD_ENABLE_PPROF
  value: "true"
- name: ETCD_EXPERIMENTAL_MAX_LEARNERS
  value: "3"
- name: ETCD_EXPERIMENTAL_WARNING_APPLY_DURATION
  value: 200ms
- name: ETCD_EXPERIMENTAL_WATCH_PROGRESS_NOTIFY_INTERVAL
  value: 5s
- name: ETCD_HEARTBEAT_INTERVAL
  value: "100"
```
These defaults are overridden for specific cloud providers (https://github.com/openshift/cluster-etcd-operator/blob/master/pkg/etcdenvvar/etcd_env.go#L232-L254).
The guidance for latency among control plane nodes does not translate well to live on-premises scenarios (https://access.redhat.com/articles/3220991).
Scenarios (mandatory)
Define an etcd-operator API that gives the cluster-admin the ability to set `ETCD_ELECTION_TIMEOUT` and `ETCD_HEARTBEAT_INTERVAL` within a certain range.
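As an illustration of what "within a certain range" could mean, here is a minimal validation sketch. It is not the operator's implementation. The bounds (`MIN_HEARTBEAT_MS`, `MAX_HEARTBEAT_MS`, `MAX_ELECTION_MS`) and the names `EtcdTimers`, `validate` and `to_env` are hypothetical and chosen only for the example. The 5x minimum ratio between election timeout and heartbeat interval is an assumption modeled on the general etcd tuning guidance. Only the defaults (100 ms and 1000 ms) come from the 4.10 values listed in this epic.

```python
from dataclasses import dataclass
from typing import Dict, List

# Defaults from the 4.10 values quoted in this epic (milliseconds).
DEFAULT_HEARTBEAT_MS = 100
DEFAULT_ELECTION_MS = 1000

# Hypothetical bounds, for illustration only; the real range is to be defined.
MIN_HEARTBEAT_MS = 100
MAX_HEARTBEAT_MS = 500
MAX_ELECTION_MS = 5000

# Assumed minimum ratio of election timeout to heartbeat interval.
MIN_ELECTION_TO_HEARTBEAT_RATIO = 5


@dataclass(frozen=True)
class EtcdTimers:
    """Hypothetical API fields a cluster-admin could set."""
    heartbeat_interval_ms: int = DEFAULT_HEARTBEAT_MS
    election_timeout_ms: int = DEFAULT_ELECTION_MS


def validate(timers: EtcdTimers) -> List[str]:
    """Return a list of human-readable problems; empty means acceptable."""
    errors: List[str] = []
    if not MIN_HEARTBEAT_MS <= timers.heartbeat_interval_ms <= MAX_HEARTBEAT_MS:
        errors.append(
            f"heartbeat interval {timers.heartbeat_interval_ms}ms outside "
            f"[{MIN_HEARTBEAT_MS}, {MAX_HEARTBEAT_MS}]ms"
        )
    if timers.election_timeout_ms > MAX_ELECTION_MS:
        errors.append(
            f"election timeout {timers.election_timeout_ms}ms exceeds "
            f"{MAX_ELECTION_MS}ms"
        )
    minimum_election = MIN_ELECTION_TO_HEARTBEAT_RATIO * timers.heartbeat_interval_ms
    if timers.election_timeout_ms < minimum_election:
        errors.append(
            f"election timeout {timers.election_timeout_ms}ms is below "
            f"{MIN_ELECTION_TO_HEARTBEAT_RATIO}x the heartbeat interval"
        )
    return errors


def to_env(timers: EtcdTimers) -> Dict[str, str]:
    """Render validated timers as the etcd environment variables."""
    errors = validate(timers)
    if errors:
        raise ValueError("; ".join(errors))
    return {
        "ETCD_HEARTBEAT_INTERVAL": str(timers.heartbeat_interval_ms),
        "ETCD_ELECTION_TIMEOUT": str(timers.election_timeout_ms),
    }
```

With the defaults, `to_env(EtcdTimers())` yields the same two values shown in the 4.10 listing. Rejecting out-of-range requests up front, rather than clamping them silently, keeps the admin aware of what is actually applied.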
Dependencies (internal and external) (mandatory)
No external teams
Contributing Teams (and contacts) (mandatory)
Our expectation is that teams would modify the list below to fit the epic. Some epics may not need all the default groups but what is included here should accurately reflect who will be involved in delivering the epic.
- Development - etcd team
- Documentation -
- QE -
- PX -
- Others -
Acceptance Criteria (optional)
Provide some (testable) examples of how we will know if we have achieved the epic goal.
Drawbacks or Risk (optional)
Reasons we should consider NOT doing this such as: limited audience for the feature, feature will be superseded by other work that is planned, resulting feature will introduce substantial administrative complexity or user confusion, etc.
Done - Checklist (mandatory)
The following points apply to all epics and are what the OpenShift team believes are the minimum set of criteria that epics should meet for us to consider them potentially shippable. We request that epic owners modify this list to reflect the work to be completed in order to produce something that is potentially shippable.
- CI Testing - Basic e2e automation tests are merged and completing successfully
- Documentation - Content development is complete.
- QE - Test scenarios are written and executed successfully.
- Technical Enablement - Slides are complete (if requested by PLM)
- Engineering Stories Merged
- All associated work items with the Epic are closed
- Epic status should be “Release Pending”
- is depended on by: OCPSTRAT-342 [etcd-operator] etcd timers selectable profiles (TechPreview) (Closed)