-
Epic
-
Resolution: Done
-
Major
-
Vertical scaling feature updates and testing
-
To Do
-
Impediment
-
0% To Do, 0% In Progress, 100% Done
-
Since the OpenShift 4.11 release, when vertical scaling primitives were first added to openshift/cluster-etcd-operator, there have been further updates to support ongoing development of ControlPlaneMachineSets (CPMS) and the ControlPlaneMachineSetsOperator (CPMSO).
This epic tracks the work required from the etcd side to support any further extensions to the CPMS as it becomes supported on more platforms.
The platforms on which CPMS is currently supported are listed here:
https://github.com/openshift/cluster-control-plane-machine-set-operator/tree/main/docs/user#supported-platforms
For example, the etcd-operator had to update its scale-down constraints to support automatic replacement of unhealthy members via CPMS:
https://issues.redhat.com/browse/ETCD-328
https://github.com/openshift/cluster-etcd-operator/pull/947
In addition to supporting any RFEs for CPMS, the etcd-operator has recently added its own suite of periodic jobs that test the vertical scaling workflow. This suite must be maintained to ensure that the vertical scaling feature is not broken by further updates to the etcd-operator, by CPMS changes, or by changes to other OpenShift components.
These tests also need to be updated as CPMS becomes enabled on more platforms (e.g. https://github.com/openshift/origin/pull/27497).
The dashboards for these periodic jobs also need to be monitored to identify failures caused by changes that break the vertical scaling workflow.
Failures caused by invariant tests (e.g. pathological events or cluster disruption) also need to be triaged to reduce flakes in the etcd-scaling test suite, so that its pass rates give a more accurate picture of the health of the vertical scaling feature.