-
Story
-
Resolution: Done
-
Major
-
ETCD Sprint 225, ETCD Sprint 226, ETCD Sprint 227, ETCD Sprint 228, ETCD Sprint 229, ETCD Sprint 230, ETCD Sprint 231
The etcd vertical scaling test manually creates a new machine to scale up and then deletes it to scale down.
This is effectively horizontally scaling the machines from 3->4->3.
With the ControlPlaneMachineSetOperator (CPMSO) present, however, this creates a race: once the test scales up to 4, the CPMSO detects an excess of Ready machines and deletes an older one to scale back down to the desired control-plane size.
https://github.com/openshift/cluster-control-plane-machine-set-operator/blob/925943433d4c224465ae605e8eb6550926dc0dd4/pkg/controllers/controlplanemachineset/updates.go#L320-L345
This causes the vertical scaling test to fail and, in particular, blocks the following PR from landing:
https://github.com/openshift/cluster-control-plane-machine-set-operator/pull/98
The vertical scaling test should be updated so that it no longer scales horizontally by serially scaling up and then down. Instead, it should initiate scaling by deleting a machine, so that the CPMSO can replace that machine with a new one.
The test would also have to detect when CPMS is not present and, in that case, fall back to manually creating a new machine to replace the deleted one.
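A rough sketch of the proposed flow, for discussion only. It is not the actual test code: `Cluster`, `DeleteMachine`, `CreateMachine` and `ReplaceByDeletion` are hypothetical names, and the in-memory `Cluster` merely simulates the CPMSO creating a replacement when a machine is deleted. A real test would use the Machine API clients and poll for the replacement to become Ready.

```go
package main

import (
	"errors"
	"fmt"
)

// Cluster is a hypothetical in-memory stand-in for the control plane.
// It is NOT the real OpenShift Machine API client.
type Cluster struct {
	Machines   []string // names of the control-plane machines
	CPMSActive bool     // assumption: true when an active CPMS exists
	next       int      // counter used to name replacement machines
}

// CreateMachine adds a new machine and returns its name.
func (c *Cluster) CreateMachine() string {
	c.next++
	name := fmt.Sprintf("master-new-%d", c.next)
	c.Machines = append(c.Machines, name)
	return name
}

// DeleteMachine removes the named machine. When CPMSActive is set, it also
// simulates the CPMSO reacting to the deletion by creating a replacement.
func (c *Cluster) DeleteMachine(name string) error {
	for i, m := range c.Machines {
		if m == name {
			c.Machines = append(c.Machines[:i], c.Machines[i+1:]...)
			if c.CPMSActive {
				c.CreateMachine()
			}
			return nil
		}
	}
	return fmt.Errorf("machine %q not found", name)
}

// ReplaceByDeletion triggers scaling by deleting one machine instead of
// adding a fourth one first. With CPMS present the operator supplies the
// replacement; without it the test creates the replacement itself.
func ReplaceByDeletion(c *Cluster, victim string, desired int) error {
	if len(c.Machines) != desired {
		return errors.New("control plane is not at the desired size")
	}
	if err := c.DeleteMachine(victim); err != nil {
		return err
	}
	if !c.CPMSActive {
		// Fallback path: no CPMS, so replace the machine manually.
		c.CreateMachine()
	}
	if len(c.Machines) != desired {
		return fmt.Errorf("expected %d machines, got %d", desired, len(c.Machines))
	}
	return nil
}
```

Calling `ReplaceByDeletion` on a three-machine `Cluster` with `CPMSActive` set to either value should leave three machines, with the deleted one gone. In this sketch the test never creates a machine while CPMS is active, which is the point of the change: the CPMSO has no surplus machine to remove.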
Prior discussion on Slack:
https://coreos.slack.com/archives/C027U68LP/p1664320169916089
- blocks
-
ETCD-330 Regression Test: ensure healthy quorum before config update
- To Do
-
ETCD-336 E2E deletion and automatic replacement of an unhealthy member machine in N member cluster
- In Progress
-
OCPBUGS-996 Control Plane Machine Set Operator OnDelete update should cause an error when more than one machine is ready in an index
- Closed
- is related to
-
ETCD-637 Update the vertical scaling test to not rely on CPMS status.readyReplicas
- Closed
-
OCPBUGS-36462 control-plane-machine-set goes Available=False with UnavailableReplicas during etcd scale testing
- Closed