-
Story
-
Resolution: Done
-
Major
-
In our vertical scaling test, after we delete a machine, we rely on the `status.readyReplicas` field of the ControlPlaneMachineSet (CPMS) to indicate that it has successfully created a new machine, which lets us scale up before we scale down.
As we've seen in the past, that status field isn't a reliable indicator of machine scale-up: `status.readyReplicas` might stay at 3 because the soon-to-be-removed node that is pending deletion can go Ready=Unknown, as in runs such as the following:
https://prow.ci.openshift.org/view/gs/test-platform-results/pr-logs/pull/openshift_cluster-etcd-operator/1286/pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-etcd-scaling/1808186565449486336
The test then times out waiting for `status.readyReplicas`=4, even though the scale-up and scale-down may already have happened.
In hindsight, all we care about is whether the deleted machine's member is replaced by another machine's member; we can ignore the flapping of node and machine statuses while we wait for the scale-up and then scale-down of members to happen.
So we can relax or replace the check on `status.readyReplicas` and just look at the membership change.
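A minimal sketch of what a membership-based check could look like. This is not the existing code in openshift/origin; `memberReplaced` and the member names are hypothetical, and the member lists are assumed to come from whatever etcd member listing the test already has. Polling and timeouts are left out.

```go
package main

import "fmt"

// memberReplaced is a hypothetical helper. It reports whether the member
// named deleted has left the cluster, at least one member that was not
// present before has joined, and the member count is back to its
// original size.
func memberReplaced(before, after []string, deleted string) bool {
	prev := make(map[string]bool, len(before))
	for _, m := range before {
		prev[m] = true
	}
	sawNew := false
	for _, m := range after {
		if m == deleted {
			// The old member is still present, so scale-down has not finished.
			return false
		}
		if !prev[m] {
			sawNew = true
		}
	}
	return sawNew && len(after) == len(before)
}

func main() {
	// Illustrative member names only.
	before := []string{"master-0", "master-1", "master-2"}
	scaledUp := []string{"master-0", "master-1", "master-2", "master-3"}
	done := []string{"master-1", "master-2", "master-3"}

	fmt.Println(memberReplaced(before, scaledUp, "master-0"))
	fmt.Println(memberReplaced(before, done, "master-0"))
}
```

A check like this depends only on the etcd membership, so a node or machine flapping to Ready=Unknown in between would not affect the result.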
PS:
We can also update the outdated Godoc comments for the test to mention that it relies on CPMSO to create a machine for us:
https://github.com/openshift/origin/blob/3deedee4ae147a03afdc3d4ba86bc175bc6fc5a8/test/extended/etcd/vertical_scaling.go#L34-L38
- relates to
-
ETCD-329 Update the vertical scaling test to account for CPMSO
- Closed
-
OCPBUGS-36462 control-plane-machine-set goes Available=False with UnavailableReplicas during etcd scale testing
- Closed