Description of problem
Similar to OCPBUGS-20061, but for a different situation:
$ w3m -dump -cols 200 'https://search.dptools.openshift.org/?maxAge=48h&name=pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-etcd-scaling&type=junit&search=clusteroperator/control-plane-machine-set+should+not+change+condition/Available' | grep 'failures match' | sort
pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-etcd-scaling (all) - 15 runs, 60% failed, 33% of failures match = 20% impact
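For readers unfamiliar with the CI Search summary format, the following is a minimal sketch of how the "impact" figure relates to the other numbers on that line. It assumes the summary line has exactly the shape shown in the output above; the names `SUMMARY_RE`, `parse_summary`, and `recomputed_impact` are hypothetical helpers for illustration, not part of CI Search.

```python
import re

# Assumed shape of a CI Search summary line, taken from the output above:
#   "<job> (all) - 15 runs, 60% failed, 33% of failures match = 20% impact"
SUMMARY_RE = re.compile(
    r"^(?P<job>\S+) \(all\) - (?P<runs>\d+) runs, (?P<failed>\d+)% failed, "
    r"(?P<match>\d+)% of failures match = (?P<impact>\d+)% impact$"
)


def parse_summary(line):
    """Parse one summary line into a dict of the job name and integer fields."""
    m = SUMMARY_RE.match(line.strip())
    if m is None:
        raise ValueError("unrecognized summary line: %r" % line)
    fields = {k: int(v) for k, v in m.groupdict().items() if k != "job"}
    fields["job"] = m.group("job")
    return fields


def recomputed_impact(summary):
    """Impact is the share of all runs that both failed and matched the search."""
    return summary["failed"] * summary["match"] / 100


line = (
    "pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-etcd-scaling "
    "(all) - 15 runs, 60% failed, 33% of failures match = 20% impact"
)
summary = parse_summary(line)
print(summary["runs"], round(recomputed_impact(summary)))
```

With the figures above, 60% of 15 runs is 9 failures, and a third of those is about 3 matching runs, which is 20% of the 15 runs.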
In that test, since ETCD-329, the test suite deletes a control-plane Machine and waits for the ControlPlaneMachineSet controller to scale in a replacement. But in runs like this, the outgoing Node goes Ready=Unknown for not-yet-diagnosed reasons, and that somehow misses cpmso#294's inertia (maybe the running guard should be dropped?), and the ClusterOperator goes Available=False complaining about Missing 1 available replica(s).
It's not clear from the message which replica it's worried about (that would be helpful information to include in the message), but I suspect it's the Machine/Node that's in the deletion process. But regardless of the message, this does not seem like a situation worth a cluster-admin-midnight-page Available=False alarm.
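To make the "inertia" idea mentioned above concrete, here is a rough sketch of one general way to delay an Available=False report until the underlying problem has persisted for a grace period. This is not the operator's actual code, and this report does not describe how cpmso#294 implements it; the class name `InertialCondition`, the `grace_seconds` parameter, and the 300-second value are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InertialCondition:
    """Hypothetical debounce for an Available condition.

    Reports Available=True until the underlying state has been unhealthy
    for at least `grace_seconds` without interruption.
    """

    grace_seconds: float
    _unhealthy_since: Optional[float] = None

    def observe(self, healthy: bool, now: float) -> bool:
        """Return the Available value to report at time `now` (in seconds)."""
        if healthy:
            # Any healthy observation resets the timer.
            self._unhealthy_since = None
            return True
        if self._unhealthy_since is None:
            # First unhealthy observation starts the grace period.
            self._unhealthy_since = now
        # Stay Available until the grace period has fully elapsed.
        return (now - self._unhealthy_since) < self.grace_seconds


# Illustrative timeline with an assumed 300-second grace period.
cond = InertialCondition(grace_seconds=300)
for t, healthy in [(0, True), (10, False), (200, False), (400, False), (410, True)]:
    print(t, healthy, cond.observe(healthy, now=t))
```

Under a scheme like this, a replica that is briefly missing while a Machine is being replaced would not flip the ClusterOperator to Available=False unless the gap outlasted the grace period.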
Version-Release number of selected component
Seen in dev-branch CI. I haven't gone back to check older 4.y.
How reproducible
CI Search shows 20% impact, see my earlier query in this message.
Steps to Reproduce
Run a bunch of pull-ci-openshift-cluster-etcd-operator-master-e2e-aws-ovn-etcd-scaling and check CI Search results.
Actual results
20% impact
Expected results
No hits.
- blocks
  - OCPBUGS-37820 [4.16] control-plane-machine-set goes Available=False with UnavailableReplicas during etcd scale testing (Closed)
- is cloned by
  - OCPBUGS-37820 [4.16] control-plane-machine-set goes Available=False with UnavailableReplicas during etcd scale testing (Closed)
- is related to
  - ETCD-637 Update the vertical scaling test to not rely on CPMS status.readyReplicas (Closed)
- relates to
  - ETCD-329 Update the vertical scaling test to account for CPMSO (Closed)
  - OCPBUGS-20061 control-plane-machine-set goes Available=False with UnavailableReplicas during updates (Closed)
  - OCPBUGS-36301 [4.17] Should run health checks in parallel to avoid spurious Available=False EtcdMembers_NoQuorum claims (Closed)
  - OTA-362 CI: fail update suite if any ClusterOperator go Available=False (Closed)
- links to
  - RHEA-2024:3718 OpenShift Container Platform 4.17.z bug fix update