Bug
Resolution: Won't Do
4.7.z
This bug is a backport clone of [Bugzilla Bug 1946607](https://bugzilla.redhat.com/show_bug.cgi?id=1946607). The following is the description of the original bug:
—
Description of problem: The introduction of SO_REUSEADDR socket options has created a condition where the readiness probe[1] for etcd no longer accurately reflects the readiness of the operand.
In the etcd static pod, we have code that blocks until a port is released[2]. Before SO_REUSEADDR was introduced, this blocking kept the simple TCP readiness probe accurate, but that is no longer the case.
[1] https://github.com/openshift/cluster-etcd-operator/blob/release-4.8/bindata/etcd/pod.yaml#L165
[2] https://github.com/openshift/cluster-etcd-operator/blob/release-4.8/bindata/etcd/pod.yaml#L135
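To illustrate why a TCP-only readiness check can diverge from operand health, here is a minimal sketch contrasting a connect-level check with an application-level one. It assumes `bash`, coreutils `timeout`, and `etcdctl` are available. The function names `tcp_probe` and `app_probe` are hypothetical and do not come from the etcd pod manifest, and the TLS flags that a real cluster endpoint would need are omitted.

```shell
#!/usr/bin/env bash

# tcp_probe HOST PORT (hypothetical helper)
# Succeeds as soon as anything accepts a TCP connection on HOST:PORT.
# No data is exchanged, so this says nothing about whether the process
# behind the port is actually able to serve requests.
tcp_probe() {
  local host=$1 port=$2
  # bash's /dev/tcp pseudo-device opens a plain TCP connection.
  timeout 1 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null
}

# app_probe ENDPOINT (hypothetical helper)
# Asks etcd itself whether the endpoint is healthy.
# Assumption: certificates/TLS flags are omitted here for brevity.
app_probe() {
  local endpoint=$1
  etcdctl --endpoints="${endpoint}" endpoint health >/dev/null 2>&1
}
```

A connect-level check like `tcp_probe` can pass while `app_probe` still fails, which is the kind of gap described in this report.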
Version-Release number of selected component (if applicable):
How reproducible: 95%
Steps to Reproduce:
1. Force a new static pod revision: `oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery'"$( date --rfc-3339=ns )"'"}}' --type=merge`
2. Watch the pods in the openshift-etcd namespace: `watch oc get pods -n openshift-etcd`
3. When a new static pod rolls out, there is a window during which the etcd pod shows 3/3 containers ready while health checks still fail.
Actual results: The etcd pod shows ready while quorum-guard fails.
Expected results: etcd readiness reflects operand health.
Additional info: