OpenShift Bugs / OCPBUGS-4989

etcd readinessProbe is not reflective of actual readiness


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Undefined
    • None
    • 4.7.z
    • Etcd
    • None

      This bug is a backport clone of [Bugzilla Bug 1946607](https://bugzilla.redhat.com/show_bug.cgi?id=1946607). The following is the description of the original bug:

      Description of problem: The introduction of the SO_REUSEADDR socket option has created a condition where the readiness probe[1] for etcd no longer accurately reflects the readiness of the operand.

      The etcd static pod contains code that blocks until a port is released[2]. Before SO_REUSEADDR, this blocking made the simple TCP readiness probe accurate. That is no longer the case.

      [1] https://github.com/openshift/cluster-etcd-operator/blob/release-4.8/bindata/etcd/pod.yaml#L165
      [2] https://github.com/openshift/cluster-etcd-operator/blob/release-4.8/bindata/etcd/pod.yaml#L135
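The gap described above, between "a port accepts connections" and "the server is actually healthy", can be illustrated with a minimal sketch. This is not the operator's probe code: `tcp_probe`, `health_probe`, and `start_stub_server` are hypothetical names, and the stub is a local stand-in HTTP server (not etcd) that returns a JSON body with a `health` field, a response shape assumed here purely for illustration.

```python
import json
import socket
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def tcp_probe(host: str, port: int, timeout: float = 1.0) -> bool:
    """Succeed as soon as anything accepts a TCP connection on the port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def health_probe(url: str, timeout: float = 1.0) -> bool:
    """Succeed only if the server itself reports that it is healthy."""
    # Ignore any proxy settings so the request goes straight to the target.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as resp:
            body = json.loads(resp.read().decode("utf-8"))
    except (urllib.error.URLError, OSError, ValueError):
        # Covers connection errors, non-2xx responses, and malformed JSON.
        return False
    return isinstance(body, dict) and body.get("health") == "true"


def start_stub_server(healthy: bool) -> HTTPServer:
    """Hypothetical stand-in server: listens on a free local port and
    reports the given health state on every GET request."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            payload = json.dumps({"health": "true" if healthy else "false"}).encode()
            self.send_response(200 if healthy else 503)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, *args):
            pass  # keep the example quiet

    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


if __name__ == "__main__":
    server = start_stub_server(healthy=False)
    port = server.server_address[1]
    print("tcp probe:   ", tcp_probe("127.0.0.1", port))
    print("health probe:", health_probe(f"http://127.0.0.1:{port}/health"))
    server.shutdown()
    server.server_close()
```

Against the unhealthy stub, the TCP check passes while the application-level check fails, which is the same kind of mismatch this bug reports between the pod's ready status and the health checks.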

      Version-Release number of selected component (if applicable):

      How reproducible: 95%

      Steps to Reproduce:
      1. Force a new static pod revision: `oc patch etcd cluster -p='{"spec": {"forceRedeploymentReason": "recovery'"$( date --rfc-3339=ns )"'"}}' --type=merge`
      2. Watch the pods in the openshift-etcd namespace: `watch oc get pods -n openshift-etcd`
      3. When a new static pod rolls out, there is a window in which the etcd pod shows 3/3 containers ready while health checks fail.

      Actual results: etcd pod shows ready while quorum-guard fails.

      Expected results: etcd readiness is reflective of operand health.

      Additional info:

            alray@redhat.com Allen Ray
            openshift-crt-jira-prow OpenShift Prow Bot
            ge liu ge liu