OpenShift Bugs / OCPBUGS-49749

Readiness probes must not rely on etcd


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • None
    • 4.18.0, 4.19.0
    • kube-apiserver
    • None
    • Rejected
    • False
    • None
    • The readiness probes for the API server (/readyz endpoint) have been modified to exclude etcd checks. This prevents client connections from being closed if etcd is temporarily unavailable. The assumption is that etcd will become ready again before a client connection times out, allowing client connections to persist through brief etcd unavailability and minimizing temporary API server outages.
    • Enhancement
    • In Progress

      This is a clone of issue OCPBUGS-48177. The following is the description of the original issue:

      Description of problem:

      Requests allow up to 30s for etcd to respond, but readiness probes allow only 9s. When etcd latency is between 10s and 30s, standard requests still succeed, but the readiness probe configuration causes every apiserver endpoint to be lost at the same time. This requires correction in the pod definitions and the load balancers. Changing the ongoing readiness check to `readyz?exclude=etcd` should correct the issue.
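
      The timing gap described above can be illustrated with a minimal sketch. The 30s and 9s values come from this issue. The function names are hypothetical, and the model assumes that every readiness check other than etcd is healthy.

```python
from urllib.parse import urlencode

REQUEST_TIMEOUT_S = 30.0     # from the issue: requests allow up to 30s for etcd
PROBE_ETCD_TIMEOUT_S = 9.0   # from the issue: readiness probes allow 9s for etcd


def readyz_path(exclude=()):
    """Build a /readyz path, passing each excluded check as exclude=<name>."""
    if not exclude:
        return "/readyz"
    return "/readyz?" + urlencode([("exclude", name) for name in exclude])


def request_succeeds(etcd_latency_s):
    """A standard request succeeds if etcd answers within the request timeout."""
    return etcd_latency_s <= REQUEST_TIMEOUT_S


def probe_ready(etcd_latency_s, exclude_etcd):
    """Simplified readiness model: only the etcd check can fail here."""
    if exclude_etcd:
        return True  # assumption: all non-etcd checks are healthy
    return etcd_latency_s <= PROBE_ETCD_TIMEOUT_S


if __name__ == "__main__":
    for latency in (5.0, 15.0, 35.0):
        print(
            latency,
            request_succeeds(latency),
            probe_ready(latency, exclude_etcd=False),
            probe_ready(latency, exclude_etcd=True),
        )
```

      In this model, a latency of 15s is the problematic case: requests still succeed, yet a probe on plain `/readyz` reports not ready on every endpoint at once.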

       

      Off the top of my head, this will include:

      1. kube-apiserver operator
      2. authentication operator
      3. openshift-apiserver operator
      4. MCO apiserver-watch
      5. metal LB
      6. https://github.com/multi-arch/ocp-remote-ci/pull/39
      7. wherever LBs are defined for AWS, Azure, and GCP

       

      This is a low-cost, low-risk, high-benefit change.
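
      As a rough sketch of how small the per-probe change might be, the helper below rewrites the path of an httpGet readiness probe so that it excludes the etcd check. The helper name, the example port, and the timeout value are assumptions for illustration and are not taken from any of the operators listed above.

```python
from urllib.parse import urlsplit, parse_qsl, urlencode


def exclude_etcd_from_probe(probe):
    """Return a copy of an httpGet probe dict whose path excludes the etcd check."""
    http_get = dict(probe.get("httpGet", {}))
    parts = urlsplit(http_get.get("path", "/readyz"))
    # Keep any existing query parameters and add exclude=etcd only once.
    query = parse_qsl(parts.query, keep_blank_values=True)
    if ("exclude", "etcd") not in query:
        query.append(("exclude", "etcd"))
    http_get["path"] = parts.path + "?" + urlencode(query)
    return {**probe, "httpGet": http_get}


# Hypothetical probe definition; the port and timeout are illustrative only.
example_probe = {
    "httpGet": {"path": "/readyz", "port": 6443, "scheme": "HTTPS"},
    "timeoutSeconds": 10,
}
patched_probe = exclude_etcd_from_probe(example_probe)
```

      The helper leaves the input dict untouched and is idempotent, so applying it to a probe that already excludes etcd changes nothing.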

       

          


              jchaloup@redhat.com Jan Chaloupka
              openshift-crt-jira-prow OpenShift Prow Bot
              Ke Wang