OpenShift Bugs / OCPBUGS-60941

etcdmemberscontroller health check declares all members unhealthy


    • Bug
    • Resolution: Unresolved
    • Undefined
    • 4.20.0
    • 4.16, 4.17, 4.18, 4.19, 4.20
    • Etcd
    • Quality / Stability / Reliability
    • Done
    • Bug Fix
      Before this update, a timeout on one etcd member caused the context deadline for the whole health check to be exceeded. As a consequence, all members were declared unhealthy, even though some were reachable. With this release, if one member times out, the other members are no longer incorrectly marked as unhealthy.

      Description of problem:

      When the health check times out while reaching one member, the deadline of the entire context is exceeded and all members are declared unhealthy, even though the other members are potentially reachable:
      
      [etcd-operator-5d5946d6c-gjxp9] E0827 10:17:04.841588       1 health.go:115] health check for member (tjungblu15-dq6nb-master-1) failed: err(context deadline exceeded)
      [etcd-operator-5d5946d6c-gjxp9] E0827 10:17:04.842259       1 etcdmemberscontroller.go:81] Unhealthy etcd member found: tjungblu15-dq6nb-master-2, took=29.996749436s, err=health check failed: context deadline exceeded
      [etcd-operator-5d5946d6c-gjxp9] E0827 10:17:04.842496       1 etcdmemberscontroller.go:81] Unhealthy etcd member found: tjungblu15-dq6nb-master-0, took=21.365µs, err=health check failed: context deadline exceeded
      [etcd-operator-5d5946d6c-gjxp9] E0827 10:17:04.842605       1 etcdmemberscontroller.go:81] Unhealthy etcd member found: tjungblu15-dq6nb-master-1, took=33.136µs, err=health check failed: context deadline exceeded
      
          

      Version-Release number of selected component (if applicable):

      any supported version    

      How reproducible:

      always    

      Steps to Reproduce:

          1. Make one etcd member unresponsive (e.g., by running a defrag on a large db size).
          2. Wait for the health check in CEO (cluster-etcd-operator) to time out against that etcd member.
          3. Observe that the operator status flags all three members as unhealthy.
          

      Actual results:

          

      Expected results:

          

      Additional info:

      
          

              dwest@redhat.com Dean West
              tjungblu@redhat.com Thomas Jungblut
              Sandeep Kundu Sandeep Kundu