Description of problem:
When the health check times out against a single member, the entire context's deadline is exceeded and all members are declared unhealthy, even though the other members are potentially reachable:
[etcd-operator-5d5946d6c-gjxp9] E0827 10:17:04.841588 1 health.go:115] health check for member (tjungblu15-dq6nb-master-1) failed: err(context deadline exceeded)
[etcd-operator-5d5946d6c-gjxp9] E0827 10:17:04.842259 1 etcdmemberscontroller.go:81] Unhealthy etcd member found: tjungblu15-dq6nb-master-2, took=29.996749436s, err=health check failed: context deadline exceeded
[etcd-operator-5d5946d6c-gjxp9] E0827 10:17:04.842496 1 etcdmemberscontroller.go:81] Unhealthy etcd member found: tjungblu15-dq6nb-master-0, took=21.365µs, err=health check failed: context deadline exceeded
[etcd-operator-5d5946d6c-gjxp9] E0827 10:17:04.842605 1 etcdmemberscontroller.go:81] Unhealthy etcd member found: tjungblu15-dq6nb-master-1, took=33.136µs, err=health check failed: context deadline exceeded
Version-Release number of selected component (if applicable):
any supported version
How reproducible:
always
Steps to Reproduce:
1. Make an etcd member unresponsive (e.g. by running a defrag on a large database).
2. Wait for the CEO health check against that etcd member to time out.
3. Observe that the operator status flags all three members as unhealthy.
Actual results:
All three members are reported as unhealthy with "context deadline exceeded", including the members that were never the cause of the timeout.
Expected results:
Only the unresponsive member is reported as unhealthy; the members that are still reachable are reported as healthy.
Additional info:
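The timings in the logs (one member taking about 30s, the other two failing within microseconds) are consistent with all probes sharing a single deadline, although that reading is an assumption and not confirmed from the code in health.go. A minimal sketch of one possible remedy follows: give each member its own timeout derived from the parent context, so a slow member can only exhaust its own budget. All names here (`probeFunc`, `checkMembers`, `simulate`, the 50ms timeout, the member names) are hypothetical and are not taken from the operator's code.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// probeFunc is a hypothetical stand-in for a single-member health probe.
type probeFunc func(ctx context.Context, member string) error

// result records the outcome of one member's probe.
type result struct {
	Member  string
	Healthy bool
	Took    time.Duration
	Err     error
}

// checkMembers probes every member concurrently. Each probe gets its own
// timeout derived from the parent context, so one slow member cannot use up
// the time budget of the others.
func checkMembers(parent context.Context, members []string, perMember time.Duration, probe probeFunc) []result {
	results := make([]result, len(members))
	var wg sync.WaitGroup
	for i, m := range members {
		wg.Add(1)
		go func(i int, m string) {
			defer wg.Done()
			ctx, cancel := context.WithTimeout(parent, perMember)
			defer cancel()
			start := time.Now()
			err := probe(ctx, m)
			// Each goroutine writes only to its own slice index.
			results[i] = result{Member: m, Healthy: err == nil, Took: time.Since(start), Err: err}
		}(i, m)
	}
	wg.Wait()
	return results
}

// simulate runs the check against three fake members, one of which hangs
// until its context expires (as a member busy with a defrag might).
func simulate() map[string]bool {
	members := []string{"master-0", "master-1", "master-2"}
	probe := func(ctx context.Context, member string) error {
		if member == "master-2" {
			<-ctx.Done() // never answers; blocks until the deadline fires
			return ctx.Err()
		}
		return nil
	}
	healthy := make(map[string]bool)
	for _, r := range checkMembers(context.Background(), members, 50*time.Millisecond, probe) {
		healthy[r.Member] = r.Healthy
	}
	return healthy
}

func main() {
	for member, ok := range simulate() {
		fmt.Printf("%s healthy=%t\n", member, ok)
	}
}
```

In this simulation only the hanging member ends up unhealthy. If the parent context itself is cancelled or expires, every probe still fails, so the parent deadline would need to be at least as long as the per-member timeout for the isolation to hold.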
- blocks
  ETCD-638 [GA] Selectable etcd database size (New)
  OCPBUGS-61019 [4.19] etcdmemberscontroller health check declares all members unhealthy (Verified)
- is cloned by
  OCPBUGS-61019 [4.19] etcdmemberscontroller health check declares all members unhealthy (Verified)
- links to