Bug
Resolution: Unresolved
4.16, 4.17, 4.18, 4.19, 4.20
-
None
-
Quality / Stability / Reliability
In Progress
Bug Fix
An etcd member that fails to respond within 30s is declared unhealthy by the cluster-etcd-operator. Before this fix, all other healthy etcd members were also declared unhealthy, because the health checks shared a single timeout context.
This is a clone of issue OCPBUGS-61019. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-60941. The following is the description of the original issue:
—
Description of problem:
When the health check times out while reaching one member, the deadline of the shared context is exceeded and all members are declared unhealthy, even though the other members may still be reachable:
[etcd-operator-5d5946d6c-gjxp9] E0827 10:17:04.841588 1 health.go:115] health check for member (tjungblu15-dq6nb-master-1) failed: err(context deadline exceeded)
[etcd-operator-5d5946d6c-gjxp9] E0827 10:17:04.842259 1 etcdmemberscontroller.go:81] Unhealthy etcd member found: tjungblu15-dq6nb-master-2, took=29.996749436s, err=health check failed: context deadline exceeded
[etcd-operator-5d5946d6c-gjxp9] E0827 10:17:04.842496 1 etcdmemberscontroller.go:81] Unhealthy etcd member found: tjungblu15-dq6nb-master-0, took=21.365µs, err=health check failed: context deadline exceeded
[etcd-operator-5d5946d6c-gjxp9] E0827 10:17:04.842605 1 etcdmemberscontroller.go:81] Unhealthy etcd member found: tjungblu15-dq6nb-master-1, took=33.136µs, err=health check failed: context deadline exceeded
The first check took about 30s; the remaining two failed within microseconds because the context had already expired by the time they ran.
Version-Release number of selected component (if applicable):
any supported version
How reproducible:
always
Steps to Reproduce:
1. Make etcd unresponsive (e.g. by running a defrag on a large database).
2. Wait for the cluster-etcd-operator (CEO) health check against that etcd member to time out.
3. Observe that the operator status flags all three members as unhealthy.
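Actual results:
All three etcd members are reported as unhealthy.
Expected results:
Only the unresponsive etcd member is reported as unhealthy.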
Additional info:
clones: OCPBUGS-61019 [4.19] etcdmemberscontroller health check declares all members unhealthy (Closed)
is blocked by: OCPBUGS-61019 [4.19] etcdmemberscontroller health check declares all members unhealthy (Closed)
links to