
Type: Feature
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: None
Labels:
- etcd
- stability

Blocked:
False
Ready:
False
Dev Approval:
Not Set
Discussed with Team:
No
Docs Approval:
Not Set
Organization Sponsor:
Service Delivery
PM Approval:
Not Set
QE Approval:
Not Set
Release Note Text:
Undefined
Product Sponsor:
OSD

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Market:

Today cluster-etcd-operator assigns all etcd pods to the endpoints list consumed by apiserver. In some ways this is the desired solution, because it lets the client balancer handle health. The problem is that in some network partition situations the balancer can get stuck on a specific endpoint.

This happens because the client balancer only checks whether the gRPC conn is Ready, and a Ready conn does not guarantee that the member has quorum. So if apiserver could still contact the local etcd while that member was partitioned from its peers, the balancer would keep using that endpoint in round robin.
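The gap between "connection is Ready" and "member has quorum" can be illustrated with a minimal Go sketch. Everything here is hypothetical and not taken from cluster-etcd-operator: `probeFunc`, `quorumHealthy`, `errNoQuorum`, and the endpoint URLs are illustrative names, and the probe is simulated. In a real controller the probe would presumably be a request that requires quorum (for example a linearizable read with a short timeout) rather than a connectivity check.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoQuorum is a hypothetical error returned when a member cannot
// serve a quorum-backed request.
var errNoQuorum = errors.New("no quorum")

// probeFunc reports whether an endpoint can serve a request that needs
// quorum. Hypothetical: it stands in for a real quorum-backed probe.
type probeFunc func(endpoint string) error

// quorumHealthy returns the endpoints whose probe succeeded, in the
// original order.
func quorumHealthy(endpoints []string, probe probeFunc) []string {
	healthy := make([]string, 0, len(endpoints))
	for _, ep := range endpoints {
		if err := probe(ep); err != nil {
			continue // reachable or not, this member is not usable
		}
		healthy = append(healthy, ep)
	}
	return healthy
}

func main() {
	endpoints := []string{
		"https://10.0.0.1:2379",
		"https://10.0.0.2:2379",
		"https://10.0.0.3:2379",
	}
	// Simulate a member that accepts connections but is partitioned
	// from its peers, so quorum-backed requests fail.
	partitioned := map[string]bool{"https://10.0.0.2:2379": true}
	probe := func(ep string) error {
		if partitioned[ep] {
			return errNoQuorum
		}
		return nil
	}
	fmt.Println(quorumHealthy(endpoints, probe))
}
```

A list filtered this way would exclude the partitioned member even though its gRPC conn still reports Ready, which is the case the round-robin balancer cannot detect on its own.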

controller: etcd endpoints controller
resource: etcd-endpoints configmap

risks: today, a change in this list results in a new static pod revision for the apiserver. We need to ensure this does not flap.
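One possible way to limit flapping is to publish a changed endpoint list only after the same candidate list has been observed on several consecutive syncs. The Go sketch below is an assumption about how that could look, not the operator's actual behavior: `stableList`, `canonical`, `observe`, and the threshold value are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// stableList (hypothetical) publishes a new endpoint list only after the
// same candidate has been seen on `threshold` consecutive observations.
type stableList struct {
	threshold int
	published string // canonical form of the last published list
	candidate string // canonical form of the pending list
	seen      int    // consecutive observations of the candidate
}

// canonical returns an order-independent key for a list of endpoints.
func canonical(endpoints []string) string {
	sorted := append([]string(nil), endpoints...)
	sort.Strings(sorted)
	return strings.Join(sorted, ",")
}

// observe records one sync result and reports whether the published
// list changed as a result.
func (s *stableList) observe(endpoints []string) bool {
	key := canonical(endpoints)
	if key == s.published {
		// Back to the published state: drop any pending candidate.
		s.candidate, s.seen = "", 0
		return false
	}
	if key != s.candidate {
		s.candidate, s.seen = key, 1
	} else {
		s.seen++
	}
	if s.seen < s.threshold {
		return false
	}
	s.published, s.candidate, s.seen = key, "", 0
	return true
}

func main() {
	full := []string{"etcd-0", "etcd-1", "etcd-2"}
	degraded := []string{"etcd-0", "etcd-2"}
	s := &stableList{threshold: 3, published: canonical(full)}
	// A single blip is ignored; three consecutive degraded syncs publish.
	for _, obs := range [][]string{degraded, full, degraded, degraded, degraded} {
		fmt.Println(s.observe(obs), s.published)
	}
}
```

The trade-off is latency: a higher threshold means fewer static pod revisions but a longer window in which apiserver may still be pointed at an unhealthy member.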

alternatives: find a way for the etcd client balancer to also take the etcd quorum health of each subconn into account.

Assignee: Unassigned

Reporter: Sam Batschelet

Votes: 0

Watchers: 2

Created: 2021/05/14 4:58 PM

Updated: 2022/06/02 5:38 PM
