Type: Epic
Resolution: Obsolete
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Labels: None

Epic Name: Stability - Minimize disruption from etcd leader elections
Epic Status: To Do
Activity Type: Quality / Stability / Reliability
Blocked: False
Blocked Reason: None
Ready: False
Size: None

Target Version: None
Release Blocker: None

Epic Goal

Make etcd leader elections less disruptive to clusters.

Why is this important?

Leader elections cause disruption: while an election is in progress the cluster has no leader, so etcd cannot serve linearizable requests until a new leader is elected.

Scenarios

Acceptance Criteria

CI - MUST be running successfully with tests automated
Release Technical Enablement - Provide necessary release enablement details and documents.
...

Dependencies (internal and external)

Previous Work (Optional):

https://bugzilla.redhat.com/show_bug.cgi?id=1870274

Open questions:

1.) Does the leader election itself result in additional I/O latency? For example, do we replay the WAL file?

2.) Since we have just told all clients to retry, do those simultaneous retries disrupt etcd's ability to respond in a timely manner?

3.) Do all clients actually fail and then retry? If a client chooses to just time out, we should explore why and whether that can be changed.

4.) How much time does a leader election take, on average, before we can serve requests again?
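
To make questions 2 and 3 more concrete, here is a minimal sketch of one common way a client could spread out its retries after a failed request: capped exponential backoff with full jitter. It is an illustration only, not how any etcd or OpenShift client is known to behave. The `Unavailable` exception, the function names, and the default timings are all hypothetical and would need to be replaced with values measured on a real cluster.

```python
import random
import time


class Unavailable(Exception):
    """Hypothetical stand-in for a 'no leader' style error seen by a client."""


def backoff_delays(base=0.05, cap=2.0, attempts=6, rng=random.random):
    """Yield `attempts` delays in seconds: capped exponential with full jitter.

    The defaults are illustrative assumptions, not tuned values.
    """
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a random point in [0, ceiling) so that clients
        # that failed at the same moment do not all retry at the same moment.
        yield rng() * ceiling


def call_with_retry(op, delays, sleep=time.sleep):
    """Call op(); on Unavailable, sleep for the next delay and try again.

    After the delays are used up, one final attempt is made and any error
    from it propagates to the caller.
    """
    for delay in delays:
        try:
            return op()
        except Unavailable:
            sleep(delay)
    return op()
```

A caller would wrap a request as `call_with_retry(do_request, backoff_delays())`, where `do_request` is whatever hypothetical function issues the call and raises `Unavailable` while there is no leader. Whether jitter of this kind actually reduces load on etcd after an election is exactly what question 2 asks, so this sketch should be read as a possible experiment rather than an answer.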

Done Checklist

CI - CI is running, tests are automated and merged.
Release Enablement <link to Feature Enablement Presentation>
DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
DEV - Downstream build attached to advisory: <link to errata>
QE - Test plans in Polarion: <link or reference to Polarion>
QE - Automated tests merged: <link or reference to automated tests>
DOC - Downstream documentation merged: <link to meaningful PR>

is caused by

OCPPLAN-7577 minimize disruption from etcd leader elections

Assignee: Unassigned
Reporter: Wallace Lewis
Need Info From: None
Contributors: None
QA Contact: None
Doc Contact: None
Votes: 0
Watchers: 2
Created: 2021/05/20 3:03 PM
Updated: 2025/08/16 5:11 AM
Resolved: 2024/05/13 8:05 PM
