Type: Story
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Labels:
- pmr-ai

Work Type: Improvement
Blocked: False
Blocked Reason: None
Ready: False


etcd#13294 points out that lease time-to-live (TTL) clocks currently reset on leader elections (I think?). That can cause trouble during Event-spew incidents, with feedback loops like:

1. Some initial trigger destabilizes etcd, and causes a leader election.
2. Events continue to flow in, because often etcd instability is correlated with a bunch of cluster components complaining about surprising things.
3. Reset TTL means etcd is no longer reaping expired Events.
4. Two hours in, we're up to ~1.6 times the Events we should have, with a two-hour chunk of old Events that should have expired, but which the TTL reset leaves unreaped.
5. Large numbers of Events lead to high resource consumption in clients that LIST Events, populate Event informers, etc.
6. High resource consumption destabilizes etcd, and we have a new leader election.
7. Return to step 1, but now with more old, unreaped Events in the backlog.

and so forth, until the old, unreaped Events are a huge drag on the system and it melts down.
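
To put rough numbers on that loop, here is a minimal sketch of the backlog growth. It is a toy model, not etcd's lease code: it assumes a constant Event creation rate, a store that starts in steady state, and that every leader election resets the remaining TTL of all live Events to the full TTL. The 3-hour TTL is also an assumption (the ticket does not state one), and `simulate_backlog` is a hypothetical helper name.

```python
def simulate_backlog(ttl_minutes, election_minutes, end_minute):
    """Return how many one-minute Event batches are stored at end_minute.

    Toy model (assumptions, not etcd internals): one batch of Events is
    created per minute, the store is in steady state at minute 0, and each
    leader election resets every live batch's remaining TTL to the full TTL.
    """
    elections = set(election_minutes)
    # Steady state at minute 0: batches created at minutes (-ttl, 0],
    # each expiring ttl minutes after it was created.
    expiries = [created + ttl_minutes
                for created in range(-ttl_minutes + 1, 1)]
    for now in range(0, end_minute + 1):
        if now > 0:
            # A new batch arrives every minute.
            expiries.append(now + ttl_minutes)
        # Reap batches whose TTL has run out.
        expiries = [e for e in expiries if e > now]
        if now in elections:
            # Assumed behaviour: the election restarts every TTL clock.
            expiries = [now + ttl_minutes for _ in expiries]
    return len(expiries)


TTL = 180  # assumed 3-hour Event TTL, in minutes

steady = simulate_backlog(TTL, [], 120)
one_election = simulate_backlog(TTL, [0], 120)
two_elections = simulate_backlog(TTL, [0, 120], 240)

print(round(one_election / steady, 2))   # 1.67
print(round(two_elections / steady, 2))  # 2.33
```

Under these assumptions, one election followed by two quiet hours leaves about 1.67 times the steady-state backlog, and a second election at the two-hour mark pushes it to about 2.33 times two hours later. If nothing triggers another election, the model drains back to the steady-state count once a full TTL has passed.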

There are a number of places where we could defend against this Event-spew scenario, but one on the etcd side would be setting --experimental-enable-lease-checkpoint near where we currently set --experimental-initial-corrupt-check (here and similar). This ticket is about figuring out whether we want to do that.


image (2).png
83 kB
2022/08/09 6:18 PM

Relates to: API-1456 RFE: Event-spew hardening (Closed)

Assignee: Unassigned
Reporter: W. Trevor King
Votes: 1
Watchers: 9
Created: 2022/08/09 6:00 PM
Updated: 2025/01/06 6:23 AM
Target start: 2022/08/09
Target end: 2023/02/09
