Type: Bug
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: 4.19, 4.20, 4.21
Component/s: Etcd
Labels: None
Activity Type: Quality / Stability / Reliability
Blocked: False
Blocked Reason: None
Story Points: None
Severity: Important
Regression: No
Target Backport Versions: None
Target Version: None
Release Blocker: Rejected
Sprint: None
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
PX Impact Score:
Release Note Status: None
Release Note Type: None
Release Note Text: None
Escape Reason: None
Escape Impact: None
Corrective Measures: None
SDLC stage when should've been found: None

Description of problem:

When quorum-restore.sh is run on an etcd member that was removed from the cluster before the script ran, the recovery etcd crash-loops with a panic like:

Sep 04 15:06:56 master-0 etcd[14061]: panic: removed all voters
Sep 04 15:06:56 master-0 etcd[14061]: 
Sep 04 15:06:56 master-0 etcd[14061]: goroutine 88 [running]:
Sep 04 15:06:56 master-0 etcd[14061]: go.etcd.io/etcd/raft/v3.(*raft).applyConfChange(0xc0003f3080, {0x0, {0xc001737880, 0x1, 0x1}, {0x0, 0x0, 0x0}})
Sep 04 15:06:56 master-0 etcd[14061]:         go.etcd.io/etcd/raft/v3@v3.5.21/raft.go:1633 +0x1cd
Sep 04 15:06:56 master-0 etcd[14061]: go.etcd.io/etcd/raft/v3.(*node).run(0xc0005b8240)
Sep 04 15:06:56 master-0 etcd[14061]:         go.etcd.io/etcd/raft/v3@v3.5.21/node.go:360 +0xafa
Sep 04 15:06:56 master-0 etcd[14061]: created by go.etcd.io/etcd/raft/v3.RestartNode in goroutine 1
Sep 04 15:06:56 master-0 etcd[14061]:         go.etcd.io/etcd/raft/v3@v3.5.21/node.go:244 +0x239

Version-Release number of selected component (if applicable):

Since 4.19, where quorum-restore.sh was introduced.

How reproducible:

Always

Steps to Reproduce:

    1. Create a cluster.
    2. Remove an etcd member.
    3. Shut down all other members.
    4. Run quorum-restore.sh on the removed member.
    5. Observe the panic in the etcd container as it comes up.

Actual results:

The etcd restore container crash-loops.

Expected results:

No panic; etcd starts and works.

Additional info:

This has been reported upstream before:
https://github.com/etcd-io/etcd/issues/13848

I have a regression test for etcd 3.6 that reproduces this:
https://github.com/tjungblu/etcd/commit/ffd784ae1c862cdd01675a14ff652927776b9ca5

Assignee: Thomas Jungblut
Reporter: Thomas Jungblut
Need Info From: None
Contributors: None
QA Contact: Ge Liu
Doc Contact: None
Votes: 0
Watchers: 2
Created: 2025/09/08 2:39 PM
Updated: 2025/09/08 5:31 PM
