Description of problem:
During rolling cluster upgrades from etcd 3.5.19 to 3.6.x, incorrect membership data may be propagated to new members during snapshot transfers. The cluster upgrade then fails with the error "etcdserver: too many learner members in cluster".
Version-Release number of selected component (if applicable):
4.19-4.21
How reproducible:
Consistently reproducible during multi-stage cluster upgrades involving etcd <=3.5.19 and subsequent upgrades to 3.6.x when adding learner members.
Steps to Reproduce:
1. Upgrade the cluster from a version using etcd 3.5.19 to etcd 3.5.21
2. Attempt a subsequent upgrade to etcd 3.6.4 (or another 3.6.x version)
3. Add new learner members during the upgrade process
4. Observe membership data propagation during snapshot transfer
Actual results:
Incorrect membership data is propagated to new members during snapshot transfers, causing the error "etcdserver: too many learner members in cluster" and blocking the cluster upgrade from completing.
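To illustrate how an incorrect membership view could lead to this error, here is a minimal sketch. It is not etcd's implementation: the `member` type, `checkAddLearner`, and the limit of one learner are all assumptions made for illustration. It only shows that if a member is wrongly recorded as a learner in the propagated data, a learner-count check would reject the next learner addition.

```go
package main

import (
	"errors"
	"fmt"
)

// errTooManyLearners reuses the error text from this report; the variable name is hypothetical.
var errTooManyLearners = errors.New("etcdserver: too many learner members in cluster")

// member is a simplified, hypothetical stand-in for a cluster membership record.
type member struct {
	name      string
	isLearner bool
}

// countLearners returns how many members are flagged as learners.
func countLearners(members []member) int {
	n := 0
	for _, m := range members {
		if m.isLearner {
			n++
		}
	}
	return n
}

// checkAddLearner rejects a new learner when the resulting learner count
// would exceed maxLearners (the limit value is an assumption here).
func checkAddLearner(members []member, maxLearners int) error {
	if countLearners(members)+1 > maxLearners {
		return errTooManyLearners
	}
	return nil
}

func main() {
	// Correct view: all three existing members are voting members.
	correct := []member{{"m1", false}, {"m2", false}, {"m3", false}}
	// Incorrect view: m3 is still recorded as a learner.
	stale := []member{{"m1", false}, {"m2", false}, {"m3", true}}

	fmt.Println("correct view:", checkAddLearner(correct, 1))
	fmt.Println("incorrect view:", checkAddLearner(stale, 1))
}
```

With the correct view the addition is accepted, while the incorrect view returns the learner-limit error even though no real learner exists.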
Expected results:
Membership data should be correctly propagated to new members during snapshot transfers, allowing the cluster upgrade to complete successfully without membership-related errors.
Additional info:
Upstream issue: https://github.com/etcd-io/etcd/issues/20793
- is depended on by: OCPBUGS-63474 Rebase etcd to 3.5.24 (ASSIGNED)