Details
- Bug
- Resolution: Duplicate
- Major
- None
- 9.0.0.Final
- None
Description
If a node leaves during a rebalance, the only change is that other nodes will no longer request segments from that node. Otherwise the rebalance proceeds as usual, and at the end a node may delete its copy of a segment even if that leaves fewer than numOwners copies in the cluster.
When the leaver is the joiner, a 2nd rebalance is very likely to return to the initial consistent hash (CH), but only after transferring some segments twice:
Rebalance starts: current_owners(s) = AB, pending_owners(s) = AC
C leaves: current_owners(s) = AB, pending_owners(s) = A
Rebalance finishes: current_owners(s) = A, pending_owners(s) = A
2nd rebalance starts: current_owners(s) = A, pending_owners(s) = AB
Even if the leaver is one of the old owners and the 2 segment transfers are necessary, the cluster stays for too long with fewer than numOwners copies of the segment:
Rebalance starts: current_owners(s) = AB, pending_owners(s) = AC
A leaves: current_owners(s) = B, pending_owners(s) = C
C transfers segment s from B
Rebalance finishes: current_owners(s) = C, pending_owners(s) = C
2nd rebalance starts: current_owners(s) = C, pending_owners(s) = DC
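The two traces above can be reproduced with a small model of a single segment. This is only an illustrative sketch, not Infinispan code: the `SegmentOwners` class, its method names, and `NUM_OWNERS = 2` are assumptions chosen to match the traces.

```python
from dataclasses import dataclass

NUM_OWNERS = 2  # assumed value, matching the two-owner traces in this report


@dataclass
class SegmentOwners:
    """Hypothetical model of one segment's owners in the current and pending CH."""
    current: list
    pending: list

    def node_leaves(self, node):
        # The leaver is dropped from both CHs; the rebalance is not restarted.
        self.current = [n for n in self.current if n != node]
        self.pending = [n for n in self.pending if n != node]

    def finish_rebalance(self):
        # The pending CH becomes the current CH; old owners delete their copies.
        self.current = list(self.pending)


def owners_after_rebalance(leaver):
    """Start from current=AB, pending=AC, remove `leaver`, finish the rebalance."""
    s = SegmentOwners(current=["A", "B"], pending=["A", "C"])
    s.node_leaves(leaver)
    s.finish_rebalance()
    return s.current


for leaver in ("C", "A"):
    owners = owners_after_rebalance(leaver)
    print(leaver, "leaves ->", owners, "under-replicated:", len(owners) < NUM_OWNERS)
```

In both cases the segment ends the first rebalance with a single owner, so a second rebalance is needed to get back to numOwners copies.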
We can fix this by making the pending CH a union of the current and pending CH whenever a node leaves, and only removing extra segment copies after the 2nd rebalance.
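A minimal sketch of the proposed union, for one segment. The function names are hypothetical, and the ordering (pending owners first, then any remaining current owners) is an assumption; the report does not say which owner should come first in the union.

```python
def union_owners(pending, current):
    """Pending owners first, then current owners not already listed (assumed order)."""
    merged = list(pending)
    for node in current:
        if node not in merged:
            merged.append(node)
    return merged


def on_node_leave(current, pending, leaver):
    """Drop the leaver from both owner lists, then re-add surviving old owners to pending."""
    current = [n for n in current if n != leaver]
    pending = [n for n in pending if n != leaver]
    return current, union_owners(pending, current)


# Joiner C leaves: pending falls back to AB, so no copy of the segment is deleted.
print(on_node_leave(["A", "B"], ["A", "C"], "C"))
# Old owner A leaves: B stays in the pending owners alongside C.
print(on_node_leave(["A", "B"], ["A", "C"], "A"))
```

With the union in place, the first rebalance finishes with two owners in both scenarios, and the surplus copy would be removed only by the 2nd rebalance.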
Issue Links
- duplicates ISPN-4587 Re-add old owners in the pending CH when a node leaves during rebalance (To Do)