Status: Closed
This causes random failures in ConcurrentOverlappingLeaveTest and ConcurrentNonOverlappingLeaveTest.
1. Starting with a 4-node cluster: [E, F, G, H] (topology 7).
2. F leaves, and E sends a REBALANCE_START command with nodes [E, G, H] (topology 8). Some segments are owned by [H] in the current CH and by [H, G] in the pending CH.
3. E reports that it finished receiving state with a REBALANCE_CONFIRM command.
4. H leaves, and E sends a CH_UPDATE command with nodes [E, G] (topology 9).
The segments that were owned by [H] in the previous currentCH are assigned to [E, G] in the new currentCH (otherwise they wouldn't have any owners).
5. The StateConsumerImpl on E requests state for the "lost" segments from G.
6. G confirms the end of the rebalance as well, and E sends a CH_UPDATE command to end the rebalance (topology 10).
7. E sends a REBALANCE_START command to assign all segments for [E, G] (topology 11).
8. While the StateConsumerImpl on E is starting the state transfer, it also receives a StateResponseCommand for the lost segments from G.
9. Because the structures keeping track of the received state are not properly initialized, E concludes that it has finished receiving state for topology 11.
10. E receives a StateResponseCommand from G with actual data, but it ignores it because StateConsumerImpl.updatedKeys == null.
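Steps 8-10 can be replayed with a small toy model. This is only a sketch: ToyStateConsumer, its fields, and its methods are hypothetical and are not the real StateConsumerImpl. The "confirmed" flag stands in for the state in which updatedKeys == null, and the topology-id check in the guarded variant is one possible guard, assumed here for illustration rather than taken from the actual fix.

```java
import java.util.HashSet;
import java.util.Set;

/** Toy model of the race in steps 8-10 (hypothetical; not Infinispan code). */
final class ToyStateConsumer {
    private final boolean checkTopology;          // false = behaviour described above, true = guarded variant
    private int topologyId;
    private final Set<Integer> pending = new HashSet<>(); // segments still expected
    private boolean confirmed;                    // stands in for "updatedKeys == null"
    private int appliedEntries;

    ToyStateConsumer(boolean checkTopology) {
        this.checkTopology = checkTopology;
    }

    /** REBALANCE_START arrives: the new topology id is visible, tracking is still empty. */
    void beginRebalance(int newTopologyId) {
        topologyId = newTopologyId;
        confirmed = false;
        pending.clear();
    }

    /** Tracking structures are filled in only after some delay. */
    void finishRebalanceSetup(Set<Integer> segments) {
        pending.addAll(segments);
    }

    void onStateResponse(int responseTopologyId, int segment, int entries) {
        if (checkTopology && responseTopologyId != topologyId) {
            return; // assumed guard: drop responses that belong to an older topology
        }
        if (confirmed) {
            return; // step 10: actual data is ignored once the transfer looks finished
        }
        pending.remove(segment);
        appliedEntries += entries;
        if (pending.isEmpty()) {
            confirmed = true; // step 9: nothing is tracked yet, so the transfer looks complete
        }
    }

    boolean isConfirmed() { return confirmed; }

    int appliedEntries() { return appliedEntries; }

    /** Replays steps 7-10 for segment 3 with a stale response from topology 9. */
    static ToyStateConsumer replay(boolean checkTopology) {
        ToyStateConsumer e = new ToyStateConsumer(checkTopology);
        e.beginRebalance(11);              // step 7
        e.onStateResponse(9, 3, 0);        // step 8: stale response for the "lost" segment
        e.finishRebalanceSetup(Set.of(3)); // tracking initialized too late
        e.onStateResponse(11, 3, 5);       // step 10: response carrying actual data
        return e;
    }
}
```

With replay(false) the consumer ends up confirmed with no entries applied, which matches the failure above; with replay(true) the stale response is dropped and the five entries are applied.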