- Bug
- Resolution: Done
- Critical
- 7.0.0.Beta2
This caused a failure in ClusterTopologyManagerTest.testAbruptLeaveAfterGetStatus.
When the coordinator changes, the new coordinator first sends a CacheTopologyControlCommand(type=CH_UPDATE) to reset any ongoing rebalance, then a CacheTopologyControlCommand(type=REBALANCE_START) to start a new rebalance with the remaining members. If another node leaves afterwards, the coordinator sends yet another CacheTopologyControlCommand(type=CH_UPDATE) to remove the leaver from the consistent hashes (CHs).
If one node (in this case the coordinator itself) processes the last CH_UPDATE before the other two commands, it fails to confirm the rebalance, and the cache stays in the "rebalancing" state until another node joins or leaves.
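
The ordering problem described above can be illustrated with a small toy model. This is only a sketch, not Infinispan code: the `Command` and `Node` classes, the `topologyId` field, and the rule that a node drops any command whose id is not newer than the last one it applied are all assumptions made for illustration. Under those assumptions, a node that sees the final CH_UPDATE first treats the earlier CH_UPDATE and the REBALANCE_START as stale, never starts the rebalance, and so has nothing to confirm.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: none of these classes exist in Infinispan.
class TopologyOrderingSketch {
    enum Type { CH_UPDATE, REBALANCE_START }

    // Hypothetical command; the topologyId field is an assumption of this sketch.
    static final class Command {
        final Type type;
        final int topologyId;
        Command(Type type, int topologyId) {
            this.type = type;
            this.topologyId = topologyId;
        }
    }

    // Hypothetical node that ignores any command not newer than the last one applied.
    static final class Node {
        int lastTopologyId = 0;
        boolean rebalanceStarted = false;

        void handle(Command c) {
            if (c.topologyId <= lastTopologyId) {
                return; // stale command: dropped without any effect
            }
            lastTopologyId = c.topologyId;
            if (c.type == Type.REBALANCE_START) {
                rebalanceStarted = true;
            }
        }
    }

    // The three commands from the description, in the order the coordinator sends them.
    static List<Command> inOrder() {
        return Arrays.asList(
            new Command(Type.CH_UPDATE, 1),        // reset any ongoing rebalance
            new Command(Type.REBALANCE_START, 2),  // start the new rebalance
            new Command(Type.CH_UPDATE, 3));       // remove the second leaver
    }

    // Same commands, but the last CH_UPDATE is processed before the other two.
    static List<Command> lastUpdateFirst() {
        return Arrays.asList(
            new Command(Type.CH_UPDATE, 3),
            new Command(Type.CH_UPDATE, 1),
            new Command(Type.REBALANCE_START, 2));
    }

    // A node can only confirm a rebalance that it actually started.
    static boolean confirmsRebalance(List<Command> arrivalOrder) {
        Node node = new Node();
        for (Command c : arrivalOrder) {
            node.handle(c);
        }
        return node.rebalanceStarted;
    }

    public static void main(String[] args) {
        System.out.println("in order:          " + confirmsRebalance(inOrder()));
        System.out.println("last update first: " + confirmsRebalance(lastUpdateFirst()));
    }
}
```

In this model the in-order delivery lets the node start (and later confirm) the rebalance, while the reordered delivery does not, which matches the stuck "rebalancing" state reported here. How the real implementation decides that a command is stale may differ from this sketch.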