  1. Infinispan
  2. ISPN-8240

Coordinator sends REBALANCE_START command when there is already a rebalance in progress



    • Bug
    • Resolution: Obsolete
    • Critical
    • DataGrid Sprint #30


      Normally the REBALANCE_START command should only be sent at the start of a rebalance, and any topology update sent before all the nodes confirm the rebalance phase should be a CH_UPDATE.

      Since the change to 4 phases, this is no longer true: ClusterCacheStatus.updateTopologyMembers first clears the RebalanceConfirmationCollector (RCC), then broadcasts a CH_UPDATE. queueRebalance then immediately creates a new RCC and broadcasts a REBALANCE_START, instead of waiting for the current rebalance to finish.
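
      The sequence above can be illustrated with a toy model. This is only a sketch: the class ToyCacheStatus, its fields, and the startsWhileInProgress counter are hypothetical and do not reproduce the real ClusterCacheStatus code. It only records which commands would be broadcast, in which order.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: method names mirror the issue text, but the bodies are
// hypothetical and do not reproduce Infinispan's actual implementation.
class ToyCacheStatus {
    enum Command { REBALANCE_START, CH_UPDATE }

    final List<Command> sent = new ArrayList<>();
    boolean rebalanceInProgress;   // stands in for "there is a pending CH"
    Object confirmationCollector;  // stands in for the RebalanceConfirmationCollector
    int startsWhileInProgress;     // REBALANCE_START commands sent during a rebalance

    // A regular rebalance start: new collector, then REBALANCE_START.
    void startRebalance() {
        confirmationCollector = new Object();
        rebalanceInProgress = true;
        sent.add(Command.REBALANCE_START);
    }

    // Behaviour described in the issue: the collector is cleared, a CH_UPDATE
    // is broadcast, and queueRebalance runs right away.
    void updateTopologyMembers() {
        confirmationCollector = null;
        sent.add(Command.CH_UPDATE);
        queueRebalance();
    }

    // Does not wait for the current rebalance to finish, which is the reported problem.
    void queueRebalance() {
        if (rebalanceInProgress) {
            startsWhileInProgress++;
        }
        confirmationCollector = new Object();
        sent.add(Command.REBALANCE_START);
    }
}
```

      Calling startRebalance() and then updateTopologyMembers() on this model records REBALANCE_START, CH_UPDATE, REBALANCE_START, with the second REBALANCE_START sent while the first rebalance is still in progress.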

      I propose we remove REBALANCE_START, as it was just a crude version of CacheTopology.Phase. We should also remove the isRebalance parameter from StateConsumerImpl.onTopologyUpdate().

      I'm still not sure if rebalancing the pending CH immediately is OK. On the one hand, I would like the rebalance to finish with updateMembers(union(currentCH, pendingCH)) as the new pending CH, so that segments that were already transferred keep an extra copy. On the other hand, that would only help for segments that have at least one owner in the current CH: if the current CH has 0 owners and updateMembers allocates new ones, those new owners won't request data from the pending CH owners anyway. Fixing that case would require the coordinator to fetch the transfer status from all the nodes before removing a node from the topology. But if the coordinator knew exactly which segments were transferred, it could finish the rebalance immediately and start a new one, so it would be more similar to the current approach.
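
      To make the union idea concrete for a single segment, here is a minimal sketch. It assumes owners are plain node names in a list; OwnerUnionSketch, union, and retainMembers are hypothetical helpers, not Infinispan's ConsistentHash API, and retainMembers only stands in for what updateMembers does to one segment when nodes leave.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helpers for illustration; not part of Infinispan.
class OwnerUnionSketch {
    // Union of the owner lists of one segment: current owners first,
    // followed by pending owners that are not already listed.
    static List<String> union(List<String> current, List<String> pending) {
        Set<String> owners = new LinkedHashSet<>(current);
        owners.addAll(pending);
        return new ArrayList<>(owners);
    }

    // Drops owners that are no longer cluster members, keeping the original order.
    static List<String> retainMembers(List<String> owners, Set<String> members) {
        List<String> result = new ArrayList<>(owners);
        result.retainAll(members);
        return result;
    }
}
```

      For a segment with current owners [A] and pending owners [B, C], the union is [A, B, C]. If A leaves, retaining the remaining members gives [B, C], so the copies already transferred to the pending owners stay in the topology. Applying retainMembers to the current owners alone gives an empty list, which is the 0-owners case described above.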

      Note: the SyncConsistentHashFactory allocation is not 100% stable, even when nodes are not added, so A ∈ owners(segment) in topology ABCD does not guarantee that A ∈ owners(segment) in topology ABC. But it should be good enough to keep A an owner in 90% of the cases.
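
      One way to check the stability claim on real data would be to compare the owner lists of two topologies segment by segment. The sketch below assumes each topology is given as a list of owner lists indexed by segment; OwnerStabilitySketch and retainedFraction are hypothetical names, and the code does not call SyncConsistentHashFactory itself.

```java
import java.util.List;

// Hypothetical helper for illustration; not part of Infinispan.
class OwnerStabilitySketch {
    // Among the segments that `node` owns in `before`, returns the fraction
    // it still owns in `after`. Both lists are indexed by segment number.
    // Returns 1.0 if the node owned no segments in `before`.
    static double retainedFraction(String node,
                                   List<List<String>> before,
                                   List<List<String>> after) {
        int owned = 0;
        int kept = 0;
        for (int segment = 0; segment < before.size(); segment++) {
            if (before.get(segment).contains(node)) {
                owned++;
                if (after.get(segment).contains(node)) {
                    kept++;
                }
            }
        }
        return owned == 0 ? 1.0 : (double) kept / owned;
    }
}
```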


              Unassigned Unassigned
              dberinde@redhat.com Dan Berindei (Inactive)