Infinispan / ISPN-4743

Rebalance can hang after the coordinator and another node leave



      This caused a failure in ClusterTopologyManagerTest.testAbruptLeaveAfterGetStatus.

      When the coordinator changes, the new coordinator first sends a CacheTopologyControlCommand(type=CH_UPDATE) to reset any ongoing rebalance, then a CacheTopologyControlCommand(type=REBALANCE_START) to start a new rebalance with the remaining members. If another node leaves afterwards, the coordinator sends yet another CacheTopologyControlCommand(type=CH_UPDATE) to remove the leaver from the CHs.

      If a node (in this case the coordinator itself) processes the last CH_UPDATE before the other two commands, it never confirms the rebalance, and the cache stays in the "rebalancing" state until another node joins or leaves.
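
The ordering problem can be illustrated with a toy model. This is only a sketch under stated assumptions, not Infinispan's actual code: the `Command` and `Node` classes are hypothetical, and the model assumes that each command carries an increasing topology id and that a node drops any command older than the topology it has already installed. The issue text does not state the exact mechanism, so treat this as one plausible reading.

```java
import java.util.Arrays;
import java.util.List;

// Toy model only: these types are hypothetical and are NOT Infinispan's classes.
class RebalanceOrderingSketch {
    enum Type { CH_UPDATE, REBALANCE_START }

    // Assumption: every command carries a topology id that grows with each change.
    static final class Command {
        final Type type;
        final int topologyId;

        Command(Type type, int topologyId) {
            this.type = type;
            this.topologyId = topologyId;
        }
    }

    static final class Node {
        int currentTopologyId = 0;
        boolean rebalanceConfirmed = false;

        void handle(Command c) {
            // Assumption: a node ignores commands older than the topology it already installed.
            if (c.topologyId <= currentTopologyId) {
                return;
            }
            currentTopologyId = c.topologyId;
            if (c.type == Type.REBALANCE_START) {
                // In this model, the node confirms only when it applies the start command.
                rebalanceConfirmed = true;
            }
        }
    }

    // Delivers the commands in the given order and reports whether the node confirmed.
    static boolean confirmsRebalance(List<Command> deliveryOrder) {
        Node node = new Node();
        for (Command c : deliveryOrder) {
            node.handle(c);
        }
        return node.rebalanceConfirmed;
    }

    public static void main(String[] args) {
        Command reset = new Command(Type.CH_UPDATE, 1);         // reset the ongoing rebalance
        Command start = new Command(Type.REBALANCE_START, 2);   // start the new rebalance
        Command removeLeaver = new Command(Type.CH_UPDATE, 3);  // remove the second leaver

        // prints "in order: true"
        System.out.println("in order: "
                + confirmsRebalance(Arrays.asList(reset, start, removeLeaver)));
        // prints "last first: false"
        System.out.println("last first: "
                + confirmsRebalance(Arrays.asList(removeLeaver, reset, start)));
    }
}
```

In this model, delivering the last CH_UPDATE first makes the node treat both earlier commands as stale, so it never confirms and the coordinator would keep waiting, which matches the hang described above.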

              dberinde@redhat.com Dan Berindei (Inactive)
              Archiver:
              rhn-support-adongare Amol Dongare
