Infinispan / ISPN-2825

ClusterTopologyManagerImpl should not hold a lock while invoking an RPC

    • Type: Bug
    • Resolution: Done
    • Priority: Critical
    • Fix Version: 5.3.0.Final
    • Affects Version: 5.2.1.Final
    • Component: State Transfer

      On the coordinator, ClusterTopologyManagerImpl holds a lock on a cache's ClusterCacheStatus while it is invoking a synchronous REBALANCE_START or CH_UPDATE command. This helps ensure the ordering of the commands is the same on all the members.

      However, this has some downsides. A joining node can take quite some time to reply to the coordinator, because it first needs to request transactions from the other nodes. The nodes that don't need to request any data send a REBALANCE_CONFIRM command to the coordinator right away, but that command blocks on the ClusterCacheStatus lock until the coordinator's RPC returns. If the number of OOB threads is limited, the blocked REBALANCE_CONFIRM commands can use up all of them, which can even lead to a deadlock.
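A minimal sketch of the general pattern this issue asks for: update and copy the state while holding the lock, then release it before the blocking call. The classes `CacheStatusSketch` and `CoordinatorSketch` and the `Consumer` standing in for the RPC are hypothetical illustrations, not Infinispan's actual code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for the per-cache status object; NOT Infinispan's real ClusterCacheStatus.
final class CacheStatusSketch {
    private final List<String> members = new ArrayList<>();
    private int topologyId = 0;
    private int confirmations = 0;

    // Immutable copy of the state that a REBALANCE_START-style command would carry.
    static final class Snapshot {
        final int topologyId;
        final List<String> members;

        Snapshot(int topologyId, List<String> members) {
            this.topologyId = topologyId;
            this.members = Collections.unmodifiableList(new ArrayList<>(members));
        }
    }

    // Update the state and copy it while holding the lock; nothing here blocks on the network.
    synchronized Snapshot addMemberAndSnapshot(String member) {
        members.add(member);
        topologyId++;
        return new Snapshot(topologyId, members);
    }

    // A REBALANCE_CONFIRM-style handler needs the same lock, but only briefly.
    synchronized int confirm() {
        return ++confirmations;
    }
}

// Hypothetical coordinator-side caller; the Consumer stands in for the synchronous RPC.
final class CoordinatorSketch {
    private final CacheStatusSketch status;
    private final Consumer<CacheStatusSketch.Snapshot> rpc;

    CoordinatorSketch(CacheStatusSketch status, Consumer<CacheStatusSketch.Snapshot> rpc) {
        this.status = status;
        this.rpc = rpc;
    }

    void handleJoin(String member) {
        // The lock is held only for the duration of this call.
        CacheStatusSketch.Snapshot snapshot = status.addMemberAndSnapshot(member);
        // The lock is released here, so confirmations can be processed while the RPC is in flight.
        rpc.accept(snapshot);
    }
}
```

With the lock released before the call, two such RPCs may reach a member in either order, which is why the snapshot carries the topology id.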

      Now that CH_UPDATE commands also increment the topology id, we don't really need to enforce the same ordering. If a CH_UPDATE command is sent after a REBALANCE_START command but arrives before it, LocalTopologyManagerImpl just needs to act as if the CH_UPDATE command was actually a REBALANCE_START. (It knows there should be a rebalance when a CH_UPDATE command has pendingCH != null.)
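The receiving side described above could be sketched roughly as follows. `TopologyCommandSketch` and `LocalTopologySketch` are hypothetical, simplified stand-ins and not the real LocalTopologyManagerImpl. Dropping a command whose topology id is not newer than the installed one is an assumption of this sketch; the issue text only says that a CH_UPDATE with pendingCH != null should be treated like a REBALANCE_START.

```java
// Hypothetical, simplified topology command; field names follow the issue text,
// not Infinispan's real classes.
final class TopologyCommandSketch {
    enum Type { REBALANCE_START, CH_UPDATE }

    final Type type;
    final int topologyId;
    final String pendingCH; // null means no rebalance is in progress

    TopologyCommandSketch(Type type, int topologyId, String pendingCH) {
        this.type = type;
        this.topologyId = topologyId;
        this.pendingCH = pendingCH;
    }
}

// Hypothetical receiver that tolerates REBALANCE_START and CH_UPDATE arriving out of order.
final class LocalTopologySketch {
    private int installedTopologyId = -1;
    private boolean rebalanceInProgress = false;
    private int rebalancesStarted = 0;

    // Returns true if the command was applied, false if it was ignored as stale.
    synchronized boolean handle(TopologyCommandSketch cmd) {
        // Assumption of this sketch: anything not newer than the installed topology is dropped.
        if (cmd.topologyId <= installedTopologyId) {
            return false;
        }
        // A CH_UPDATE that carries a pending CH implies a rebalance, even if the
        // REBALANCE_START that announced it has not arrived yet.
        boolean impliesRebalance =
                cmd.type == TopologyCommandSketch.Type.REBALANCE_START || cmd.pendingCH != null;
        if (impliesRebalance) {
            if (!rebalanceInProgress) {
                rebalanceInProgress = true;
                rebalancesStarted++;
            }
        } else {
            rebalanceInProgress = false;
        }
        installedTopologyId = cmd.topologyId;
        return true;
    }

    synchronized boolean isRebalanceInProgress() { return rebalanceInProgress; }
    synchronized int rebalancesStarted() { return rebalancesStarted; }
    synchronized int installedTopologyId() { return installedTopologyId; }
}
```

In this sketch, if a CH_UPDATE with topology id 2 and a non-null pendingCH arrives before the REBALANCE_START with topology id 1, the rebalance starts on the CH_UPDATE and the late REBALANCE_START is ignored, so the rebalance is started only once.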

              Assignee:
              dberinde@redhat.com Dan Berindei (Inactive)
              Reporter:
              dberinde@redhat.com Dan Berindei (Inactive)
              Archiver:
              rhn-support-adongare Amol Dongare