  1. Infinispan
  2. ISPN-2713

REBALANCE_START and REBALANCE_CONFIRM commands deadlock when RSVP.ack_on_delivery=true



    • Workaround Exists

      Set RSVP.ack_on_delivery=false in the JGroups configuration (it's already configured that way in the JGroups configurations that ship with Infinispan).


      When the coordinator sends a REBALANCE_START command, it holds a lock on the ClusterCacheStatus until it receives the responses from all the other members.

      If a node doesn't need to request any new state, it sends the rebalance confirmation to the coordinator on the same thread that received the REBALANCE_START command. The REBALANCE_CONFIRM command also wants to acquire a lock on the ClusterCacheStatus on the coordinator, but because the REBALANCE_CONFIRM command is sent asynchronously, it doesn't deadlock with the thread waiting for REBALANCE_START responses on the coordinator.

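      At least, that's what happens when RSVP.ack_on_delivery=false (the Infinispan default). When RSVP.ack_on_delivery=true (the JGroups default), the sender only receives its ack after the coordinator has finished processing the message, so the "asynchronous" REBALANCE_CONFIRM command effectively becomes synchronous and causes a deadlock: the coordinator holds the ClusterCacheStatus lock while it waits for the REBALANCE_START response, the member is blocked waiting for the REBALANCE_CONFIRM ack, and the REBALANCE_CONFIRM handler is blocked waiting for the lock. The rebalance then fails after the RSVP timeout expires (10 seconds by default).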
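The locking pattern described above can be illustrated with plain java.util.concurrent primitives. This is only a sketch under stated assumptions, not Infinispan or JGroups code: the class RebalanceDeadlockSketch, its method names, the two thread pools, and the short timeout are all hypothetical stand-ins. A ReentrantLock plays the role of the lock on ClusterCacheStatus, and a boolean flag models whether the sender waits for the message to be processed before continuing.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical model of the scenario; none of these names come from Infinispan.
final class RebalanceDeadlockSketch {
    // Stand-in for the lock the coordinator holds on ClusterCacheStatus.
    private final ReentrantLock statusLock = new ReentrantLock();
    // Separate pools model the member's and the coordinator's receiver threads.
    private final ExecutorService memberThreads = Executors.newCachedThreadPool();
    private final ExecutorService coordinatorThreads = Executors.newCachedThreadPool();
    private final boolean ackOnDelivery;   // models RSVP.ack_on_delivery
    private final long rsvpTimeoutMillis;  // models the RSVP timeout

    RebalanceDeadlockSketch(boolean ackOnDelivery, long rsvpTimeoutMillis) {
        this.ackOnDelivery = ackOnDelivery;
        this.rsvpTimeoutMillis = rsvpTimeoutMillis;
    }

    // Coordinator side: processing REBALANCE_CONFIRM needs the status lock.
    private void handleRebalanceConfirm() {
        statusLock.lock();
        try {
            // The confirmation would be recorded here.
        } finally {
            statusLock.unlock();
        }
    }

    // Member side: confirms on the same thread that received REBALANCE_START.
    private boolean handleRebalanceStart() {
        CompletableFuture<Void> processed =
            CompletableFuture.runAsync(this::handleRebalanceConfirm, coordinatorThreads);
        if (!ackOnDelivery) {
            return true; // truly asynchronous: respond without waiting
        }
        try {
            // Ack only arrives after the coordinator has processed the message.
            processed.get(rsvpTimeoutMillis, TimeUnit.MILLISECONDS);
            return true;
        } catch (TimeoutException | ExecutionException e) {
            return false; // the confirm handler never got the lock in time
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Coordinator side: holds the lock until the member has responded.
    boolean rebalance() {
        statusLock.lock();
        try {
            return CompletableFuture
                .supplyAsync(this::handleRebalanceStart, memberThreads)
                .join();
        } finally {
            statusLock.unlock();
        }
    }

    void shutdown() {
        memberThreads.shutdown();
        coordinatorThreads.shutdown();
    }

    public static void main(String[] args) {
        for (boolean ack : new boolean[] {false, true}) {
            RebalanceDeadlockSketch sketch = new RebalanceDeadlockSketch(ack, 200);
            System.out.println("ackOnDelivery=" + ack + " -> rebalance ok: " + sketch.rebalance());
            sketch.shutdown();
        }
    }
}
```

In this model the run with the flag set to false completes immediately, while the run with the flag set to true stalls until the timeout and then reports failure, which mirrors the behaviour the description attributes to the two RSVP settings.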

              dberinde@redhat.com Dan Berindei (Inactive)
              Archiver:
              rhn-support-adongare Amol Dongare
