  1. Infinispan
  2. ISPN-2778

When a cache is restarted, the LEAVE and JOIN commands are not ordered

Details

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 5.2.0.Final
    • Affects Version/s: 5.2.0.CR3
    • Component/s: State Transfer
    • Labels: None

    Description

      The LEAVE command is sent asynchronously, so if the cache is restarted it is possible for the new JOIN command to be processed before the LEAVE command on the coordinator.

      This breaks state transfer: because the restarting node is still present in the consistent hash when its JOIN is processed, it does not perform any state transfer. When the old LEAVE is finally processed, the node receives a topology update with itself removed from the consistent hash.
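The effect of the reordering can be sketched with a toy model of the coordinator's member list. This is an illustration only, not Infinispan code: `ToyCoordinator`, `handleJoin` and `handleLeave` are hypothetical names, and the real coordinator tracks a consistent hash rather than a plain list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy model of the coordinator's view of cache members (hypothetical, not Infinispan code). */
class ToyCoordinator {
    private final List<String> members = new ArrayList<>();

    ToyCoordinator(String... initialMembers) {
        members.addAll(Arrays.asList(initialMembers));
    }

    /** Returns true if the node was added, i.e. a state transfer would be triggered. */
    boolean handleJoin(String node) {
        if (members.contains(node)) {
            return false; // already a member: nothing changes, no state transfer
        }
        members.add(node);
        return true;
    }

    /** Removes the node from the member list if it is present. */
    void handleLeave(String node) {
        members.remove(node);
    }

    List<String> members() {
        return new ArrayList<>(members);
    }

    public static void main(String[] args) {
        // Restart of NodeH where the JOIN overtakes the old LEAVE.
        ToyCoordinator coordinator = new ToyCoordinator("NodeG", "NodeH");
        boolean stateTransfer = coordinator.handleJoin("NodeH");
        coordinator.handleLeave("NodeH");
        System.out.println("state transfer: " + stateTransfer
                + ", members: " + coordinator.members());
    }
}
```

In this model the out-of-order case ends with `NodeH` running but absent from the member list and without any state, which matches the topology with id=3 in the log below. Processing LEAVE first and JOIN second would instead add the node back and trigger a state transfer.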

      I have seen one failure because of this in StateTransferFunctionalTest.testInitialStateTransferAfterRestart:

      03:25:36,749 TRACE (testng-StateTransferFunctionalTest:) [JGroupsTransport] dests=[NodeG-42396], command=CacheTopologyControlCommand{cache=nbst, type=LEAVE, sender=NodeH-44562, joinInfo=null, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=1}, mode=ASYNCHRONOUS_WITH_SYNC_MARSHALLING, timeout=0
      03:25:36,770 TRACE (testng-StateTransferFunctionalTest:) [JGroupsTransport] dests=[NodeG-42396], command=CacheTopologyControlCommand{cache=nbst, type=JOIN, sender=NodeH-44562, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.ReplicatedConsistentHashFactory@335703e5, hashFunction=org.infinispan.commons.hash.MurmurHash3@64b6f0a5, numSegments=60, numOwners=2, timeout=240000}, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=1}, mode=SYNCHRONOUS, timeout=240000
      03:25:36,771 TRACE (OOB-1,ISPN,NodeG-42396:) [CommandAwareRpcDispatcher] Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=nbst, type=JOIN, sender=NodeH-44562, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.ReplicatedConsistentHashFactory@3aea6b42, hashFunction=org.infinispan.commons.hash.MurmurHash3@7427d845, numSegments=60, numOwners=2, timeout=240000}, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=1} [sender=NodeH-44562]
      03:25:36,771 TRACE (testng-StateTransferFunctionalTest:) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=2, currentCH=ReplicatedConsistentHash{members=[NodeG-42396, NodeH-44562]}, pendingCH=null} on cache nbst
      03:25:36,782 TRACE (OOB-2,ISPN,NodeG-42396:) [CommandAwareRpcDispatcher] Attempting to execute non-CacheRpcCommand command: CacheTopologyControlCommand{cache=nbst, type=LEAVE, sender=NodeH-44562, joinInfo=null, topologyId=0, currentCH=null, pendingCH=null, throwable=null, viewId=1} [sender=NodeH-44562]
      03:25:36,840 TRACE (OOB-2,ISPN,NodeG-42396:nbst nbst) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=3, currentCH=ReplicatedConsistentHash{members=[NodeG-42396]}, pendingCH=null} on cache nbst
      03:25:36,852 TRACE (OOB-2,ISPN,NodeH-44562:nbst) [StateTransferManagerImpl] Installing new cache topology CacheTopology{id=3, currentCH=ReplicatedConsistentHash{members=[NodeG-42396]}, pendingCH=null} on cache nbst
      

      The solution is to make the LEAVE command synchronous.
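Why a synchronous LEAVE restores the ordering can be sketched with plain `java.util.concurrent` primitives. This is a simulation under stated assumptions, not the actual fix: the two-thread executor stands in for the coordinator's OOB thread pool, `LeaveJoinOrdering` and `restart` are hypothetical names, and the latch artificially forces the worst case in which an asynchronous LEAVE is handled only after the JOIN.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** Simulates a cache restart (LEAVE, then JOIN) against a multi-threaded coordinator. */
class LeaveJoinOrdering {

    /** Returns the order in which the simulated coordinator handled the two commands. */
    static List<String> restart(boolean synchronousLeave) {
        ExecutorService coordinator = Executors.newFixedThreadPool(2); // stand-in for OOB threads
        List<String> processed = new CopyOnWriteArrayList<>();
        CountDownLatch joinHandled = new CountDownLatch(1);
        try {
            Future<Object> leave = coordinator.submit(() -> {
                if (!synchronousLeave) {
                    // Forced worst case: the async LEAVE is delayed until JOIN is handled.
                    joinHandled.await();
                }
                processed.add("LEAVE");
                return null;
            });
            if (synchronousLeave) {
                leave.get(); // the sender blocks until the coordinator has handled LEAVE
            }
            // JOIN is always sent synchronously.
            coordinator.submit(() -> {
                processed.add("JOIN");
                return null;
            }).get();
            joinHandled.countDown();
            leave.get(); // make sure the delayed LEAVE has finished before reading the result
            return new ArrayList<>(processed);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            coordinator.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("async LEAVE: " + restart(false));
        System.out.println("sync LEAVE:  " + restart(true));
    }
}
```

With a synchronous LEAVE the sender cannot issue the JOIN until the coordinator has finished handling the LEAVE, so the order no longer depends on how the coordinator's threads are scheduled.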

          People

            Assignee: Dan Berindei (Inactive), dberinde@redhat.com
            Reporter: Dan Berindei (Inactive), dberinde@redhat.com