ISPN-9517: State transfer times out if initiated while the view contains both a suspected member that is yet to be verified and its reincarnation

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 9.4.0.Final
    • Affects Version/s: 9.3.3.Final
    • Component/s: State Transfer
    • Labels: None

      Here's the scenario:
      1. Cluster contains caches on 2 members, node-1 and node-2
      2. node-2 is killed
      3. node-2 is restarted (using same physical address)
      4. State transfer starts while the view contains node-1, the suspected (not yet verified) node-2, and the reincarnated node-2
      5. State transfer times out

      Log of node-1 includes:

      12:09:51,882 WARN  [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p14-t4) ISPN000197: Error updating cluster member list: org.infinispan.util.concurrent.TimeoutException: ISPN000476: Timed out waiting for responses for request 3 from node-2
      	at org.infinispan.remoting.transport.impl.MultiTargetRequest.onTimeout(MultiTargetRequest.java:167)
      	at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:87)
      	at org.infinispan.remoting.transport.AbstractRequest.call(AbstractRequest.java:22)
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [rt.jar:1.8.0_181]
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [rt.jar:1.8.0_181]
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [rt.jar:1.8.0_181]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_181]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_181]
      	at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_181]
      	Suppressed: org.infinispan.util.logging.TraceException
      		at org.infinispan.remoting.transport.Transport.invokeRemotely(Transport.java:75)
      		at org.infinispan.topology.ClusterTopologyManagerImpl.confirmMembersAvailable(ClusterTopologyManagerImpl.java:525)
      		at org.infinispan.topology.ClusterTopologyManagerImpl.updateCacheMembers(ClusterTopologyManagerImpl.java:508)
      		at org.infinispan.topology.ClusterTopologyManagerImpl.handleClusterView(ClusterTopologyManagerImpl.java:321)
      		at org.infinispan.topology.ClusterTopologyManagerImpl.access$500(ClusterTopologyManagerImpl.java:87)
      		at org.infinispan.topology.ClusterTopologyManagerImpl$ClusterViewListener.lambda$handleViewChange$0(ClusterTopologyManagerImpl.java:731)
      		at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175)
      		at org.infinispan.executors.LimitedExecutor.access$100(LimitedExecutor.java:37)
      		at org.infinispan.executors.LimitedExecutor$Runner.run(LimitedExecutor.java:227)
      		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [rt.jar:1.8.0_181]
      		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [rt.jar:1.8.0_181]
      		at org.wildfly.clustering.service.concurrent.ClassLoaderThreadFactory.lambda$newThread$0(ClassLoaderThreadFactory.java:47)
      		... 1 more
      

      I've attached trace logs from node-1 and node-2.

      Changing ClusterTopologyManagerImpl.confirmMembersAvailable() to use ResponseMode.SYNCHRONOUS_IGNORE_LEAVERS instead of ResponseMode.SYNCHRONOUS allows state transfer to complete successfully.
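
      To illustrate why the response mode matters here, the following is a minimal toy model, not Infinispan code. The class ResponseModeSketch, its Request type, and the member labels are hypothetical names invented for this sketch; only the two mode names come from the report. It assumes that the old incarnation of node-2 never replies and is later dropped from the view, and it shows only one difference: whether a missing reply from a departed member prevents the request from succeeding. It does not try to reproduce how Infinispan actually reports the failure (timeout or exception).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Set;

public class ResponseModeSketch {
    /** Hypothetical stand-in for the two response modes named in the report. */
    public enum Mode { SYNCHRONOUS, SYNCHRONOUS_IGNORE_LEAVERS }

    /** Toy request that tracks which targets still owe a response. */
    static final class Request {
        private final Mode mode;
        private final Set<String> pending;

        Request(Mode mode, Set<String> targets) {
            this.mode = mode;
            this.pending = new LinkedHashSet<>(targets);
        }

        /** A target replied, so we no longer wait for it. */
        void onResponse(String sender) {
            pending.remove(sender);
        }

        /** A new view arrived: only IGNORE_LEAVERS stops waiting for departed members. */
        void onViewChange(Set<String> members) {
            if (mode == Mode.SYNCHRONOUS_IGNORE_LEAVERS) {
                pending.retainAll(members);
            }
        }

        /** True once no target still owes a response. */
        boolean isSatisfied() {
            return pending.isEmpty();
        }
    }

    /** Replays the scenario from the report against the toy request. */
    public static boolean succeeds(Mode mode) {
        String stale = "node-2 (old incarnation)"; // killed, never replies (assumption)
        String fresh = "node-2 (new incarnation)"; // restarted on the same address
        Request request = new Request(mode, new LinkedHashSet<>(Arrays.asList(stale, fresh)));

        request.onResponse(fresh); // only the live incarnation answers

        // Once the suspicion is verified, the old incarnation leaves the view.
        request.onViewChange(new HashSet<>(Arrays.asList("node-1", fresh)));
        return request.isSatisfied();
    }

    public static void main(String[] args) {
        for (Mode mode : Mode.values()) {
            System.out.println(mode + " succeeds: " + succeeds(mode));
        }
    }
}
```

      In this model the SYNCHRONOUS request never becomes satisfied because it keeps waiting on the old incarnation, while the SYNCHRONOUS_IGNORE_LEAVERS request is satisfied as soon as the old incarnation drops out of the view. That is consistent with the observation above, but the actual behavior should be confirmed against the attached trace logs.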

      Attachments:
        1. log.rtf (133 kB)
        2. node-1.zip (8.73 MB)
        3. node-2.zip (7.44 MB)
        4. Test.java (12 kB)

      Assignee: Dan Berindei (Inactive) (dberinde@redhat.com)
      Reporter: Paul Ferraro (pferraro@redhat.com)
      Archiver: Amol Dongare (rhn-support-adongare)
