Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-7064

RPC to leaver times out instead of finishing immediately

This issue belongs to an archived project. You can view it, but you can't modify it. Learn more

XMLWordPrintable

      This causes random failures in ClusterTopologyManagerTest.testAbruptLeaveAfterGetStatus2. Node D leaves and nodes E and F start a rebalance. Then node F also leaves, but DISCARD is enabled first, so the CacheTopologyControlCommand(LEAVE) synchronous RPC blocks until node F installs a view by itself.

      14:05:17,187 INFO  (Incoming-1,Test-NodeF-12732:[]) [JGroupsTransport] ISPN000094: Received new cluster view for channel ISPN: [Test-NodeE-18953|3] (2) [Test-NodeE-18953, Test-NodeF-12732]
      14:05:17,261 TRACE (testng-Test:[]) [JGroupsTransport] dests=[Test-NodeE-18953], command=CacheTopologyControlCommand{cache=testCache, type=LEAVE, sender=Test-NodeF-12732, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, actualMembers=null, throwable=null, viewId=3}, mode=SYNCHRONOUS, timeout=240000
      14:05:22,546 DEBUG (Thread-88606:[]) [GMS] Test-NodeF-12732: installing view [Test-NodeF-12732|4] (1) [Test-NodeF-12732]
      14:05:22,589 DEBUG (testng-Test:[]) [LocalTopologyManagerImpl] Error sending the leave request for cache testCache to coordinator
      

      Node F then cancels its inbound state transfers by sending a synchronous StateRequestCommand(CANCEL_STATE_TRANSFER) RPC to node F. This RPC sometimes hits JGRP-2103 and it only times out after 5 minutes, causing the test to fail.

      14:05:22,608 TRACE (testng-Test:[]) [JGroupsTransport] dests=[Test-NodeE-18953], command=StateRequestCommand{cache=testCache, origin=Test-NodeF-12732, type=CANCEL_STATE_TRANSFER, topologyId=7, segments=[130, 3, ...]}, mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=240000
      14:05:22,608 TRACE (testng-Test:[]) [RequestCorrelator] Test-NodeF-12732: invoking multicast RPC [req-id=134161]
      14:05:22,609 DEBUG (Thread-88606:[]) [JGroupsTransport] New view accepted: [Test-NodeF-12732|4] (1) [Test-NodeF-12732]
      14:09:22,608 TRACE (timeout-thread-Test-NodeF-p48767-t1:[]) [JGroupsTransport] Responses: Responses{
      Test-NodeE-18953: sender=Test-NodeE-18953, received=false, suspected=true}
      14:09:22,608 TRACE (timeout-thread-Test-NodeF-p48767-t1:[]) [RpcManagerImpl] Replication exception
      org.infinispan.util.concurrent.TimeoutException: Replication timeout
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$2(JGroupsTransport.java:659) ~[classes/:?]
      	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) ~[?:1.8.0_101]
      	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) ~[?:1.8.0_101]
      	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) [?:1.8.0_101]
      	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) [?:1.8.0_101]
      	at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:47) [classes/:?]
      	at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:15) [classes/:?]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_101]
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_101]
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_101]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
      	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
      	Suppressed: org.infinispan.remoting.RpcException: Test-NodeE-18953 was suspected
      		at org.infinispan.remoting.transport.jgroups.JGroupsTransport.addSuppressedExceptions(JGroupsTransport.java:707) ~[classes/:?]
      		at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$2(JGroupsTransport.java:659) ~[classes/:?]
      		at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) ~[?:1.8.0_101]
      		at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) ~[?:1.8.0_101]
      		at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) [?:1.8.0_101]
      		at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) [?:1.8.0_101]
      		at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:47) [classes/:?]
      		at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:15) [classes/:?]
      		at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_101]
      		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_101]
      		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_101]
      		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
      		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
      		at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
      14:09:22,609 DEBUG (testng-Test:[]) [InboundTransferTask] Caught an exception while cancelling state transfer for segments [130, 3, ...] from Test-NodeE-18953
      

              dberinde@redhat.com Dan Berindei (Inactive)
              dberinde@redhat.com Dan Berindei (Inactive)
              Archiver:
              rhn-support-adongare Amol Dongare

                Created:
                Updated:
                Resolved:
                Archived: