Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-7064

RPC to leaver times out instead of finishing immediately

    XMLWordPrintable

Details

    Description

      This causes random failures in ClusterTopologyManagerTest.testAbruptLeaveAfterGetStatus2. Node D leaves and nodes E and F start a rebalance. Then node F also leaves, but DISCARD is enabled first, so the CacheTopologyControlCommand(LEAVE) synchronous RPC blocks until node F installs a view by itself.

      14:05:17,187 INFO  (Incoming-1,Test-NodeF-12732:[]) [JGroupsTransport] ISPN000094: Received new cluster view for channel ISPN: [Test-NodeE-18953|3] (2) [Test-NodeE-18953, Test-NodeF-12732]
      14:05:17,261 TRACE (testng-Test:[]) [JGroupsTransport] dests=[Test-NodeE-18953], command=CacheTopologyControlCommand{cache=testCache, type=LEAVE, sender=Test-NodeF-12732, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, actualMembers=null, throwable=null, viewId=3}, mode=SYNCHRONOUS, timeout=240000
      14:05:22,546 DEBUG (Thread-88606:[]) [GMS] Test-NodeF-12732: installing view [Test-NodeF-12732|4] (1) [Test-NodeF-12732]
      14:05:22,589 DEBUG (testng-Test:[]) [LocalTopologyManagerImpl] Error sending the leave request for cache testCache to coordinator
      

      Node F then cancels its inbound state transfers by sending a synchronous StateRequestCommand(CANCEL_STATE_TRANSFER) RPC to node F. This RPC sometimes hits JGRP-2103 and it only times out after 5 minutes, causing the test to fail.

      14:05:22,608 TRACE (testng-Test:[]) [JGroupsTransport] dests=[Test-NodeE-18953], command=StateRequestCommand{cache=testCache, origin=Test-NodeF-12732, type=CANCEL_STATE_TRANSFER, topologyId=7, segments=[130, 3, ...]}, mode=SYNCHRONOUS_IGNORE_LEAVERS, timeout=240000
      14:05:22,608 TRACE (testng-Test:[]) [RequestCorrelator] Test-NodeF-12732: invoking multicast RPC [req-id=134161]
      14:05:22,609 DEBUG (Thread-88606:[]) [JGroupsTransport] New view accepted: [Test-NodeF-12732|4] (1) [Test-NodeF-12732]
      14:09:22,608 TRACE (timeout-thread-Test-NodeF-p48767-t1:[]) [JGroupsTransport] Responses: Responses{
      Test-NodeE-18953: sender=Test-NodeE-18953, received=false, suspected=true}
      14:09:22,608 TRACE (timeout-thread-Test-NodeF-p48767-t1:[]) [RpcManagerImpl] Replication exception
      org.infinispan.util.concurrent.TimeoutException: Replication timeout
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$2(JGroupsTransport.java:659) ~[classes/:?]
      	at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) ~[?:1.8.0_101]
      	at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) ~[?:1.8.0_101]
      	at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) [?:1.8.0_101]
      	at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) [?:1.8.0_101]
      	at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:47) [classes/:?]
      	at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:15) [classes/:?]
      	at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_101]
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_101]
      	at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_101]
      	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
      	at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
      	Suppressed: org.infinispan.remoting.RpcException: Test-NodeE-18953 was suspected
      		at org.infinispan.remoting.transport.jgroups.JGroupsTransport.addSuppressedExceptions(JGroupsTransport.java:707) ~[classes/:?]
      		at org.infinispan.remoting.transport.jgroups.JGroupsTransport.lambda$invokeRemotelyAsync$2(JGroupsTransport.java:659) ~[classes/:?]
      		at java.util.concurrent.CompletableFuture.uniApply(CompletableFuture.java:602) ~[?:1.8.0_101]
      		at java.util.concurrent.CompletableFuture$UniApply.tryFire(CompletableFuture.java:577) ~[?:1.8.0_101]
      		at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:474) [?:1.8.0_101]
      		at java.util.concurrent.CompletableFuture.complete(CompletableFuture.java:1962) [?:1.8.0_101]
      		at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:47) [classes/:?]
      		at org.infinispan.remoting.transport.jgroups.RspListFuture.call(RspListFuture.java:15) [classes/:?]
      		at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_101]
      		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) [?:1.8.0_101]
      		at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) [?:1.8.0_101]
      		at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_101]
      		at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_101]
      		at java.lang.Thread.run(Thread.java:745) [?:1.8.0_101]
      14:09:22,609 DEBUG (testng-Test:[]) [InboundTransferTask] Caught an exception while cancelling state transfer for segments [130, 3, ...] from Test-NodeE-18953
      

      Attachments

        Issue Links

          Activity

            People

              dberinde@redhat.com Dan Berindei (Inactive)
              dberinde@redhat.com Dan Berindei (Inactive)
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: