Infinispan / ISPN-2836

org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes


    • Documentation (Ref Guide, User Guide, etc.)
    • Workaround Exists

      Pedro is adding the ability to set a timeout on the MapReduceTask object in Infinispan 5.3. In previous versions of Infinispan, the timeout can be increased using the Sync.replTimeout value in the cache configuration.
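
      As a rough illustration of the workaround for versions before 5.3, the fragment below raises Sync.replTimeout on a distributed cache. It is a sketch in the Infinispan 5.x XML configuration style; the cache name "wordCountCache" and the 60000 ms value are assumptions chosen for the example, not values taken from the attached configuration files.

```xml
<!-- Hypothetical cache name; substitute the cache used by the Map/Reduce task. -->
<namedCache name="wordCountCache">
   <clustering mode="distribution">
      <!-- replTimeout is in milliseconds; 60000 is an illustrative value only. -->
      <sync replTimeout="60000"/>
   </clustering>
</namedCache>
```

      A larger replTimeout gives the remote MapCombineCommand more time to return before a TimeoutException is raised. It would not be expected to resolve an actual deadlock.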


      Running the example WordCount Map/Reduce job with RadarGun on two nodes, against a cache holding ~550 keys with 1 MB values, produces a thread deadlock. The cache is distributed, with transactions disabled.

      With TCP transport, the job deadlocks without throwing an exception. Disabling the send queue and setting UNICAST2.conn_expiry_timeout=0 prevents the deadlock, but the job still does not complete. The nodes send "are-you-alive" messages back and forth, and I have seen the following exception:

      11:44:29,970 ERROR [org.jgroups.protocols.TCP] (OOB-98,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (76 bytes): java.net.SocketException: Socket closed, cause: null
              at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
              at org.radargun.cachewrappers.InfinispanMapReduceWrapper.executeMapReduceTask(InfinispanMapReduceWrapper.java:98)
              at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:74)
              at org.radargun.Slave$2.run(Slave.java:103)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
              at java.util.concurrent.FutureTask.run(FutureTask.java:138)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:662)
      Caused by: java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
              at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
              at java.util.concurrent.FutureTask.get(FutureTask.java:83)
              at org.infinispan.distexec.mapreduce.MapReduceTask$TaskPart.get(MapReduceTask.java:832)
              at org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhaseWithLocalReduction(MapReduceTask.java:477)
              at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:350)
              ... 9 more
      Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
              at org.infinispan.util.Util.rewrapAsCacheException(Util.java:541)
              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
              at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
      11:44:29,978 ERROR [org.jgroups.protocols.TCP] (Timer-3,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (60 bytes): java.net.SocketException: Socket closed, cause: null
              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:175)
              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:254)
              at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:80)
              at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:288)
              ... 5 more
      Caused by: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
              at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
      11:44:29,979 ERROR [org.jgroups.protocols.TCP] (Timer-4,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (63 bytes): java.net.SocketException: Socket closed, cause: null
              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
              ... 11 more
      

      With UDP transport, both threads are deadlocked. I will attach thread dumps from runs using TCP and UDP transport.

        1. afield-tcp-521-final.txt (279 kB)
        2. udp-edg-perf01.txt (119 kB)
        3. udp-edg-perf02.txt (112 kB)
        4. benchmark-mapreduce-multifilesize.xml (7 kB)
        5. dist-udp-no-tx.xml (2 kB)
        6. jgroups-udp.xml (4 kB)

              Assignee: Pedro Ruivo (pruivo@redhat.com)
              Reporter: Alan Field (rhn-support-afield)