Infinispan / ISPN-2836

org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes

    Details

    • Affects:
      Documentation (Ref Guide, User Guide, etc.)
    • Workaround:
      Workaround Exists
    • Workaround Description:
      Pedro is adding the ability to set a timeout on the MapReduceTask object in Infinispan 5.3. In previous versions of Infinispan, the timeout can be increased using the Sync.replTimeout value in the cache configuration.
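
      As an illustration of the pre-5.3 workaround, a minimal sketch of a cache definition with a raised replication timeout might look like the fragment below. It assumes the Infinispan 5.x XML schema; the cache name "mapReduceCache" and the 60000 ms value are hypothetical placeholders, not settings taken from this issue.

      ```xml
      <!-- Hypothetical cache name; replTimeout is in milliseconds (example value only). -->
      <namedCache name="mapReduceCache">
         <clustering mode="distribution">
            <!-- A larger replTimeout gives remote commands more time before a TimeoutException. -->
            <sync replTimeout="60000"/>
         </clustering>
      </namedCache>
      ```

      The value would need to be sized to the expected duration of the map phase on the slowest node.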

      Description

      Using RadarGun with two nodes to run the example WordCount Map/Reduce job against a cache holding ~550 keys, each with a 1 MB value, produces a thread deadlock. The cache is distributed with transactions disabled.

      With TCP transport, the job deadlocks without throwing an exception. Disabling the send queue and setting UNICAST2.conn_expiry_timeout=0 prevents the deadlock, but the job still does not complete. The nodes send "are-you-alive" messages back and forth, and I have seen the following exception:

      11:44:29,970 ERROR [org.jgroups.protocols.TCP] (OOB-98,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (76 bytes): java.net.SocketException: Socket closed, cause: null
              at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
              at org.radargun.cachewrappers.InfinispanMapReduceWrapper.executeMapReduceTask(InfinispanMapReduceWrapper.java:98)
              at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:74)
              at org.radargun.Slave$2.run(Slave.java:103)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
              at java.util.concurrent.FutureTask.run(FutureTask.java:138)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:662)
      Caused by: java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
              at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
              at java.util.concurrent.FutureTask.get(FutureTask.java:83)
              at org.infinispan.distexec.mapreduce.MapReduceTask$TaskPart.get(MapReduceTask.java:832)
              at org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhaseWithLocalReduction(MapReduceTask.java:477)
              at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:350)
              ... 9 more
      Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
              at org.infinispan.util.Util.rewrapAsCacheException(Util.java:541)
              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
              at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
      11:44:29,978 ERROR [org.jgroups.protocols.TCP] (Timer-3,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (60 bytes): java.net.SocketException: Socket closed, cause: null
              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:175)
              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:254)
              at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:80)
              at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:288)
              ... 5 more
      Caused by: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
              at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
      11:44:29,979 ERROR [org.jgroups.protocols.TCP] (Timer-4,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (63 bytes): java.net.SocketException: Socket closed, cause: null
              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
              ... 11 more
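
      For reference, the two TCP-stack changes mentioned above could be expressed roughly as in the fragment below. This is a sketch only: it assumes a JGroups 3.x stack and that "disabling the send queue" refers to the TCP protocol's use_send_queues attribute. All other protocols and attributes are omitted, so the fragment is not a complete stack.

      ```xml
      <config xmlns="urn:org:jgroups">
         <!-- Assumption: use_send_queues is the "send queue" setting referred to in the description. -->
         <TCP use_send_queues="false"/>
         <!-- Discovery, failure detection and other protocols omitted. -->
         <!-- 0 disables expiry of idle UNICAST2 connections. -->
         <UNICAST2 conn_expiry_timeout="0"/>
         <!-- Remaining protocols omitted. -->
      </config>
      ```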
      

      With UDP transport, both threads are deadlocked. I will attach thread dumps from runs using TCP and UDP transport.

        Attachments

        1. afield-tcp-521-final.txt
          279 kB
        2. benchmark-mapreduce-multifilesize.xml
          7 kB
        3. dist-udp-no-tx.xml
          2 kB
        4. jgroups-udp.xml
          4 kB
        5. udp-edg-perf01.txt
          119 kB
        6. udp-edg-perf02.txt
          112 kB

              People

              Assignee:
              pruivo Pedro Ruivo
              Reporter:
              afield Alan Field