Infinispan / ISPN-2836

org.jgroups.TimeoutException after invoking MapCombineCommand in Map/Reduce task with 2 nodes

    Details

    • Affects:
      Documentation (Ref Guide, User Guide, etc.)
    • Workaround:
      Workaround Exists
    • Workaround Description:
      Pedro is adding the ability to set a timeout on the MapReduceTask object in Infinispan 5.3. In previous versions of Infinispan, the timeout can be increased using the Sync.replTimeout value in the cache configuration.

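
For versions before 5.3, the replTimeout workaround could look roughly like the fragment below. This is a sketch assuming the Infinispan 5.x XML configuration schema; the cache name and the 60000 ms value are illustrative placeholders, not values taken from this issue.

```xml
<!-- Sketch only: cache name and timeout value are hypothetical -->
<namedCache name="wordCountCache">
   <clustering mode="distribution">
      <!-- replTimeout is in milliseconds; raise it if MapCombineCommand times out -->
      <sync replTimeout="60000"/>
   </clustering>
</namedCache>
```

The value needs to exceed the time the slowest node takes to finish its map and combine phase, so it is best chosen from measured run times.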

      Description

      Running the example WordCount Map/Reduce job with RadarGun on two nodes, against a cache holding ~550 keys with 1 MB values, produces a thread deadlock. The cache is distributed, with transactions disabled.

      With TCP transport, the job deadlocks without throwing an exception. Disabling the send queue and setting UNICAST2.conn_expiry_timeout=0 prevents the deadlock, but the job still does not complete. The nodes send "are-you-alive" messages back and forth, and I have seen the following exception:

      11:44:29,970 ERROR [org.jgroups.protocols.TCP] (OOB-98,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (76 bytes): java.net.SocketException: Socket closed, cause: null
              at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:352)
              at org.radargun.cachewrappers.InfinispanMapReduceWrapper.executeMapReduceTask(InfinispanMapReduceWrapper.java:98)
              at org.radargun.stages.MapReduceStage.executeOnSlave(MapReduceStage.java:74)
              at org.radargun.Slave$2.run(Slave.java:103)
              at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:439)
              at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
              at java.util.concurrent.FutureTask.run(FutureTask.java:138)
              at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
              at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
              at java.lang.Thread.run(Thread.java:662)
      Caused by: java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
              at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
              at java.util.concurrent.FutureTask.get(FutureTask.java:83)
              at org.infinispan.distexec.mapreduce.MapReduceTask$TaskPart.get(MapReduceTask.java:832)
              at org.infinispan.distexec.mapreduce.MapReduceTask.executeMapPhaseWithLocalReduction(MapReduceTask.java:477)
              at org.infinispan.distexec.mapreduce.MapReduceTask.execute(MapReduceTask.java:350)
              ... 9 more
      Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
              at org.infinispan.util.Util.rewrapAsCacheException(Util.java:541)
              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:186)
              at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:515)
      11:44:29,978 ERROR [org.jgroups.protocols.TCP] (Timer-3,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (60 bytes): java.net.SocketException: Socket closed, cause: null
              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:175)
              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
              at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:254)
              at org.infinispan.remoting.rpc.RpcManagerImpl.access$000(RpcManagerImpl.java:80)
              at org.infinispan.remoting.rpc.RpcManagerImpl$1.call(RpcManagerImpl.java:288)
              ... 5 more
      Caused by: org.jgroups.TimeoutException: timeout sending message to edg-perf02-32536
              at org.jgroups.blocks.MessageDispatcher.sendMessage(MessageDispatcher.java:390)
              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processSingleCall(CommandAwareRpcDispatcher.java:301)
      11:44:29,979 ERROR [org.jgroups.protocols.TCP] (Timer-4,default,edg-perf01-1907) failed sending message to edg-perf02-32536 (63 bytes): java.net.SocketException: Socket closed, cause: null
              at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommand(CommandAwareRpcDispatcher.java:179)
              ... 11 more
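
The TCP settings described before the trace (send queue disabled, UNICAST2.conn_expiry_timeout=0) might be expressed as follows in a JGroups stack file. This is a partial sketch assuming JGroups 3.x attribute names; use_send_queues is assumed to be the "send queue" setting meant here, and every other protocol and attribute in the stack is omitted.

```xml
<!-- Partial sketch: only the two settings discussed in this issue are shown -->
<config xmlns="urn:org:jgroups">
   <!-- Assumed attribute for disabling the TCP send queue -->
   <TCP use_send_queues="false"/>
   <!-- ... other protocols omitted ... -->
   <!-- 0 disables expiry of idle connections -->
   <UNICAST2 conn_expiry_timeout="0"/>
</config>
```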
      

      With UDP transport, both threads are deadlocked. I will attach thread dumps from runs using TCP and UDP transport.

          Attachments

          1. afield-tcp-521-final.txt (279 kB)
          2. benchmark-mapreduce-multifilesize.xml (7 kB)
          3. dist-udp-no-tx.xml (2 kB)
          4. jgroups-udp.xml (4 kB)
          5. udp-edg-perf01.txt (119 kB)
          6. udp-edg-perf02.txt (112 kB)

                People

                • Assignee: Pedro Ruivo (pruivo)
                • Reporter: Alan Field (afield)
                • Votes: 0
                • Watchers: 6
