  1. Infinispan
  2. ISPN-1872

Coordinator hangs when cache is loaded to it and l1cache enabled in cluster

Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Done
    • Affects Version/s: 5.1.1.FINAL
    • Fix Version/s: 5.1.2.FINAL
    • Component/s: Core
    • Labels: None
    • Steps to Reproduce:

      Download source from trunk and edit InfinispanDemo.form to allow 1000000 cache entries to be generated:

      <component id="7a363" class="javax.swing.JSlider" binding="generateSlider">
        <constraints>
          <grid row="14" column="2" row-span="1" col-span="1" vsize-policy="0" hsize-policy="6" anchor="8" fill="1" indent="0" use-parent-layout="false"/>
          <forms defaultalign-horz="false"/>
        </constraints>
        <properties>
          <maximum value="1000000"/>
          <minimum value="1"/>
          <minorTickSpacing value="100"/>
          <paintLabels value="true"/>
          <paintTicks value="true"/>
          <valueIsAdjusting value="true"/>
        </properties>
      </component>

      Build the distribution and launch runGuiDemo.sh four times to create four separate nodes (I'm running an i7 quad core with 8 GB on RHEL 6.2). Start the cache instance on each node and verify that they have all joined the cluster. Go to the coordinator node and generate 1,000,000 cache entries. After the TimeoutException is thrown, the coordinator node hangs. Kill one of the other nodes and the coordinator becomes responsive again, but it will only have the cache entries of the remaining two nodes.

      As a workaround, alter the cache configuration so that l1 enabled="false". Alternatively, generate the data on a node that is not the coordinator.

    • Workaround Exists

      The workarounds are to make sure the cache isn't loaded on the coordinator node, or to disable the L1 cache.

    Description

      Scaled from 3 nodes to 4 nodes and ran into this issue with both 5.1.1 and trunk (5.2.0 snapshot from 2.18.12).

      I altered the slider in the GUI demo to allow for 1,000,000 cache entries. If I generate the cache entries on the coordinator node, the following exception occurs:

      2012-02-15 12:40:49,633 ERROR [InvocationContextInterceptor] (pool-1-thread-1) ISPN000136: Execution error
      org.infinispan.util.concurrent.TimeoutException: Replication timeout for muskrat-626
          at org.infinispan.remoting.transport.AbstractTransport.parseResponseAndAddToResponseList(AbstractTransport.java:99)
          at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:461)
          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:148)
          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:169)
          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:219)
          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:206)
          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:201)
          at org.infinispan.remoting.rpc.RpcManagerImpl.invokeRemotely(RpcManagerImpl.java:197)
          at org.infinispan.interceptors.DistributionInterceptor.handleWriteCommand(DistributionInterceptor.java:494)
          at org.infinispan.interceptors.DistributionInterceptor.visitPutMapCommand(DistributionInterceptor.java:285)
          at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
          at org.infinispan.interceptors.EntryWrappingInterceptor.invokeNextAndApplyChanges(EntryWrappingInterceptor.java:199)
          at org.infinispan.interceptors.EntryWrappingInterceptor.visitPutMapCommand(EntryWrappingInterceptor.java:160)
          at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
          at org.infinispan.interceptors.locking.NonTransactionalLockingInterceptor.visitPutMapCommand(NonTransactionalLockingInterceptor.java:84)
          at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
          at org.infinispan.interceptors.base.CommandInterceptor.handleDefault(CommandInterceptor.java:130)
          at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:77)
          at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
          at org.infinispan.interceptors.StateTransferLockInterceptor.handleWithRetries(StateTransferLockInterceptor.java:207)
          at org.infinispan.interceptors.StateTransferLockInterceptor.handleWriteCommand(StateTransferLockInterceptor.java:180)
          at org.infinispan.interceptors.StateTransferLockInterceptor.visitPutMapCommand(StateTransferLockInterceptor.java:171)
          at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
          at org.infinispan.interceptors.CacheMgmtInterceptor.visitPutMapCommand(CacheMgmtInterceptor.java:110)
          at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
          at org.infinispan.interceptors.base.CommandInterceptor.invokeNextInterceptor(CommandInterceptor.java:116)
          at org.infinispan.interceptors.InvocationContextInterceptor.handleAll(InvocationContextInterceptor.java:130)
          at org.infinispan.interceptors.InvocationContextInterceptor.handleDefault(InvocationContextInterceptor.java:89)
          at org.infinispan.commands.AbstractVisitor.visitPutMapCommand(AbstractVisitor.java:77)
          at org.infinispan.commands.write.PutMapCommand.acceptVisitor(PutMapCommand.java:66)
          at org.infinispan.interceptors.InterceptorChain.invoke(InterceptorChain.java:345)
          at org.infinispan.CacheImpl.executeCommandAndCommitIfNeeded(CacheImpl.java:941)
          at org.infinispan.CacheImpl.putAll(CacheImpl.java:678)
          at org.infinispan.CacheImpl.putAll(CacheImpl.java:671)
          at org.infinispan.CacheSupport.putAll(CacheSupport.java:66)
          at org.infinispan.demo.InfinispanDemo$7$1.run(InfinispanDemo.java:251)
          at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
          at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
          at java.lang.Thread.run(Thread.java:679)

      At that point the GUI for the coordinator becomes unresponsive. I attached JConsole to the 4 nodes and forced a System.gc(). The coordinator node sits at 62 MB of heap after GC, while the other 3 nodes sit at around 280 MB. The cache distribution has not succeeded on this node. If I kill one of the other nodes, the coordinator instantly becomes responsive. In the final state the coordinator ends up with 1/5 of the load, while the other 2 nodes each hold about 2/5 of the load.

      The problem only occurs when the L1 cache is enabled and I generate the data on the coordinator node. It also only becomes a problem when I scale from 3 to 4 nodes.

      Here is the original cache configuration for all 3 nodes:

      <infinispan
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:5.2 http://www.infinispan.org/schemas/infinispan-config-5.2.xsd"
            xmlns="urn:infinispan:config:5.2">

         <global>
            <transport clusterName="demoCluster"/>
            <globalJmxStatistics enabled="true"/>
         </global>

         <default>
            <jmxStatistics enabled="true"/>
            <clustering mode="distribution">
               <l1 enabled="true" lifespan="60000"/>
               <hash numOwners="2" rehashRpcTimeout="120000"/>
               <sync/>
            </clustering>
         </default>
      </infinispan>
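
      For reference, the L1 workaround amounts to a one-attribute change to the configuration above. This is a minimal sketch, assuming nothing else needs to change and that the lifespan attribute can simply be dropped once L1 is off (both are assumptions, not something verified in this report):

      ```xml
      <infinispan
            xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:5.2 http://www.infinispan.org/schemas/infinispan-config-5.2.xsd"
            xmlns="urn:infinispan:config:5.2">

         <global>
            <transport clusterName="demoCluster"/>
            <globalJmxStatistics enabled="true"/>
         </global>

         <default>
            <jmxStatistics enabled="true"/>
            <clustering mode="distribution">
               <!-- Workaround: L1 disabled. Dropping lifespan here is an assumption. -->
               <l1 enabled="false"/>
               <hash numOwners="2" rehashRpcTimeout="120000"/>
               <sync/>
            </clustering>
         </default>
      </infinispan>
      ```

      The same file would need to be used on every node, since all nodes in the report shared one configuration.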

            People

              Assignee: Dan Berindei (dberinde@redhat.com)
              Reporter: Matt Davis (rhn-support-mattd, Inactive)