Description
Infinispan 5.2.0.Beta5
JGroups 3.2.4.Final
Steps to reproduce (I'm using two virtual interfaces test1, test2)
1. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test1 -Djava.net.preferIPv4Stack=true
2. wait 10 sec
3. Start org.jboss.qa.jdg.Test with -Djgroups.udp.bind_addr=test2 -Djava.net.preferIPv4Stack=true
After 5 seconds there should be this timeout exception:
19:42:14,146 WARN [org.infinispan.topology.CacheTopologyControlCommand] (OOB-2,mlinhard-work-37329) ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=___defaultcache, type=REBALANCE_CONFIRM, sender=mlinhard-work-47337, joinInfo=null, topologyId=1, currentCH=null, pendingCH=null, throwable=null, viewId=1} java.util.concurrent.ExecutionException: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:232) at java.util.concurrent.FutureTask.get(FutureTask.java:91) at org.infinispan.topology.ClusterTopologyManagerImpl.executeOnClusterSync(ClusterTopologyManagerImpl.java:563) at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastConsistentHashUpdate(ClusterTopologyManagerImpl.java:349) at org.infinispan.topology.ClusterTopologyManagerImpl.handleRebalanceCompleted(ClusterTopologyManagerImpl.java:213) at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:160) at org.infinispan.topology.CacheTopologyControlCommand.perform(CacheTopologyControlCommand.java:137) at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.executeCommandFromLocalCluster(CommandAwareRpcDispatcher.java:252) at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.handle(CommandAwareRpcDispatcher.java:219) at org.jgroups.blocks.RequestCorrelator.handleRequest(RequestCorrelator.java:483) at org.jgroups.blocks.RequestCorrelator.receiveMessage(RequestCorrelator.java:390) at org.jgroups.blocks.RequestCorrelator.receive(RequestCorrelator.java:248) at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:598) at org.jgroups.JChannel.up(JChannel.java:703) at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1020) at org.jgroups.protocols.RSVP.up(RSVP.java:172) at org.jgroups.protocols.FRAG2.up(FRAG2.java:181) at org.jgroups.protocols.FlowControl.up(FlowControl.java:418) at org.jgroups.protocols.FlowControl.up(FlowControl.java:400) at org.jgroups.protocols.pbcast.GMS.up(GMS.java:896) at org.jgroups.protocols.pbcast.STABLE.up(STABLE.java:244) at org.jgroups.protocols.UNICAST2.handleDataReceived(UNICAST2.java:736) at org.jgroups.protocols.UNICAST2.up(UNICAST2.java:414) at org.jgroups.protocols.pbcast.NAKACK2.up(NAKACK2.java:606) at org.jgroups.protocols.VERIFY_SUSPECT.up(VERIFY_SUSPECT.java:143) at org.jgroups.protocols.FD_ALL.up(FD_ALL.java:187) at org.jgroups.protocols.FD_SOCK.up(FD_SOCK.java:288) at org.jgroups.protocols.MERGE2.up(MERGE2.java:205) at org.jgroups.protocols.Discovery.up(Discovery.java:359) at org.jgroups.protocols.TP.passMessageUp(TP.java:1287) at org.jgroups.protocols.TP$IncomingPacket.handleMyMessage(TP.java:1850) at org.jgroups.protocols.TP$IncomingPacket.run(TP.java:1823) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Caused by: org.infinispan.CacheException: org.jgroups.TimeoutException: TimeoutException at org.infinispan.util.Util.rewrapAsCacheException(Util.java:532) at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:152) at org.infinispan.remoting.transport.jgroups.JGroupsTransport.invokeRemotely(JGroupsTransport.java:518) at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:545) at org.infinispan.topology.ClusterTopologyManagerImpl$2.call(ClusterTopologyManagerImpl.java:542) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) ... 3 more Caused by: org.jgroups.TimeoutException: TimeoutException at org.jgroups.util.Promise._getResultWithTimeout(Promise.java:145) at org.jgroups.util.Promise.getResultWithTimeout(Promise.java:40) at org.jgroups.util.AckCollector.waitForAllAcks(AckCollector.java:93) at org.jgroups.protocols.RSVP$Entry.block(RSVP.java:287) at org.jgroups.protocols.RSVP.down(RSVP.java:118) at org.jgroups.stack.ProtocolStack.down(ProtocolStack.java:1025) at org.jgroups.JChannel.down(JChannel.java:718) at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.down(MessageDispatcher.java:616) at org.jgroups.blocks.RequestCorrelator.sendRequest(RequestCorrelator.java:173) at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:360) at org.jgroups.blocks.GroupRequest.sendRequest(GroupRequest.java:103) at org.jgroups.blocks.Request.execute(Request.java:83) at org.jgroups.blocks.MessageDispatcher.cast(MessageDispatcher.java:335) at org.jgroups.blocks.MessageDispatcher.castMessage(MessageDispatcher.java:249) at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.processCalls(CommandAwareRpcDispatcher.java:330) at org.infinispan.remoting.transport.jgroups.CommandAwareRpcDispatcher.invokeRemoteCommands(CommandAwareRpcDispatcher.java:145) ... 8 more
Analysis:
These are the messages sent after view change:
test1 test2 <--- JOIN ---- ---- REBALANCE_START ---> <--- StateRequestCommand ---- ---- StateResponseCommand ---> <--- REBALANCE_CONFIRM ---- ---- CH_UPDATE --->
The last CH_UPDATE message is broadcast, test2 successfully processes it, but test1 stays in waiting state, because it for some reason awaits response also from itself - local variable entry in the method RSVP.down
(https://github.com/belaban/JGroups/blob/master/src/org/jgroups/protocols/RSVP.java#L121)
contained local address.