Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-1317

Concurrent state transfer requests can lead to premature flush wait closures

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Won't Do
    • Icon: Critical Critical
    • None
    • 5.0.0.FINAL
    • State Transfer
    • None
    • Workaround Exists
    • Hide

      Use a cluster cache loader instead of state transfer.

      In case of Hot Rod server, set infinispan.server.topology.state_transfer to false.

      Show
      Use a cluster cache loader instead of state transfer. In case of Hot Rod server, set infinispan.server.topology.state_transfer to false.

      Logs in JBPAPP-6929 show:

      15:40:22,698 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (STREAMING_STATE_TRANSFER-sender-1,default,michal-linhard-12702)
       ISPN000095: Caught while responding to state transfer request: org.infinispan.statetransfer.StateTransferException: 
      java.util.concurrent.TimeoutException: Timed out waiting for a cluster-wide sync to be acquired. (timeout = 60 seconds)
      	at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:162) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
      	at org.infinispan.remoting.InboundInvocationHandlerImpl.generateState(InboundInvocationHandlerImpl.java:248) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.getState(JGroupsTransport.java:590) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
      	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:690) [jgroups-2.12.1.Final.jar:]
      	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:771) [jgroups-2.12.1.Final.jar:]
      	at org.jgroups.JChannel.up(JChannel.java:1484) [jgroups-2.12.1.Final.jar:]
      	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1074) [jgroups-2.12.1.Final.jar:]
      	at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:477) [jgroups-2.12.1.Final.jar:]
      	at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderHandler.process(STREAMING_STATE_TRANSFER.java:651) [jgroups-2.12.1.Final.jar:]
      	at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderThreadSpawner$1.run(STREAMING_STATE_TRANSFER.java:580) [jgroups-2.12.1.Final.jar:]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [:1.6.0_25]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [:1.6.0_25]
      	at java.lang.Thread.run(Thread.java:662) [:1.6.0_25]
      Caused by: java.util.concurrent.TimeoutException: Timed out waiting for a cluster-wide sync to be acquired. (timeout = 60 seconds)
      	at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.blockUntilAcquired(JGroupsDistSync.java:62) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
      	at org.infinispan.statetransfer.StateTransferManagerImpl.generateTransactionLog(StateTransferManagerImpl.java:196) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
      	at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:152) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
      	... 12 more

      Now, what's odd about this is that the JGroupsDistSync.flushWaitGate behind it is only acquired/released while state transfer control command is sent and the logs show that both the enabling(acquiring) and disabling(releasing) state transfer control commands where built:

      15:39:20,902 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) 
      dests=[michal-linhard-37465], command=StateTransferControlCommand{enabled=true}, mode=SYNCHRONOUS, timeout=480000
      ...
      15:39:21,074 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) 
      dests=[michal-linhard-37465], command=StateTransferControlCommand{enabled=false}, mode=SYNCHRONOUS, timeout=480000

      There's no other references to StateTransferControlCommand, so how can it be that flushWaitGate is open?

              rh-ee-galder Galder Zamarreño
              rh-ee-galder Galder Zamarreño
              Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

                Created:
                Updated:
                Resolved: