Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-1317

Concurrent state transfer requests can lead to premature flush wait closures

This issue belongs to an archived project. You can view it, but you can't modify it. Learn more

XMLWordPrintable

    • Icon: Bug Bug
    • Resolution: Won't Do
    • Icon: Critical Critical
    • None
    • 5.0.0.FINAL
    • State Transfer
    • None
    • Workaround Exists
    • Hide

      Use a cluster cache loader instead of state transfer.

      In case of Hot Rod server, set infinispan.server.topology.state_transfer to false.

      Show
      Use a cluster cache loader instead of state transfer. In case of Hot Rod server, set infinispan.server.topology.state_transfer to false.

      Logs in JBPAPP-6929 show:

      15:40:22,698 ERROR [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (STREAMING_STATE_TRANSFER-sender-1,default,michal-linhard-12702)
       ISPN000095: Caught while responding to state transfer request: org.infinispan.statetransfer.StateTransferException: 
      java.util.concurrent.TimeoutException: Timed out waiting for a cluster-wide sync to be acquired. (timeout = 60 seconds)
      	at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:162) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
      	at org.infinispan.remoting.InboundInvocationHandlerImpl.generateState(InboundInvocationHandlerImpl.java:248) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
      	at org.infinispan.remoting.transport.jgroups.JGroupsTransport.getState(JGroupsTransport.java:590) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
      	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.handleUpEvent(MessageDispatcher.java:690) [jgroups-2.12.1.Final.jar:]
      	at org.jgroups.blocks.MessageDispatcher$ProtocolAdapter.up(MessageDispatcher.java:771) [jgroups-2.12.1.Final.jar:]
      	at org.jgroups.JChannel.up(JChannel.java:1484) [jgroups-2.12.1.Final.jar:]
      	at org.jgroups.stack.ProtocolStack.up(ProtocolStack.java:1074) [jgroups-2.12.1.Final.jar:]
      	at org.jgroups.protocols.pbcast.FLUSH.up(FLUSH.java:477) [jgroups-2.12.1.Final.jar:]
      	at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderHandler.process(STREAMING_STATE_TRANSFER.java:651) [jgroups-2.12.1.Final.jar:]
      	at org.jgroups.protocols.pbcast.STREAMING_STATE_TRANSFER$StateProviderThreadSpawner$1.run(STREAMING_STATE_TRANSFER.java:580) [jgroups-2.12.1.Final.jar:]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) [:1.6.0_25]
      	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) [:1.6.0_25]
      	at java.lang.Thread.run(Thread.java:662) [:1.6.0_25]
      Caused by: java.util.concurrent.TimeoutException: Timed out waiting for a cluster-wide sync to be acquired. (timeout = 60 seconds)
      	at org.infinispan.remoting.transport.jgroups.JGroupsDistSync.blockUntilAcquired(JGroupsDistSync.java:62) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
      	at org.infinispan.statetransfer.StateTransferManagerImpl.generateTransactionLog(StateTransferManagerImpl.java:196) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
      	at org.infinispan.statetransfer.StateTransferManagerImpl.generateState(StateTransferManagerImpl.java:152) [infinispan-core-5.0.0-SNAPSHOT.jar:5.0.0-SNAPSHOT]
      	... 12 more

      Now, what's odd about this is that the JGroupsDistSync.flushWaitGate behind it is only acquired/released while state transfer control command is sent and the logs show that both the enabling(acquiring) and disabling(releasing) state transfer control commands where built:

      15:39:20,902 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) 
      dests=[michal-linhard-37465], command=StateTransferControlCommand{enabled=true}, mode=SYNCHRONOUS, timeout=480000
      ...
      15:39:21,074 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-4) 
      dests=[michal-linhard-37465], command=StateTransferControlCommand{enabled=false}, mode=SYNCHRONOUS, timeout=480000

      There's no other references to StateTransferControlCommand, so how can it be that flushWaitGate is open?

              rh-ee-galder Galder Zamarreño
              rh-ee-galder Galder Zamarreño
              Archiver:
              rhn-support-adongare Amol Dongare

                Created:
                Updated:
                Resolved:
                Archived: