Type: Bug
Resolution: Done
Priority: Major
Fix Version: 5.0.0.FINAL
Component: None
The coordinator of a cluster (which is the first node in the cluster) can end up needlessly trying to fetch state from other nodes. For example:
1. A node starts up:
15:39:20,443 DEBUG [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (MSC service thread 1-2) New view accepted: [michal-linhard-12702|0] [michal-linhard-12702]
2. Before the state transfer check happens, new nodes join:
15:39:20,735 DEBUG [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (pool-5-thread-29) New view accepted: [michal-linhard-12702|1] [michal-linhard-12702, michal-linhard-37465, michal-linhard-61619]
3. The coordinator then skips itself and sends a state transfer request to michal-linhard-37465:
15:39:20,902 INFO [org.infinispan.remoting.rpc.RpcManagerImpl] (MSC service thread 1-4) ISPN000074: Trying to fetch state from michal-linhard-37465
4. That's not right, because 37465 is unlikely to have anything in memory yet, and this could potentially lead to deadlocks where 37465 starts and requests state from 12702. In fact, that's exactly what happens:
15:39:22,611 INFO [org.infinispan.remoting.rpc.RpcManagerImpl] (MSC service thread 1-4) ISPN000074: Trying to fetch state from michal-linhard-12702
5. In the meantime, as expected, 37465 writes nothing:
15:39:22,710 DEBUG [org.infinispan.statetransfer.StateTransferManagerImpl] (STREAMING_STATE_TRANSFER-sender-1,default,michal-linhard-37465) Writing 0 StoredEntries to stream
...
15:39:22,806 TRACE [org.infinispan.transaction.TransactionLog] (STREAMING_STATE_TRANSFER-sender-1,default,michal-linhard-37465) Writing 0 pending prepares to the stream
In other words, under the current design the coordinator should never go around asking other nodes for state; see the sketch of the implied guard below.
Relates to: ISPN-1317 Concurrent state transfer requests can lead to premature flush wait closures (Resolved)