Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-9465

Initial state transfer timeout for joining node following previous kill if cache does not exist on the coordinator

This issue belongs to an archived project. You can view it, but you can't modify it. Learn more

XMLWordPrintable

      Here's the scenario:
      1. Start cache manager on node1.
      2. Start cache manager on node2.
      3. Start clustered cache on node2.
      4. Kill node2's JVM
      5. Start cache manager on node2.
      6. Start clustered cache on node2.
      7. #6 fails due to state transfer timeout.

      To reproduce:
      1. Launch Coordinator
      2. Launch Test
      3. kill -9 JVM process for Test
      4. Relaunch Test
      5. Wait for state transfer timeout

      Seems to be due to node2 waiting for a state transfer that node1 never initiates.
      N.B. The issue is specific to the abrupt leave of node2. If node2 is stopped normally, the subsequent cache start succeeds.

      I have some TRACE logging from the test run that initially identified the issue. The relevant cache name is "client-mappings". Note that in this run there are several cache managers sharing the same channel via FORK, although that is not related to the issue (as the reproducer above uses simple JChannels):

      08:46:22,554 TRACE [org.infinispan.topology.ClusterCacheStatus] (remote-thread--p8-t1) Cache client-mappings initialized. Persisted state? false
      08:46:22,554 TRACE [org.infinispan.topology.ClusterCacheStatus] (remote-thread--p8-t1) Added joiner node-2 to cache client-mappings with persistent uuid a1aac693-eabb-4043-ba94-ca0eb0d3b7db: members = [node-2], joiners = [node-2]
      08:46:22,554 TRACE [org.infinispan.topology.ClusterCacheStatus] (remote-thread--p8-t1) Initializing status for cache client-mappings
      08:46:22,555 TRACE [org.infinispan.topology.ClusterCacheStatus] (remote-thread--p8-t1) Cache client-mappings topology updated: CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, unionCH=null, actualMembers=[node-2], persistentUUIDs=[a1aac693-eabb-4043-ba94-ca0eb0d3b7db]}, members = [node-2], joiners = []
      08:46:22,555 TRACE [org.infinispan.topology.CacheTopology] (remote-thread--p8-t1) Current consistent hash's routing table: node-2 primary: {0-255}
      08:46:22,555 TRACE [org.infinispan.topology.ClusterCacheStatus] (remote-thread--p8-t1) Cache client-mappings stable topology updated: members = [node-2], joiners = [], topology = CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, unionCH=null, actualMembers=[node-2], persistentUUIDs=[a1aac693-eabb-4043-ba94-ca0eb0d3b7db]}
      08:46:22,556 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (remote-thread--p8-t1) node-1 sending command to all: CacheTopologyControlCommand{cache=client-mappings, type=STABLE_TOPOLOGY_UPDATE, sender=node-1, joinInfo=null, topologyId=1, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, availabilityMode=null, phase=NO_REBALANCE, actualMembers=[node-2], throwable=null, viewId=1}
      08:46:22,556 TRACE [org.infinispan.commons.marshall.MarshallableTypeHints] (remote-thread--p8-t1) Cache a buffer size predictor for 'org.infinispan.topology.CacheTopologyControlCommand' assuming its serializability is unknown
      08:46:22,556 TRACE [org.infinispan.commons.marshall.AdaptiveBufferSizePredictor] (remote-thread--p8-t1) Next predicted buffer size for object type 'org.infinispan.topology.CacheTopologyControlCommand' will be 512
      08:46:22,556 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (remote-thread--p8-t1) node-1: sending node-1#11
      08:46:22,556 TRACE [org.jgroups.protocols.UDP] (remote-thread--p8-t1) node-1: sending msg to null, src=node-1, headers are FORK: ejb:ejb, NAKACK2: [MSG, seqno=11], TP: [cluster_name=ejb]
      08:46:22,556 TRACE [org.jgroups.protocols.MFC] (remote-thread--p8-t1) node-1 used 1156 credits, 1989619 remaining
      08:46:22,556 DEBUG [org.infinispan.topology.ClusterCacheStatus] (remote-thread--p8-t1) Queueing rebalance for cache client-mappings with members [node-2]
      08:46:22,556 TRACE [org.infinispan.topology.ClusterCacheStatus] (remote-thread--p8-t1) Rebalancing consistent hash for cache client-mappings, members are [node-2]
      08:46:22,558 TRACE [org.infinispan.topology.ClusterCacheStatus] (remote-thread--p8-t1) The balanced CH is the same as the current CH, not rebalancing
      08:46:22,558 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (remote-thread--p8-t1) node-1 sending response for request 2 to node-2: SuccessfulResponse(StatusResponse{cacheJoinInfo=null, cacheTopology=CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, unionCH=null, actualMembers=[node-2], persistentUUIDs=[a1aac693-eabb-4043-ba94-ca0eb0d3b7db]}, stableTopology=CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, unionCH=null, actualMembers=[node-2], persistentUUIDs=[a1aac693-eabb-4043-ba94-ca0eb0d3b7db]}})
      08:46:22,558 TRACE [org.infinispan.commons.marshall.AdaptiveBufferSizePredictor] (remote-thread--p8-t1) Next predicted buffer size for object type 'org.infinispan.remoting.responses.SuccessfulResponse' will be 512
      08:46:22,558 TRACE [org.jgroups.protocols.UNICAST3] (remote-thread--p8-t1) node-1 --> DATA(node-2: #13, conn_id=0)
      08:46:22,558 TRACE [org.jgroups.protocols.UDP] (remote-thread--p8-t1) node-1: sending msg to node-2, src=node-1, headers are RequestCorrelator: corr_id=0, type=RSP, req_id=2, rsp_expected=true, FORK: ejb:ejb, UNICAST3: DATA, seqno=13, TP: [cluster_name=ejb]
      08:46:22,558 TRACE [org.infinispan.topology.ClusterTopologyManagerImpl] (transport-thread--p16-t3) Attempting to execute command on self: CacheTopologyControlCommand{cache=client-mappings, type=STABLE_TOPOLOGY_UPDATE, sender=node-1, joinInfo=null, topologyId=1, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, availabilityMode=null, phase=NO_REBALANCE, actualMembers=[node-2], throwable=null, viewId=1}
      08:46:22,559 TRACE [org.infinispan.factories.GlobalComponentRegistry] (transport-thread--p16-t3) Injecting dependency for field [org.infinispan.topology.CacheTopologyControlCommand.localTopologyManager] on an instance of [org.infinispan.topology.CacheTopologyControlCommand]
      08:46:22,559 TRACE [org.infinispan.factories.GlobalComponentRegistry] (transport-thread--p16-t3) Injecting dependency for field [org.infinispan.topology.CacheTopologyControlCommand.clusterTopologyManager] on an instance of [org.infinispan.topology.CacheTopologyControlCommand]
      08:46:22,564 TRACE [org.jgroups.protocols.UDP] (thread-13,ejb,node-1) node-1: received [dst: node-1, src: node-2 (4 headers), size=26 bytes, flags=OOB|NO_TOTAL_ORDER|NO_RELAY], headers are RequestCorrelator: corr_id=0, type=REQ, req_id=3, rsp_expected=true, FORK: ejb:ejb, UNICAST3: DATA, seqno=20, TP: [cluster_name=ejb]
      08:46:22,564 TRACE [org.jgroups.protocols.UNICAST3] (thread-13,ejb,node-1) node-1 <-- DATA(node-2: #20, conn_id=0)
      08:46:22,564 TRACE [org.jgroups.protocols.UNICAST3] (thread-13,ejb,node-1) node-1: delivering node-2#20
      08:46:22,564 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-13,ejb,node-1) node-1 received request 3 from node-2: CacheTopologyControlCommand{cache=client-mappings, type=POLICY_GET_STATUS, sender=null, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=0}
      08:46:22,564 TRACE [org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler] (thread-13,ejb,node-1) Attempting to execute non-CacheRpcCommand: CacheTopologyControlCommand{cache=client-mappings, type=POLICY_GET_STATUS, sender=null, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=0} [sender=node-2]
      08:46:22,564 TRACE [org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl] (thread-13,ejb,node-1) Added a new task directly: 0 task(s) are waiting
      08:46:22,564 TRACE [org.jgroups.protocols.UFC] (thread-13,ejb,node-1) node-2 used 26 credits, 1998681 remaining
      08:46:22,564 TRACE [org.infinispan.factories.GlobalComponentRegistry] (remote-thread--p8-t1) Injecting dependency for field [org.infinispan.topology.CacheTopologyControlCommand.localTopologyManager] on an instance of [org.infinispan.topology.CacheTopologyControlCommand]
      08:46:22,564 TRACE [org.infinispan.factories.GlobalComponentRegistry] (remote-thread--p8-t1) Injecting dependency for field [org.infinispan.topology.CacheTopologyControlCommand.clusterTopologyManager] on an instance of [org.infinispan.topology.CacheTopologyControlCommand]
      08:46:22,564 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (remote-thread--p8-t1) node-1 sending response for request 3 to node-2: SuccessfulResponse(true)
      08:46:22,564 TRACE [org.infinispan.commons.marshall.AdaptiveBufferSizePredictor] (remote-thread--p8-t1) Next predicted buffer size for object type 'org.infinispan.remoting.responses.SuccessfulResponse' will be 768
      08:46:22,564 TRACE [org.jgroups.protocols.UNICAST3] (remote-thread--p8-t1) node-1 --> DATA(node-2: #14, conn_id=0)
      08:46:22,564 TRACE [org.jgroups.protocols.UDP] (remote-thread--p8-t1) node-1: sending msg to node-2, src=node-1, headers are RequestCorrelator: corr_id=0, type=RSP, req_id=3, rsp_expected=true, FORK: ejb:ejb, UNICAST3: DATA, seqno=14, TP: [cluster_name=ejb]
      08:46:22,581 TRACE [org.jgroups.protocols.UNICAST3] (Timer runner-1,null,null) node-1 --> ACK(node-2: #20)
      08:46:22,581 TRACE [org.jgroups.protocols.UDP] (Timer runner-1,null,null) node-1: sending msg to node-2, src=node-1, headers are UNICAST3: ACK, seqno=20, ts=7, TP: [cluster_name=ejb]
      08:46:22,589 TRACE [org.jgroups.protocols.UDP] (thread-13,ejb,node-1) node-1: received [dst: node-1, src: node-2 (2 headers), size=0 bytes, flags=INTERNAL], headers are UNICAST3: ACK, seqno=14, ts=5, TP: [cluster_name=ejb]
      08:46:22,589 TRACE [org.jgroups.protocols.UNICAST3] (thread-13,ejb,node-1) node-1 <-- ACK(node-2: #14, conn-id=0, ts=5)
      08:46:22,593 TRACE [org.jgroups.protocols.UDP] (thread-13,ejb,node-1) node-1: received [dst: <null>, src: node-2 (3 headers), size=64 bytes, flags=OOB|NO_TOTAL_ORDER|NO_RELAY], headers are FORK: ejb:ejb, NAKACK2: [MSG, seqno=4], TP: [cluster_name=ejb]
      08:46:22,593 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (thread-13,ejb,node-1) node-1: received node-2#4
      08:46:22,593 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (thread-13,ejb,node-1) node-1: delivering node-2#4
      08:46:22,593 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-13,ejb,node-1) node-1 received command from node-2: TxCompletionNotificationCommand{ xid=null, internalId=0, topologyId=1, gtx=GlobalTx:node-2:3, cacheName=client-mappings} 
      08:46:22,593 TRACE [org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler] (thread-13,ejb,node-1) Attempting to execute CacheRpcCommand: TxCompletionNotificationCommand{ xid=null, internalId=0, topologyId=1, gtx=GlobalTx:node-2:3, cacheName=client-mappings}  [sender=node-2]
      08:46:22,593 TRACE [org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler] (thread-13,ejb,node-1) Silently ignoring that client-mappings cache is not defined
      08:46:22,593 TRACE [org.jgroups.protocols.MFC] (thread-13,ejb,node-1) node-2 used 64 credits, 1999587 remaining
      08:46:22,682 TRACE [org.jgroups.protocols.UDP] (Timer runner-1,null,null) node-1: sending msg to null, src=node-1, headers are NAKACK2: [HIGHEST_SEQNO, seqno=11], TP: [cluster_name=ejb]
      

      TRACE logging from node-2:

      08:46:22,552 DEBUG [org.infinispan.topology.LocalTopologyManagerImpl] (ServerService Thread Pool -- 84) Node node-2 joining cache client-mappings
      08:46:22,552 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (ServerService Thread Pool -- 84) node-2 sending request 2 to node-1: CacheTopologyControlCommand{cache=client-mappings, type=JOIN, sender=node-2, joinInfo=CacheJoinInfo{consistentHashFactory=org.infinispan.distribution.ch.impl.SyncReplicatedConsistentHashFactory@43db48bd, hashFunction=MurmurHash3, numSegments=256, numOwners=2, timeout=240000, totalOrder=false, cacheMode=REPL_SYNC, persistentUUID=a1aac693-eabb-4043-ba94-ca0eb0d3b7db, persistentStateChecksum=Optional.empty}, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=1}
      08:46:22,553 TRACE [org.infinispan.commons.marshall.AdaptiveBufferSizePredictor] (ServerService Thread Pool -- 84) Next predicted buffer size for object type 'org.infinispan.topology.CacheTopologyControlCommand' will be 512
      08:46:22,553 TRACE [org.jgroups.protocols.UNICAST3] (ServerService Thread Pool -- 84) node-2 --> DATA(node-1: #19, conn_id=0)
      08:46:22,553 TRACE [org.jgroups.protocols.UDP] (ServerService Thread Pool -- 84) node-2: sending msg to node-1, src=node-2, headers are RequestCorrelator: corr_id=0, type=REQ, req_id=2, rsp_expected=true, FORK: ejb:ejb, UNICAST3: DATA, seqno=19, TP: [cluster_name=ejb]
      08:46:22,553 TRACE [org.jgroups.protocols.UFC] (ServerService Thread Pool -- 84) node-2 used 99 credits, 1998711 remaining
      08:46:22,556 TRACE [org.jgroups.protocols.UDP] (thread-13,ejb,node-2) node-2: received [dst: <null>, src: node-1 (3 headers), size=1156 bytes, flags=OOB|NO_TOTAL_ORDER|NO_RELAY], headers are FORK: ejb:ejb, NAKACK2: [MSG, seqno=11], TP: [cluster_name=ejb]
      08:46:22,556 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (thread-13,ejb,node-2) node-2: received node-1#11
      08:46:22,556 TRACE [org.jgroups.protocols.pbcast.NAKACK2] (thread-13,ejb,node-2) node-2: delivering node-1#11
      08:46:22,556 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-13,ejb,node-2) node-2 received command from node-1: CacheTopologyControlCommand{cache=client-mappings, type=STABLE_TOPOLOGY_UPDATE, sender=node-1, joinInfo=null, topologyId=1, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, availabilityMode=null, phase=null, actualMembers=[node-2], throwable=null, viewId=1}
      08:46:22,556 TRACE [org.infinispan.remoting.inboundhandler.GlobalInboundInvocationHandler] (thread-13,ejb,node-2) Attempting to execute non-CacheRpcCommand: CacheTopologyControlCommand{cache=client-mappings, type=STABLE_TOPOLOGY_UPDATE, sender=node-1, joinInfo=null, topologyId=1, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, availabilityMode=null, phase=null, actualMembers=[node-2], throwable=null, viewId=1} [sender=node-1]
      08:46:22,557 TRACE [org.infinispan.util.concurrent.BlockingTaskAwareExecutorServiceImpl] (thread-13,ejb,node-2) Added a new task directly: 0 task(s) are waiting
      08:46:22,557 TRACE [org.jgroups.protocols.MFC] (thread-13,ejb,node-2) node-1 used 1156 credits, 1989619 remaining
      08:46:22,557 TRACE [org.infinispan.factories.GlobalComponentRegistry] (remote-thread--p5-t1) Injecting dependency for field [org.infinispan.topology.CacheTopologyControlCommand.localTopologyManager] on an instance of [org.infinispan.topology.CacheTopologyControlCommand]
      08:46:22,557 TRACE [org.infinispan.factories.GlobalComponentRegistry] (remote-thread--p5-t1) Injecting dependency for field [org.infinispan.topology.CacheTopologyControlCommand.clusterTopologyManager] on an instance of [org.infinispan.topology.CacheTopologyControlCommand]
      08:46:22,559 TRACE [org.jgroups.protocols.UDP] (thread-13,ejb,node-2) node-2: received [dst: node-2, src: node-1 (4 headers), size=2210 bytes, flags=OOB|NO_FC|NO_TOTAL_ORDER], headers are RequestCorrelator: corr_id=0, type=RSP, req_id=2, rsp_expected=true, FORK: ejb:ejb, UNICAST3: DATA, seqno=13, TP: [cluster_name=ejb]
      08:46:22,559 TRACE [org.jgroups.protocols.UNICAST3] (thread-13,ejb,node-2) node-2 <-- DATA(node-1: #13, conn_id=0)
      08:46:22,559 TRACE [org.jgroups.protocols.UNICAST3] (thread-13,ejb,node-2) node-2: delivering node-1#13
      08:46:22,559 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-13,ejb,node-2) node-2 received response for request 2 from node-1: SuccessfulResponse(StatusResponse{cacheJoinInfo=null, cacheTopology=CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, unionCH=null, actualMembers=[node-2], persistentUUIDs=[a1aac693-eabb-4043-ba94-ca0eb0d3b7db]}, stableTopology=CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, unionCH=null, actualMembers=[node-2], persistentUUIDs=[a1aac693-eabb-4043-ba94-ca0eb0d3b7db]}})
      08:46:22,559 DEBUG [org.infinispan.topology.LocalTopologyManagerImpl] (ServerService Thread Pool -- 84) Updating local topology for cache client-mappings: CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, unionCH=null, actualMembers=[node-2], persistentUUIDs=[a1aac693-eabb-4043-ba94-ca0eb0d3b7db]}
      08:46:22,559 TRACE [org.infinispan.topology.CacheTopology] (ServerService Thread Pool -- 84) Current consistent hash's routing table: node-2 primary: {0-255}
      08:46:22,559 TRACE [org.infinispan.statetransfer.StateTransferManagerImpl] (ServerService Thread Pool -- 84) Installing new cache topology CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, unionCH=null, actualMembers=[node-2], persistentUUIDs=[a1aac693-eabb-4043-ba94-ca0eb0d3b7db]} on cache client-mappings
      08:46:22,560 TRACE [org.infinispan.statetransfer.StateTransferManagerImpl] (ServerService Thread Pool -- 84) This is the first topology 1 in which the local node is a member
      08:46:22,560 TRACE [org.infinispan.notifications.cachelistener.CacheNotifierImpl] (ServerService Thread Pool -- 84) Invoking listener: org.infinispan.transaction.xa.XaTransactionTable@4bdae993 passing event EventImpl{type=TOPOLOGY_CHANGED, pre=true, cache=Cache 'client-mappings'@node-2, readConsistentHashAtStart=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, writeConsistentHashAtStart=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, readConsistentHashAtEnd=PartitionerConsistentHash:ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, writeConsistentHashAtEnd=PartitionerConsistentHash:ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, unionConsistentHash=null, newTopologyId=1}
      08:46:22,560 TRACE [org.infinispan.statetransfer.StateConsumerImpl] (ServerService Thread Pool -- 84) Received new topology for cache client-mappings, isRebalance = false, isMember = true, topology = CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=PartitionerConsistentHash:ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, unionCH=null, actualMembers=[node-2], persistentUUIDs=[a1aac693-eabb-4043-ba94-ca0eb0d3b7db]}
      08:46:22,560 TRACE [org.infinispan.distribution.impl.DistributionManagerImpl] (ServerService Thread Pool -- 84) Topology updated to CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=PartitionerConsistentHash:ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, unionCH=null, actualMembers=[node-2], persistentUUIDs=[a1aac693-eabb-4043-ba94-ca0eb0d3b7db]}
      08:46:22,560 TRACE [org.infinispan.container.impl.DefaultSegmentedDataContainer] (ServerService Thread Pool -- 84) Ensuring segments RangeSet(256) are started
      08:46:22,560 TRACE [org.infinispan.statetransfer.StateTransferLockImpl] (ServerService Thread Pool -- 84) Signalling topology 1 is installed
      08:46:22,560 TRACE [org.infinispan.statetransfer.StateConsumerImpl] (ServerService Thread Pool -- 84) On cache client-mappings we have: added segments: {}
      08:46:22,560 TRACE [org.infinispan.statetransfer.StateConsumerImpl] (ServerService Thread Pool -- 84) Topology update processed, stateTransferTopologyId = -1, startRebalance = false, pending CH = null
      08:46:22,560 TRACE [org.infinispan.statetransfer.StateTransferLockImpl] (ServerService Thread Pool -- 84) Signalling transaction data received for topology 1
      08:46:22,560 TRACE [org.infinispan.statetransfer.StateConsumerImpl] (ServerService Thread Pool -- 84) No end of state transfer notification, waitingForState already set to false by another thread
      08:46:22,560 TRACE [org.infinispan.transaction.impl.TransactionTable] (ServerService Thread Pool -- 84) Checking for transactions originated on leavers. Current cache members are [node-1, node-2], remote transactions: 0
      08:46:22,560 TRACE [org.infinispan.transaction.impl.TransactionTable] (ServerService Thread Pool -- 84) No remote transactions pertain to originator(s) who have left the cluster.
      08:46:22,561 DEBUG [org.infinispan.statetransfer.StateConsumerImpl] (ServerService Thread Pool -- 84) Removing no longer owned entries for cache client-mappings
      08:46:22,561 TRACE [org.infinispan.container.impl.DefaultSegmentedDataContainer] (ServerService Thread Pool -- 84) Removing segments: {} from container
      08:46:22,561 TRACE [org.infinispan.notifications.cachelistener.CacheNotifierImpl] (ServerService Thread Pool -- 84) Invoking listener: org.infinispan.transaction.xa.XaTransactionTable@4bdae993 passing event EventImpl{type=TOPOLOGY_CHANGED, pre=false, cache=Cache 'client-mappings'@node-2, readConsistentHashAtStart=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, writeConsistentHashAtStart=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, readConsistentHashAtEnd=PartitionerConsistentHash:ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, writeConsistentHashAtEnd=PartitionerConsistentHash:ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, unionConsistentHash=null, newTopologyId=1}
      08:46:22,561 DEBUG [org.infinispan.transaction.impl.TransactionTable] (ServerService Thread Pool -- 84) Topology changed, recalculating minTopologyId
      08:46:22,561 TRACE [org.infinispan.transaction.impl.TransactionTable] (ServerService Thread Pool -- 84) Changing minimum topology ID from -1 to 1
      08:46:22,561 TRACE [org.infinispan.statetransfer.StateTransferManagerImpl] (ServerService Thread Pool -- 84) Initial state transfer complete for cache client-mappings on node node-2
      08:46:22,561 TRACE [org.infinispan.topology.LocalTopologyManagerImpl] (ServerService Thread Pool -- 84) Updating stable topology for cache client-mappings: CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, unionCH=null, actualMembers=[node-2], persistentUUIDs=[a1aac693-eabb-4043-ba94-ca0eb0d3b7db]}
      08:46:22,561 TRACE [org.infinispan.statetransfer.StateTransferManagerImpl] (ServerService Thread Pool -- 84) StateTransferManager of cache client-mappings on node node-2 received initial topology CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=ReplicatedConsistentHash{ns = 256, owners = (1)[node-2: 256]}, pendingCH=null, unionCH=null, actualMembers=[node-2], persistentUUIDs=[a1aac693-eabb-4043-ba94-ca0eb0d3b7db]}
      08:46:22,563 TRACE [org.infinispan.factories.ComponentRegistry] (ServerService Thread Pool -- 84) Invoking start method waitForInitialStateTransferToComplete(priority=1000) on component org.infinispan.statetransfer.StateTransferManager
      08:46:22,563 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (ServerService Thread Pool -- 84) node-2 sending request 3 to node-1: CacheTopologyControlCommand{cache=client-mappings, type=POLICY_GET_STATUS, sender=node-2, joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=1}
      08:46:22,563 TRACE [org.infinispan.commons.marshall.AdaptiveBufferSizePredictor] (ServerService Thread Pool -- 84) Next predicted buffer size for object type 'org.infinispan.topology.CacheTopologyControlCommand' will be 480
      08:46:22,563 TRACE [org.jgroups.protocols.UNICAST3] (ServerService Thread Pool -- 84) node-2 --> DATA(node-1: #20, conn_id=0)
      08:46:22,563 TRACE [org.jgroups.protocols.UDP] (ServerService Thread Pool -- 84) node-2: sending msg to node-1, src=node-2, headers are RequestCorrelator: corr_id=0, type=REQ, req_id=3, rsp_expected=true, FORK: ejb:ejb, UNICAST3: DATA, seqno=20, TP: [cluster_name=ejb]
      08:46:22,564 TRACE [org.jgroups.protocols.UFC] (ServerService Thread Pool -- 84) node-2 used 26 credits, 1998685 remaining
      08:46:22,565 TRACE [org.jgroups.protocols.UDP] (thread-13,ejb,node-2) node-2: received [dst: node-2, src: node-1 (4 headers), size=5 bytes, flags=OOB|NO_FC|NO_TOTAL_ORDER], headers are RequestCorrelator: corr_id=0, type=RSP, req_id=3, rsp_expected=true, FORK: ejb:ejb, UNICAST3: DATA, seqno=14, TP: [cluster_name=ejb]
      08:46:22,565 TRACE [org.jgroups.protocols.UNICAST3] (thread-13,ejb,node-2) node-2 <-- DATA(node-1: #14, conn_id=0)
      08:46:22,565 TRACE [org.jgroups.protocols.UNICAST3] (thread-13,ejb,node-2) node-2: delivering node-1#14
      08:46:22,565 TRACE [org.infinispan.remoting.transport.jgroups.JGroupsTransport] (thread-13,ejb,node-2) node-2 received response for request 3 from node-1: SuccessfulResponse(true)
      08:46:22,567 TRACE [org.infinispan.statetransfer.StateTransferManagerImpl] (ServerService Thread Pool -- 84) Waiting for initial state transfer to finish for cache client-mappings on node-2
      08:46:22,567 TRACE [org.infinispan.notifications.cachemanagerlistener.CacheManagerNotifierImpl] (ServerService Thread Pool -- 84) Invoking listener: org.jboss.as.clustering.infinispan.subsystem.CacheContainerServiceConfigurator@1aef5a9b passing event EventImpl{type=CACHE_STARTED, newMembers=null, oldMembers=null, localAddress=null, viewId=0, subgroupsMerged=null, mergeView=false}
      08:46:22,567 INFO  [org.jboss.as.clustering.infinispan] (ServerService Thread Pool -- 84) WFLYCLINF0002: Started client-mappings cache from ejb container
      08:46:22,568 DEBUG [org.infinispan.cache.impl.CacheImpl] (ServerService Thread Pool -- 84) Started cache client-mappings on node-2
      08:46:22,568 TRACE [org.infinispan.manager.DefaultCacheManager] (ServerService Thread Pool -- 84) Cache client-mappings started
      08:46:22,568 DEBUG [org.infinispan.cache.impl.CacheImpl] (ServerService Thread Pool -- 84) Started cache client-mappings on node-2
      

      After 4 minutes, node2 eventually throws:

      08:51:00,892 ERROR [org.jboss.msc.service.fail] (ServerService Thread Pool -- 108) MSC000001: Failed to start service org.wildfly.clustering.infinispan.cache.ejb.client-mappings: org.jboss.msc.service.StartException in service org.wildfly.clustering.infinispan.cache.ejb.client-mappings: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.Exception on object of type StateTransferManagerImpl
      	at org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:70) [wildfly-clustering-service-14.0.0.CR1-SNAPSHOT.jar:14.0.0.CR1-SNAPSHOT]
      	at org.wildfly.clustering.service.AsyncServiceConfigurator$AsyncService.lambda$start$0(AsyncServiceConfigurator.java:117) [wildfly-clustering-service-14.0.0.CR1-SNAPSHOT.jar:14.0.0.CR1-SNAPSHOT]
      	at org.jboss.threads.ContextClassLoaderSavingRunnable.run(ContextClassLoaderSavingRunnable.java:35) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
      	at org.jboss.threads.EnhancedQueueExecutor.safeRun(EnhancedQueueExecutor.java:1985) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
      	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.doRunTask(EnhancedQueueExecutor.java:1487) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
      	at org.jboss.threads.EnhancedQueueExecutor$ThreadBody.run(EnhancedQueueExecutor.java:1378) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
      	at java.lang.Thread.run(Thread.java:748) [rt.jar:1.8.0_181]
      	at org.jboss.threads.JBossThread.run(JBossThread.java:485) [jboss-threads-2.3.2.Final.jar:2.3.2.Final]
      Caused by: org.infinispan.commons.CacheException: Unable to invoke method public void org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete() throws java.lang.Exception on object of type StateTransferManagerImpl
      	at org.infinispan.commons.util.SecurityActions.lambda$invokeAccessibly$0(SecurityActions.java:83)
      	at org.infinispan.commons.util.SecurityActions.doPrivileged(SecurityActions.java:71)
      	at org.infinispan.commons.util.SecurityActions.invokeAccessibly(SecurityActions.java:76)
      	at org.infinispan.commons.util.ReflectionUtil.invokeAccessibly(ReflectionUtil.java:185)
      	at org.infinispan.factories.AbstractComponentRegistry$PrioritizedMethod.invoke(AbstractComponentRegistry.java:968)
      	at org.infinispan.factories.AbstractComponentRegistry.lambda$invokePrioritizedMethods$6(AbstractComponentRegistry.java:703)
      	at org.infinispan.factories.SecurityActions.lambda$run$1(SecurityActions.java:72)
      	at org.infinispan.security.Security.doPrivileged(Security.java:44)
      	at org.infinispan.factories.SecurityActions.run(SecurityActions.java:71)
      	at org.infinispan.factories.AbstractComponentRegistry.invokePrioritizedMethods(AbstractComponentRegistry.java:696)
      	at org.infinispan.factories.AbstractComponentRegistry.internalStart(AbstractComponentRegistry.java:689)
      	at org.infinispan.factories.AbstractComponentRegistry.start(AbstractComponentRegistry.java:607)
      	at org.infinispan.factories.ComponentRegistry.start(ComponentRegistry.java:244)
      	at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1051)
      	at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:421)
      	at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:646)
      	at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:591)
      	at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:477)
      	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:463)
      	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:449)
      	at org.jboss.as.clustering.infinispan.DefaultCacheContainer.getCache(DefaultCacheContainer.java:86)
      	at org.wildfly.clustering.infinispan.spi.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:77)
      	at org.wildfly.clustering.infinispan.spi.service.CacheServiceConfigurator.get(CacheServiceConfigurator.java:55)
      	at org.wildfly.clustering.service.FunctionalService.start(FunctionalService.java:67) [wildfly-clustering-service-14.0.0.CR1-SNAPSHOT.jar:14.0.0.CR1-SNAPSHOT]
      	... 7 more
      Caused by: org.infinispan.commons.CacheException: Initial state transfer timed out for cache client-mappings on node-2
      	at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:233)
      	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) [rt.jar:1.8.0_181]
      	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) [rt.jar:1.8.0_181]
      	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) [rt.jar:1.8.0_181]
      	at java.lang.reflect.Method.invoke(Method.java:498) [rt.jar:1.8.0_181]
      	at org.infinispan.commons.util.SecurityActions.lambda$invokeAccessibly$0(SecurityActions.java:79)
      	... 30 more
      

        1. Coordinator.java
          1 kB
        2. Coordinator-TRACE.log
          83 kB
        3. Test.java
          1 kB
        4. Test-TRACE.log
          27 kB

              dberinde@redhat.com Dan Berindei (Inactive)
              pferraro@redhat.com Paul Ferraro
              Archiver:
              rhn-support-adongare Amol Dongare

                Created:
                Updated:
                Resolved:
                Archived: