-
Bug
-
Resolution: Unresolved
-
Major
-
13.0.2.Final
Cluster with 3 nodes: A, B, C, starting topology 10
NodeD joins, rebalance starts with topology 11
While NodeD is installing cluster listeners, B leaves
C confirms the rebalance with topology 11, which included B
A already has topology 12, without B, and rejects, C's confirmation
B then receives topology 12 but doesn't confirm it because it's supposed to be confirmed already
The issue causes random failures in ClusterListenerDistAddListenerTest.testMemberJoinsAndRetrievesClusterListenersButMainListenerNodeDiesBeforeInstalled:
10:14:12,996 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.notifications.cachelistener.cluster.ClusterListenerDistAddListenerTest.testMemberJoinsAndRetrievesClusterListenersButMainListenerNodeDiesBeforeInstalled org.infinispan.commons.CacheException: Initial state transfer timed out for cache cluster-listener on D at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:248) ~[classes/:?] at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1017) ~[classes/:?] at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:512) ~[classes/:?] at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:700) ~[classes/:?] at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:646) ~[classes/:?] at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:535) ~[classes/:?] at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:513) ~[classes/:?] at org.infinispan.test.MultipleCacheManagersTest.cache(MultipleCacheManagersTest.java:530) ~[test-classes/:?] at org.infinispan.notifications.cachelistener.cluster.AbstractClusterListenerDistAddListenerTest.lambda$testMemberJoinsAndRetrievesClusterListenersButMainListenerNodeDiesBeforeInstalled$2(AbstractClusterListenerDistAddListenerTest.java:165) ~[test-classes/:?]
10:10:12,997 TRACE (non-blocking-thread-C-p30645-t4:[]) [JGroupsTransport] C sending command to A: ConfirmRebalancePhaseCommand{cacheName='cluster-listener', origin=C, throwable=null, topologyId=11, viewId=3} 10:10:12,997 DEBUG (non-blocking-thread-C-p30645-t4:[]) [LocalTopologyManagerImpl] Updating local topology for cache cluster-listener: CacheTopology{id=12, phase=READ_OLD_WRITE_ALL, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[A: 132+34, C: 124+48]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[A: 83+91, C: 87+85, D: 86+80]}, unionCH=null, actualMembers=[A, C, D], persistentUUIDs=[11c24a2d-77d1-448d-9ca0-1367b87fa9e8, 8012efb8-5aac-4092-b85f-0d49d12fecdf, 3db53eb6-67c7-48bd-9156-7d094fb2c0f2]} 10:10:12,997 TRACE (jgroups-7,A:[]) [JGroupsTransport] A received command from C: ConfirmRebalancePhaseCommand{cacheName='cluster-listener', origin=null, throwable=null, topologyId=11, viewId=3} 10:10:12,997 TRACE (jgroups-7,A:[]) [GlobalInboundInvocationHandler] Attempting to execute non-CacheRpcCommand: ConfirmRebalancePhaseCommand{cacheName='cluster-listener', origin=C, throwable=null, topologyId=11, viewId=3} [sender=C] 10:10:12,997 DEBUG (jgroups-7,A:[]) [ClusterCacheStatus] Ignoring rebalance confirmation from C for cache cluster-listener because the topology id is old (11, expected 12) 10:10:12,997 DEBUG (non-blocking-thread-C-p30645-t4:[]) [LocalTopologyManagerImpl] Starting local rebalance for cache cluster-listener, topology = CacheTopology{id=12, phase=READ_OLD_WRITE_ALL, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[A: 132+34, C: 124+48]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[A: 83+91, C: 87+85, D: 86+80]}, unionCH=null, actualMembers=[A, C, D], persistentUUIDs=[11c24a2d-77d1-448d-9ca0-1367b87fa9e8, 8012efb8-5aac-4092-b85f-0d49d12fecdf, 3db53eb6-67c7-48bd-9156-7d094fb2c0f2]} 10:10:12,998 TRACE (non-blocking-thread-C-p30645-t4:[]) [StateConsumerImpl] Received new topology for cache cluster-listener, isRebalance = true, isMember = true, topology = CacheTopology{id=12, phase=READ_OLD_WRITE_ALL, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[A: 132+34, C: 124+48]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[A: 83+91, C: 87+85, D: 86+80]}, unionCH=DefaultConsistentHash{ns=256, owners = (3)[A: 132+62, C: 124+79, D: 0+166]}, actualMembers=[A, C, D], persistentUUIDs=[11c24a2d-77d1-448d-9ca0-1367b87fa9e8, 8012efb8-5aac-4092-b85f-0d49d12fecdf, 3db53eb6-67c7-48bd-9156-7d094fb2c0f2]} 10:10:12,998 TRACE (non-blocking-thread-C-p30645-t4:[]) [StateConsumerImpl] On cache cluster-listener we have: new segments: {0-76 97-117 138-180 187-190 193-198 204-255}; old segments: {0-76 97-110 138-167 202 204-255} 10:10:12,998 TRACE (non-blocking-thread-C-p30645-t4:[]) [StateConsumerImpl] On cache cluster-listener we have: added segments: {111-117 168-180 187-190 193-198}; removed segments: {202} 10:10:12,998 TRACE (jgroups-8,C:[]) [StateConsumerImpl] Inbound transfer finished: InboundTransferTask{segments={145 202}, unfinishedSegments={}, source=B, isCancelled=true, completionFuture=java.util.concurrent.CompletableFuture@1529aa6a[Completed normally], topologyId=10, timeout=240000, cacheName=cluster-listener} 10:10:12,998 DEBUG (jgroups-8,C:[]) [StateConsumerImpl] Finished receiving of segments for cache cluster-listener for topology 10. 10:10:12,998 TRACE (non-blocking-thread-C-p30645-t4:[]) [StateConsumerImpl] No end of state transfer notification, waitingForState already set to false by another thread 10:10:12,998 TRACE (jgroups-8,C:[]) [StateConsumerImpl] No end of state transfer notification, waitingForState already set to false by another thread