-
Bug
-
Resolution: Done
-
Major
-
9.4.15.Final, 10.0.0.Final
-
DataGrid Sprint #30
This scenario happens unintentionally in RebalancePolicyJmxTest, because the test waits for the default cache to finish rebalancing before killing the coordinator but doesn't care about the CONFIG cache:
- A and B are running, rebalancing is disabled, then C and D join
- Re-enable rebalance, but stop B and A before the rebalance is done
- C sees the finished rebalance, D sees the READ_OLD phase
- C becomes coordinator and should recover with C's topology, but instead has an assertion failure and doesn't install a stable topology
16:48:48,454 TRACE (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [PreferAvailabilityStrategy] Cache org.infinispan.CONFIG keeping partition from [Test-NodeC-27509(rack-id=r2)]: CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeC-27509(rack-id=r2): 125, Test-NodeD-62603(rack-id=r2): 131]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], persistentUUIDs=[59d0898c-f166-4129-9165-a22aca475286, 05d7dd0b-7cd8-464d-8adb-41fac100e8bf]} 16:48:48,454 TRACE (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [PreferAvailabilityStrategy] Cache org.infinispan.CONFIG keeping partition from [Test-NodeD-62603(rack-id=r2)]: CacheTopology{id=6, phase=READ_OLD_WRITE_ALL, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeA-4515(rack-id=r1): 127, Test-NodeB-42590(rack-id=r1): 129]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (4)[Test-NodeA-4515(rack-id=r1): 63, Test-NodeB-42590(rack-id=r1): 62, Test-NodeC-27509(rack-id=r2): 64, Test-NodeD-62603(rack-id=r2): 67]}, unionCH=null, actualMembers=[Test-NodeA-4515(rack-id=r1), Test-NodeB-42590(rack-id=r1), Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], persistentUUIDs=[e9dcc3da-07a2-4159-a8b1-94e6428011c4, 3f27ddaa-1146-483e-8473-d79a5ba347f5, 59d0898c-f166-4129-9165-a22aca475286, 05d7dd0b-7cd8-464d-8adb-41fac100e8bf]} 16:48:48,454 TRACE (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [PreferAvailabilityStrategy] Cache org.infinispan.CONFIG, resolveConflicts=false, newMembers=[Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], possibleOwners=[Test-NodeD-62603(rack-id=r2), Test-NodeC-27509(rack-id=r2)], preferredTopology=CacheTopology{id=6, phase=READ_OLD_WRITE_ALL, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeA-4515(rack-id=r1): 127, Test-NodeB-42590(rack-id=r1): 129]}, pendingCH=ReplicatedConsistentHash{ns = 256, owners = (4)[Test-NodeA-4515(rack-id=r1): 63, Test-NodeB-42590(rack-id=r1): 62, Test-NodeC-27509(rack-id=r2): 64, Test-NodeD-62603(rack-id=r2): 67]}, unionCH=null, actualMembers=[Test-NodeA-4515(rack-id=r1), Test-NodeB-42590(rack-id=r1), Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], persistentUUIDs=[e9dcc3da-07a2-4159-a8b1-94e6428011c4, 3f27ddaa-1146-483e-8473-d79a5ba347f5, 59d0898c-f166-4129-9165-a22aca475286, 05d7dd0b-7cd8-464d-8adb-41fac100e8bf]}, mergeTopologyId=10 16:48:48,454 WARN (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [PreferAvailabilityStrategy] ISPN000517: Ignoring cache topology from [Test-NodeC-27509(rack-id=r2)] during merge: CacheTopology{id=9, phase=NO_REBALANCE, rebalanceId=3, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeC-27509(rack-id=r2): 125, Test-NodeD-62603(rack-id=r2): 131]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)], persistentUUIDs=[59d0898c-f166-4129-9165-a22aca475286, 05d7dd0b-7cd8-464d-8adb-41fac100e8bf]} 16:48:48,454 DEBUG (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [CLUSTER] ISPN000521: Cache org.infinispan.CONFIG recovered after merge with topology = CacheTopology{id=10, phase=NO_REBALANCE, rebalanceId=4, currentCH=ReplicatedConsistentHash{ns = 256, owners = (2)[Test-NodeA-4515(rack-id=r1): 127, Test-NodeB-42590(rack-id=r1): 129]}, pendingCH=null, unionCH=null, actualMembers=[], persistentUUIDs=[]}, availability mode null 16:48:48,454 FATAL (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [CLUSTER] [Context=org.infinispan.CONFIG] ISPN000313: Lost data because of abrupt leavers [Test-NodeA-4515(rack-id=r1), Test-NodeB-42590(rack-id=r1), Test-NodeC-27509(rack-id=r2), Test-NodeD-62603(rack-id=r2)] 16:48:48,455 ERROR (stateTransferExecutor-thread-Test-NodeC-p49651-t6:[Merge-5]) [LimitedExecutor] Exception in task java.lang.AssertionError: null at org.infinispan.partitionhandling.impl.PreferAvailabilityStrategy.onPartitionMerge(PreferAvailabilityStrategy.java:217) ~[classes/:?] at org.infinispan.topology.ClusterCacheStatus.doMergePartitions(ClusterCacheStatus.java:647) ~[classes/:?] at org.infinispan.topology.ClusterTopologyManagerImpl.lambda$recoverClusterStatus$4(ClusterTopologyManagerImpl.java:500) ~[classes/:?] at org.infinispan.executors.LimitedExecutor.runTasks(LimitedExecutor.java:175) [classes/:?]
Eventually the missing stable topology makes the test fail:
16:48:49,349 DEBUG (testng-Test:[null]) [ClusterCacheStatus] ISPN000519: Updating stable topology for cache org.infinispan.CONFIG, topology null 16:48:49,349 WARN (testng-Test:[null]) [CacheTopologyControlCommand] ISPN000071: Caught exception when handling command CacheTopologyControlCommand{cache=null, type=POLICY_ENABLE, sender=Test-NodeC-27509(rack-id=r2), joinInfo=null, topologyId=0, rebalanceId=0, currentCH=null, pendingCH=null, availabilityMode=null, phase=null, actualMembers=null, throwable=null, viewId=5} java.lang.NullPointerException: null at org.infinispan.topology.CacheTopologyControlCommand.<init>(CacheTopologyControlCommand.java:147) ~[classes/:?] at org.infinispan.topology.ClusterTopologyManagerImpl.broadcastStableTopologyUpdate(ClusterTopologyManagerImpl.java:659) ~[classes/:?] at org.infinispan.topology.ClusterCacheStatus.startQueuedRebalance(ClusterCacheStatus.java:806) ~[classes/:?] at java.util.concurrent.ConcurrentHashMap$ValuesView.forEach(ConcurrentHashMap.java:4772) ~[?:?] at org.infinispan.topology.ClusterTopologyManagerImpl.setRebalancingEnabled(ClusterTopologyManagerImpl.java:702) ~[classes/:?] at org.infinispan.topology.ClusterTopologyManagerImpl.setRebalancingEnabled(ClusterTopologyManagerImpl.java:682) ~[classes/:?] at org.infinispan.topology.CacheTopologyControlCommand.doPerform(CacheTopologyControlCommand.java:215) ~[classes/:?] at org.infinispan.topology.CacheTopologyControlCommand.invokeAsync(CacheTopologyControlCommand.java:163) [classes/:?] at org.infinispan.commands.ReplicableCommand.invoke(ReplicableCommand.java:44) [classes/:?] at org.infinispan.topology.LocalTopologyManagerImpl.executeOnClusterSync(LocalTopologyManagerImpl.java:752) [classes/:?] at org.infinispan.topology.LocalTopologyManagerImpl.setCacheRebalancingEnabled(LocalTopologyManagerImpl.java:623) [classes/:?] at org.infinispan.topology.LocalTopologyManagerImpl.setRebalancingEnabled(LocalTopologyManagerImpl.java:581) [classes/:?] 16:48:49,355 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.statetransfer.RebalancePolicyJmxTest.testJoinAndLeaveWithRebalanceSuspendedAwaitingInitialTransfer[DIST_SYNC] javax.management.MBeanException: Error invoking setter for attribute rebalancingEnabled at org.infinispan.jmx.ResourceDMBean.setNamedAttribute(ResourceDMBean.java:358) ~[classes/:?] at org.infinispan.jmx.ResourceDMBean.setAttribute(ResourceDMBean.java:216) ~[classes/:?] at com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.setAttribute(DefaultMBeanServerInterceptor.java:736) ~[?:?] at com.sun.jmx.mbeanserver.JmxMBeanServer.setAttribute(JmxMBeanServer.java:739) ~[?:?] at org.infinispan.statetransfer.RebalancePolicyJmxTest.doTest(RebalancePolicyJmxTest.java:163) ~[test-classes/:?] at org.infinispan.statetransfer.RebalancePolicyJmxTest.testJoinAndLeaveWithRebalanceSuspendedAwaitingInitialTransfer(RebalancePolicyJmxTest.java:44) ~[test-classes/:?] Caused by: java.lang.reflect.InvocationTargetException at jdk.internal.reflect.GeneratedMethodAccessor495.invoke(Unknown Source) ~[?:?] at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?] at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?] at org.infinispan.jmx.ResourceDMBean$InvokableSetterBasedMBeanAttributeInfo.invoke(ResourceDMBean.java:422) ~[classes/:?] at org.infinispan.jmx.ResourceDMBean.setNamedAttribute(ResourceDMBean.java:355) ~[classes/:?] ... 28 more Caused by: org.infinispan.commons.CacheException: Unsuccessful local response at org.infinispan.topology.LocalTopologyManagerImpl.executeOnClusterSync(LocalTopologyManagerImpl.java:757) ~[classes/:?] at org.infinispan.topology.LocalTopologyManagerImpl.setCacheRebalancingEnabled(LocalTopologyManagerImpl.java:623) ~[classes/:?] at org.infinispan.topology.LocalTopologyManagerImpl.setRebalancingEnabled(LocalTopologyManagerImpl.java:581) ~[classes/:?] at jdk.internal.reflect.GeneratedMethodAccessor495.invoke(Unknown Source) ~[?:?] at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:?] at java.lang.reflect.Method.invoke(Method.java:566) ~[?:?] at org.infinispan.jmx.ResourceDMBean$InvokableSetterBasedMBeanAttributeInfo.invoke(ResourceDMBean.java:422) ~[classes/:?] at org.infinispan.jmx.ResourceDMBean.setNamedAttribute(ResourceDMBean.java:355) ~[classes/:?] ... 28 more
- relates to
-
ISPN-11605 Remove TopologyUpdateStableCommand
-
- Closed
-