Loading...

This issue belongs to an archived project. You can view it, but you can't modify it. Learn more

XML

Word

Printable

Type: Task
Resolution: Obsolete
Priority: Major
Fix Version/s: None
Affects Version/s: 11.0.0.Dev03
Component/s: Core
Labels:
- testsuite_stability

Sprint:
DataGrid Sprint #42

Having a separate command for stable topologies means nodes may receive a topology update that should be stable without also receiving the confirmation that the topology is stable, complicating cluster recovery.

As an example, when the coordinator node shuts down, it first installs a new cache topology without itself for all of the running caches.
If the number of remaining owners is < numOwners, no rebalance is needed, and the old coordinator immediately sends a stable topology update as well.
But any (stable) topology updates from the old coordinator not yet processed by the time of the CacheStatusRequestCommand from the new coordinator will be ignored.

If the cluster had only 2 nodes, as in StatefulSetRollingUpgradeTest and StatefulSetRollingUpgradeIT, the stable topology update is vital to keep the cache in AVAILABLE mode, otherwise it goes into DEGRADED mode and no new nodes can join.

19:48:14,671 TRACE (jgroups-4,Test-NodeE:[]) [JGroupsTransport] Test-NodeE received command from Test-NodeD: ConsistentHashUpdateCommand{cacheName='testCache', origin=null, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, phase=NO_REBALANCE, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], availabilityMode=AVAILABLE, rebalanceId=5, topologyId=27, viewId=7}
19:48:14,672 TRACE (jgroups-5,Test-NodeE:[]) [JGroupsTransport] Test-NodeE received command from Test-NodeD: StableTopologyUpdateCommand{cacheName='testCache', origin=null, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342], rebalanceId=5, topologyId=27, viewId=7}
19:48:14,675 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Acquired cache status testCache
19:48:14,675 DEBUG (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Updating local topology for cache testCache: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}

19:48:14,686 INFO  (jgroups-5,Test-NodeE:[]) [CLUSTER] ISPN000094: Received new cluster view for channel org.infinispan.statetransfer.Test: [Test-NodeE|8] (1) [Test-NodeE]
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[]) [LocalTopologyManagerImpl] Released cache status testCache
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max stable partition topology: CacheTopology{id=25, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (2)[Test-NodeD: 129+127, Test-NodeE: 127+129]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD, Test-NodeE], persistentUUIDs=[099401f9-c0b6-460d-8dc8-5051699e8287, 72b17309-62d1-4928-abf6-88a8606ef342]}
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [PreferConsistencyStrategy] Max active partition topology: CacheTopology{id=27, phase=NO_REBALANCE, rebalanceId=5, currentCH=DefaultConsistentHash{ns=256, owners = (1)[Test-NodeE: 256+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeE], persistentUUIDs=[72b17309-62d1-4928-abf6-88a8606ef342]}

19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Acquired cache status testCache
19:48:14,690 DEBUG (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Ignoring topology 27 for cache testCache from old coordinator Test-NodeD
19:48:14,690 TRACE (non-blocking-thread-Test-NodeE-p9541-t3:[]) [LocalTopologyManagerImpl] Released cache status testCache

19:48:14,704 TRACE (non-blocking-thread-Test-NodeE-p9541-t5:[Merge-8]) [ClusterCacheStatus] Cache testCache availability changed: AVAILABLE -> DEGRADED_MODE
19:48:25,194 ERROR (testng-Test:[]) [TestSuiteProgress] Test failed: org.infinispan.statetransfer.StatefulSetRollingUpgradeTest.testStateTransferRestart[nodes=2]
org.infinispan.commons.CacheException: Initial state transfer timed out for cache testCache on Test-NodeF
	at org.infinispan.statetransfer.StateTransferManagerImpl.waitForInitialStateTransferToComplete(StateTransferManagerImpl.java:240) ~[classes/:?]
	at org.infinispan.cache.impl.CacheImpl.start(CacheImpl.java:1049) ~[classes/:?]
	at org.infinispan.cache.impl.AbstractDelegatingCache.start(AbstractDelegatingCache.java:513) ~[classes/:?]
	at org.infinispan.manager.DefaultCacheManager.wireAndStartCache(DefaultCacheManager.java:692) ~[classes/:?]
	at org.infinispan.manager.DefaultCacheManager.createCache(DefaultCacheManager.java:631) ~[classes/:?]
	at org.infinispan.manager.DefaultCacheManager.internalGetCache(DefaultCacheManager.java:516) ~[classes/:?]
	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:497) ~[classes/:?]
	at org.infinispan.manager.DefaultCacheManager.getCache(DefaultCacheManager.java:490) ~[classes/:?]
	at org.infinispan.test.MultipleCacheManagersTest.getCaches(MultipleCacheManagersTest.java:322) ~[test-classes/:?]
	at org.infinispan.test.MultipleCacheManagersTest.waitForClusterToForm(MultipleCacheManagersTest.java:329) ~[test-classes/:?]
	at org.infinispan.statetransfer.StatefulSetRollingUpgradeTest.testStateTransferRestart(StatefulSetRollingUpgradeTest.java:78) ~[test-classes/:?]

is related to

ISPN-8240 Coordinator sends REBALANCE_START command when there is already a rebalance in progress

Closed

ISPN-6921 Partition does not become available after merge

Closed

ISPN-9270 ClusteredLockWith2NodesTest.testTryLockAndKillLocking random failures

Closed

ISPN-10365 PreferAvailabilityStrategy assertion failure

Closed

Assignee:: Unassigned

Reporter:: Dan Berindei (Inactive)

Archiver:: Amol Dongare

Created:: 2020/04/07 9:56 AM

Updated:: 2024/11/27 2:22 PM

Resolved:: 2024/11/27 2:22 PM

Archived:: 2024/11/28 6:21 AM

Details

Description

Attachments

Issue Links

Easy Agile Planning Poker

Activity

People

Dates

PagerDuty