[ISPN-12350] Persistent UUIDs are only used for initial consistent hash


Type: Bug
Resolution: Obsolete
Priority: Major
Fix Version/s: None
Affects Version/s: 11.0.3.Final, 12.0.0.Final
Component/s: Core, State Transfer
Labels: None

Release Note Text: Undefined
Git Pull Request: https://github.com/infinispan/infinispan/pull/10167

After a graceful restart, the persistent UUIDs are used to re-create the consistent hash (CH) that the cache had before shutdown. This initial CH will not be rebalanced, so there is no state transfer immediately after cluster restart.

However, if something then triggers a rebalance (e.g. a node join/leave), the persistent UUIDs are ignored, and SyncConsistentHashFactory allocates segments based on the new JGroups addresses instead of the persistent UUIDs.
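To illustrate why the choice of key matters, here is a minimal sketch. It is not the SyncConsistentHashFactory algorithm: it is a toy allocator that picks each segment's owner by hashing a node key together with the segment number (rendezvous hashing, with CRC32 as a stand-in hash function). The class and method names and the pre-restart addresses Test-NodeA/B/C are hypothetical; the UUIDs and the post-restart addresses are the ones from the trace in this report.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.zip.CRC32;

// Toy illustration only; names are hypothetical and this is NOT Infinispan code.
class SegmentOwnerSketch {
    // Score a (node key, segment) pair. CRC32 is just a stand-in hash.
    static long score(String nodeKey, int segment) {
        CRC32 crc = new CRC32();
        crc.update((nodeKey + "#" + segment).getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    // For each segment, return the index of the member whose key scores highest.
    static int[] assignOwners(List<String> nodeKeys, int numSegments) {
        int[] owners = new int[numSegments];
        for (int s = 0; s < numSegments; s++) {
            int best = 0;
            for (int i = 1; i < nodeKeys.size(); i++) {
                if (score(nodeKeys.get(i), s) > score(nodeKeys.get(best), s)) {
                    best = i;
                }
            }
            owners[s] = best;
        }
        return owners;
    }

    // Count the segments whose owner index differs between two assignments.
    static int countMoved(int[] before, int[] after) {
        int moved = 0;
        for (int s = 0; s < before.length; s++) {
            if (before[s] != after[s]) {
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        int numSegments = 256;
        // Persistent UUIDs survive the restart (values taken from the trace).
        List<String> uuids = List.of(
                "1ba71c04-a6b9-4a5c-9f51-e5e358081dc6",
                "6d3ff549-aafa-4d8a-8617-84ac6f119549",
                "f37f6a8c-32a4-4dda-b1b0-876c24f42c6a");
        // Addresses change on restart; the "before" names are an assumption.
        List<String> addressesBefore = List.of("Test-NodeA", "Test-NodeB", "Test-NodeC");
        List<String> addressesAfter = List.of("Test-NodeD", "Test-NodeE", "Test-NodeF");

        int movedByUuid = countMoved(
                assignOwners(uuids, numSegments), assignOwners(uuids, numSegments));
        int movedByAddress = countMoved(
                assignOwners(addressesBefore, numSegments),
                assignOwners(addressesAfter, numSegments));

        System.out.println("Moved when keyed by persistent UUID: " + movedByUuid);
        System.out.println("Moved when keyed by address: " + movedByAddress);
    }
}
```

Keyed by persistent UUID, the owners are identical before and after the restart because the keys are identical. Keyed by address, the result depends on whatever addresses the restarted nodes happen to get.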

I modified ThreeNodeDistGlobalStateRestartTest to force a rebalance after restart, and I got the following trace:

11:24:07,424 TRACE (jgroups-7,Test-NodeD:[]) [ClusterCacheStatus] Cache testCache topology updated: CacheTopology{id=1, phase=NO_REBALANCE, rebalanceId=1, currentCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeD: 83+0, Test-NodeE: 87+0, Test-NodeF: 86+0]}, pendingCH=null, unionCH=null, actualMembers=[Test-NodeD, Test-NodeE, Test-NodeF], persistentUUIDs=[1ba71c04-a6b9-4a5c-9f51-e5e358081dc6, 6d3ff549-aafa-4d8a-8617-84ac6f119549, f37f6a8c-32a4-4dda-b1b0-876c24f42c6a]}, members = [Test-NodeD, Test-NodeE, Test-NodeF], joiners = []
11:24:07,889 TRACE (testng-Test:[]) [ClusterCacheStatus] Rebalancing consistent hash for cache testCache, members are [Test-NodeD, Test-NodeE, Test-NodeF]
11:24:07,909 TRACE (testng-Test:[]) [ClusterCacheStatus] Updating cache testCache topology for rebalance: CacheTopology{id=2, phase=READ_OLD_WRITE_ALL, rebalanceId=2, currentCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeD: 83+0, Test-NodeE: 87+0, Test-NodeF: 86+0]}, pendingCH=DefaultConsistentHash{ns=256, owners = (3)[Test-NodeD: 87+0, Test-NodeE: 83+0, Test-NodeF: 86+0]}, unionCH=null, actualMembers=[Test-NodeD, Test-NodeE, Test-NodeF], persistentUUIDs=[1ba71c04-a6b9-4a5c-9f51-e5e358081dc6, 6d3ff549-aafa-4d8a-8617-84ac6f119549, f37f6a8c-32a4-4dda-b1b0-876c24f42c6a]}
11:24:07,910 TRACE (testng-Test:[]) [ClusterCacheStatus] Moved segments: [Test-NodeD added 72 removed 68, Test-NodeE added 49 removed 53, Test-NodeF added 59 removed 59]
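The numbers in the trace above are internally consistent, which a few lines of arithmetic can confirm. The sketch below only re-checks figures copied from the log; the class and method names are hypothetical. It assumes the `N+0` notation means N primary-owned and 0 backup-owned segments, so that every "added" entry corresponds to one segment changing owner.

```java
// Arithmetic check on the figures from the trace; names are hypothetical.
class MovedSegmentsCheck {
    // Segments owned after the rebalance = owned before + added - removed.
    static int ownedAfter(int before, int added, int removed) {
        return before + added - removed;
    }

    // Sum of an int array.
    static int sum(int[] values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    public static void main(String[] args) {
        String[] nodes = {"Test-NodeD", "Test-NodeE", "Test-NodeF"};
        int[] before = {83, 87, 86};   // currentCH owners
        int[] added = {72, 49, 59};    // "Moved segments" line
        int[] removed = {68, 53, 59};  // "Moved segments" line
        int numSegments = 256;         // ns=256

        for (int i = 0; i < nodes.length; i++) {
            System.out.println(nodes[i] + ": " + before[i] + " -> "
                    + ownedAfter(before[i], added[i], removed[i]));
        }
        System.out.println("Segments added in total: " + sum(added) + " of " + numSegments);
        System.out.println("Segments removed in total: " + sum(removed) + " of " + numSegments);
    }
}
```

The per-node results match the pendingCH in the trace (87, 83, 86), and under the single-owner assumption 180 of the 256 segments change owner even though the membership did not change.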

This issue does not affect caches using DefaultConsistentHashFactory, because it doesn't care about member UUIDs. Since there is no SyncScatteredConsistentHashFactory, scattered caches are not affected at all. Replicated caches with the default SyncReplicatedConsistentHashFactory will change primary owners, but they won't need any state transfer.

TestingUtil.waitForNoRebalance() works around the issue by not checking whether the initial consistent hash (with topologyId==1) is balanced.

is cloned by

ISPN-13996 [DOCS] Persistent UUIDs are only used for initial consistent hash

Closed

JDG-4995 Persistent UUIDs are only used for initial consistent hash

Closed

is related to

ISPN-12221 Add zero-capacity-node support for Replicated caches

Closed

relates to

ISPN-13996 [DOCS] Persistent UUIDs are only used for initial consistent hash

Closed

Tristan Tarrant added a comment - 2024/11/27 2:38 PM

Infinispan issue tracking has been migrated to GitHub issues: https://github.com/infinispan/infinispan/issues
If you still want this issue to be worked on, create a new issue on GitHub and link this issue.


Dan Berindei (Inactive) added a comment - 2021/11/22 3:37 PM

Another case where the persistent UUID is not used currently but would be very helpful is when a node is restarted.

If the cache enters degraded mode after the node stops (e.g. because the coordinator didn't get the leave request, or because the leaver was the coordinator), the node will start with a new UUID, and the cache never goes back to available mode.

When this is implemented, we could also have a workflow that disables rebalancing, stops a node, does something with it, starts the node, then enables rebalancing, and it receives exactly the same segments it had before.


Assignee: Jose Bolina
Reporter: Dan Berindei (Inactive)
Archiver: Amol Dongare
Created: 2020/09/24 4:36 AM
Updated: 2024/11/27 2:38 PM
Resolved: 2024/11/27 12:39 PM
Archived: 2024/11/28 6:21 AM
