Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: 9.2.2.Final, 9.3.0.Final
Affects Version/s: 9.2.0.Final
Component/s: Core
Labels: None
Sprint: Sprint 9.3.0.Beta1
Git Pull Request: https://github.com/infinispan/infinispan/pull/5866, https://github.com/infinispan/infinispan/pull/5905, https://github.com/infinispan/infinispan/pull/5934

PreferAvailabilityStrategy checks the size of the stable topology, and only considers cache topologies that are derived from the biggest topology (in size) when picking a post-merge topology.
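The size-based selection described above can be illustrated with a minimal sketch. This is not Infinispan's actual code: `MergeSketch`, `Report`, and `candidates` are hypothetical names, and "derived from the biggest topology" is simplified here to "reports the same stable members as the biggest stable topology".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the size-based heuristic; not Infinispan's real classes.
class MergeSketch {
    // What one partition reports to the coordinator after a merge (assumed shape).
    static final class Report {
        final String sender;
        final Set<String> currentMembers;
        final Set<String> stableMembers;

        Report(String sender, Set<String> currentMembers, Set<String> stableMembers) {
            this.sender = sender;
            this.currentMembers = currentMembers;
            this.stableMembers = stableMembers;
        }
    }

    // Keep only the reports whose stable topology matches the biggest one (by member count).
    static List<Report> candidates(List<Report> reports) {
        Set<String> biggestStable = null;
        for (Report r : reports) {
            if (biggestStable == null || r.stableMembers.size() > biggestStable.size()) {
                biggestStable = r.stableMembers;
            }
        }
        List<Report> kept = new ArrayList<>();
        for (Report r : reports) {
            if (r.stableMembers.equals(biggestStable)) {
                kept.add(r);
            }
        }
        return kept;
    }
}
```

Note that the only input to the choice is the member count of the stable topology. Nothing in this sketch checks how recent a report is.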

Unfortunately, this algorithm fails badly in some situations. If a node has a very long GC pause, it reports the old topology and the old stable topology when it comes back. If the rest of the cluster rebalanced in the meantime, the rest of the cluster now has both a smaller current topology and a smaller stable topology, so the size check favors the stale topology reported by the paused node.
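The GC-pause scenario can be reproduced on paper with a small sketch. Everything here is an assumption made for illustration: the class and method names are hypothetical, the topology ids are invented, and the size-only comparison is a simplification of the behavior described in this issue.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the failure mode; not Infinispan's real classes.
class StaleTopologySketch {
    static final class Report {
        final String sender;
        final int topologyId;            // invented ids: higher means more recent
        final Set<String> stableMembers;

        Report(String sender, int topologyId, Set<String> stableMembers) {
            this.sender = sender;
            this.topologyId = topologyId;
            this.stableMembers = stableMembers;
        }
    }

    static Set<String> members(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    // Size-only preference: the report with the most stable members wins.
    static Report preferBySize(List<Report> reports) {
        return reports.stream()
                .max(Comparator.comparingInt((Report r) -> r.stableMembers.size()))
                .orElseThrow(IllegalArgumentException::new);
    }

    // Scenario: D paused while A, B and C rebalanced without it.
    static Report gcPauseScenario() {
        Report majority = new Report("A", 9, members("A", "B", "C"));
        Report paused = new Report("D", 5, members("A", "B", "C", "D"));
        return preferBySize(Arrays.asList(majority, paused));
    }
}
```

In this sketch `gcPauseScenario()` returns the report from D: its stable topology is older (id 5 versus 9) but has four members instead of three, so a size-only comparison prefers it over the topology the majority is actually using.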

Furthermore, the stable topology is updated asynchronously, independently of the current topology. So even if there's a split and the minority partition installs a current topology with fewer members, it may take some time for its stable topology to be updated with fewer members. In fact, it appears that when a rebalance is not needed (e.g. because the partition has a single node), the stable topology is never updated!

incorporates

ISPN-9077 NullPointerException when trying to recover cache

Closed

is related to

JDG-1426 Data loss caused by a single node which had a long GC pause

Closed

Assignee: Dan Berindei (Inactive)
Reporter: Dan Berindei (Inactive)
Archiver: Amol Dongare
Created: 2018/03/19 1:40 PM
Updated: 2024/07/15 8:36 AM
Resolved: 2018/04/24 6:29 AM
Archived: 2024/11/28 6:21 AM
