
This issue belongs to an archived project.

Type: Task
Resolution: Done
Priority: Critical
Fix Version/s: 4.2.1.CR2, 4.2.1.FINAL
Affects Version/s: 4.0.0.Final, 4.1.0.BETA2
Component/s: None
Labels:
None

Estimated Difficulty:
High

We need to make sure that the leave rehash process properly handles massive and rapid node failures.

Massive failures:
JGroups detects multiple node failures and pushes views up to Infinispan that are more "volatile" than we currently assume (namely, that only one member at a time can leave). For example, if we have view V1 = {A,B,C,D,E} and a massive failure causes {C,D,E} to fail, JGroups failure detection and GMS will install view V2 = {A,B} on the surviving members. LeaveTask does not handle this scenario.
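The V1 to V2 transition above can be made concrete with a small sketch. It assumes nothing about Infinispan's actual LeaveTask code: `ViewDiff`, `leavers` and `survivingOwners` are hypothetical helpers, and the hand-written key-to-owner map stands in for the consistent hash with an assumed numOwners of 2. It shows why a leave handler has to treat all members that departed in one view change together: with C and D leaving at once, key k2 has no surviving owner, a situation that cannot arise when members leave one at a time and each rehash completes before the next.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical sketch (not Infinispan's LeaveTask): given two consecutive views,
 * compute ALL members that left and, for each key, which of its owners survived.
 */
final class ViewDiff {

    /** Members of oldView that are missing from newView; order follows oldView. */
    static Set<String> leavers(List<String> oldView, List<String> newView) {
        Set<String> left = new LinkedHashSet<>(oldView);
        left.removeAll(newView);
        return left;
    }

    /** For each key, the owners that are not among the leavers (may be empty). */
    static Map<String, List<String>> survivingOwners(Map<String, List<String>> ownersByKey,
                                                      Set<String> leavers) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : ownersByKey.entrySet()) {
            List<String> alive = new ArrayList<>(e.getValue());
            alive.removeAll(leavers);
            result.put(e.getKey(), alive);
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> v1 = List.of("A", "B", "C", "D", "E");
        List<String> v2 = List.of("A", "B");
        Set<String> left = leavers(v1, v2);
        System.out.println("left=" + left); // left=[C, D, E]

        // Hypothetical owner assignment, assuming numOwners = 2.
        Map<String, List<String>> owners = new LinkedHashMap<>();
        owners.put("k1", List.of("A", "C"));
        owners.put("k2", List.of("C", "D"));
        Map<String, List<String>> alive = survivingOwners(owners, left);
        // k1 can still be served by A; k2 lost both owners in a single view change.
        System.out.println("alive=" + alive); // alive={k1=[A], k2=[]}
    }
}
```

A handler that assumed a single leaver would see only one of C, D or E and could never detect that k2 has no remaining copy.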

Rapid node failure:
We need to revisit how LeaveTasks are queued up and executed or canceled during rapid node failures. Do we always cancel currently running leave tasks? At what stage are we allowed to cancel a leave task, and at what stage is it better to wait for the task to complete?
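The questions above can be framed as an explicit cancellation boundary. The sketch below is one possible policy, not the one the issue settled on: `LeaveTaskPolicy`, its `Phase` values and the `onLeave`/`advance` methods are hypothetical, and phase changes are driven by hand instead of by real asynchronous state transfer. A task that is still PLANNING (nothing sent yet) is canceled and replaced by a task covering the union of leavers; once it is PUSHING_STATE it runs to completion, and any further leavers are coalesced into a single deferred follow-up task.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/**
 * Hypothetical policy sketch for rapid node failures (not Infinispan code).
 * Cancel-and-merge while PLANNING; run to completion once PUSHING_STATE,
 * deferring later leavers to one follow-up task.
 */
final class LeaveTaskPolicy {
    enum Phase { PLANNING, PUSHING_STATE, DONE }

    static final class LeaveTask {
        final Set<String> leavers;
        Phase phase = Phase.PLANNING;
        boolean cancelled;

        LeaveTask(Set<String> leavers) {
            this.leavers = new LinkedHashSet<>(leavers);
        }
    }

    LeaveTask current; // task currently being executed, or null
    LeaveTask next;    // deferred follow-up task, or null

    /** Called on every view change with the members that just left. */
    void onLeave(Set<String> newLeavers) {
        if (current == null) {
            current = new LeaveTask(newLeavers);
        } else if (current.phase == Phase.PLANNING) {
            // Nothing has been pushed yet: cancel and restart with the union of leavers.
            current.cancelled = true;
            LeaveTask merged = new LeaveTask(current.leavers);
            merged.leavers.addAll(newLeavers);
            current = merged;
        } else if (next == null) {
            // State transfer already in flight: let it finish, handle these leavers afterwards.
            next = new LeaveTask(newLeavers);
        } else {
            // Coalesce further failures into the single pending follow-up task.
            next.leavers.addAll(newLeavers);
        }
    }

    /** Moves the current task to its next phase (stands in for real async progress). */
    void advance() {
        if (current == null) {
            return;
        }
        if (current.phase == Phase.PLANNING) {
            current.phase = Phase.PUSHING_STATE;
        } else {
            current.phase = Phase.DONE;
            current = next; // start the deferred task, if any
            next = null;
        }
    }

    public static void main(String[] args) {
        LeaveTaskPolicy policy = new LeaveTaskPolicy();
        policy.onLeave(Set.of("C"));
        policy.onLeave(Set.of("D")); // still planning: merged into one task
        System.out.println(policy.current.leavers + " " + policy.current.phase); // [C, D] PLANNING
        policy.advance();            // state transfer starts
        policy.onLeave(Set.of("E")); // too late to cancel: deferred
        System.out.println(policy.next.leavers);                                 // [E]
        policy.advance();            // first task done, deferred task becomes current
        System.out.println(policy.current.leavers + " " + policy.current.phase); // [E] PLANNING
    }
}
```

Where exactly the PLANNING/PUSHING_STATE boundary should sit is the open question raised above; the sketch only shows how making that boundary explicit turns "always cancel" versus "wait for completion" into a per-phase decision.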

is blocked by

ISPN-902 Data consistency across rehashing

Resolved

is related to

ISPN-902 Data consistency across rehashing

Resolved

ISPN-914 RehashTask should not block on invalidating migrated entries

Closed

Assignee: Manik Surtani (Inactive)

Reporter: Vladimir Blagojevic (Inactive)

Archiver: Amol Dongare

Created: 2010/06/10 11:59 AM

Updated: 2020/09/14 5:34 AM

Resolved: 2011/02/16 2:12 PM

Archived: 2024/11/28 6:21 AM
