  1. Infinispan
  2. ISPN-493

Harden rehash leave process



    • Type: Task
    • Resolution: Done
    • Priority: Critical
    • 4.2.1.CR2, 4.2.1.FINAL
    • 4.0.0.Final, 4.1.0.BETA2
    • None
    • None
    • High

      We need to make sure that the leave rehash process properly handles massive and rapid node failures.

      Massive failures:
      JGroups detects multiple node failures and pushes views up to Infinispan that are more "volatile" than we currently assume (we assume that only one member can leave at a time). For example, if we have view V1={A,B,C,D,E} and a massive failure causes {C,D,E} to fail, JGroups failure detection and GMS will install view V2={A,B} on the surviving members. LeaveTask does not handle this scenario.
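
      The V1 to V2 transition above can be sketched as a set difference between consecutive views, which yields every leaver at once instead of assuming a single one. This is only an illustration: `ViewDiff` and `leavers` are hypothetical names, and members are modelled as plain strings rather than JGroups addresses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper for illustration; not part of Infinispan or JGroups.
class ViewDiff {
    /** Returns every member present in oldView but absent from newView, in oldView order. */
    static List<String> leavers(List<String> oldView, List<String> newView) {
        List<String> left = new ArrayList<>(oldView);
        left.removeAll(newView); // drops all survivors, keeps all leavers
        return left;
    }

    public static void main(String[] args) {
        List<String> v1 = Arrays.asList("A", "B", "C", "D", "E");
        List<String> v2 = Arrays.asList("A", "B");
        System.out.println(leavers(v1, v2)); // prints [C, D, E]
    }
}
```

      A leave handler built on such a diff would receive the whole list {C,D,E} and could rehash once for all three members, rather than being started once per leaver.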

      Rapid node failure:
      We need to revisit how LeaveTasks are queued up and executed or canceled during rapid node failures. Do we always cancel currently running leave tasks? At what stage are we allowed to cancel a leave task, and at what stage is it better to wait for the task to complete?
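
      One possible answer to the cancellation question, sketched under assumptions: a task tracks its phase, and cancellation succeeds only before state transfer has begun; after that point the caller is expected to wait. The class name, the phase names, and the policy itself are all hypothetical and are not taken from Infinispan's LeaveTask.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical sketch; phase names and the cancellation policy are assumptions.
class CancellableLeaveTask {
    enum Phase { PENDING, COMPUTING, TRANSFERRING, DONE, CANCELLED }

    private final AtomicReference<Phase> phase = new AtomicReference<>(Phase.PENDING);

    /** Moves from expected to next; fails if the task is in any other phase (e.g. cancelled). */
    boolean advance(Phase expected, Phase next) {
        return phase.compareAndSet(expected, next);
    }

    /** Cancels only while PENDING or COMPUTING; returns false once transfer has started. */
    boolean tryCancel() {
        while (true) {
            Phase p = phase.get();
            if (p != Phase.PENDING && p != Phase.COMPUTING) {
                return false; // too late (or already finished/cancelled): caller should wait
            }
            if (phase.compareAndSet(p, Phase.CANCELLED)) {
                return true;
            }
            // Lost a race with advance(); re-read the phase and decide again.
        }
    }

    Phase phase() {
        return phase.get();
    }
}
```

      With this shape, a new view arriving during rapid failures would call `tryCancel()` on the running task and either start a fresh task immediately or queue it behind the one that can no longer be interrupted.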

              manik_jira Manik Surtani (Inactive)
              vblagoje Vladimir Blagojevic (Inactive)
              Archiver:
              rhn-support-adongare Amol Dongare
