  1. JGroups
  2. JGRP-598

Simplify FLUSH


    • Type: Task
    • Resolution: Done
    • Priority: Major
    • 2.6
    • 2.4, 2.5
    • None

      #1 Superfluous FLUSH-COMPLETED phase?
      -------------------------------------
      It seems we don't need the FLUSH-COMPLETED phase: it is sufficient for
      the FLUSH leader to send a START-FLUSH and wait for all FLUSH-OK
      messages. Why? Because the reconciliation protocol will reconcile
      messages before the view installation (or state transfer).

      Example:

      {A,B,C}

      • E sends message M and then crashes. Only C receives M.

      • A does a START-FLUSH
      • B and C send a FLUSH-OK back to A. This means that all multicasts
        from B will be seen by A before A receives B's FLUSH-OK, and all
        multicasts from C will be seen by A before A receives C's FLUSH-OK.
        HOWEVER, C will not necessarily send M to A before sending its
        FLUSH-OK message. That is done only through the reconciliation
        protocol!
      • FLUSH-COMPLETED can be removed; the digest that is part of
        FLUSH-COMPLETED has to be moved to the FLUSH-OK phase.
      • FLUSH-OK is unicast not multicast.
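
The simplified round described above can be sketched roughly as follows. This is only an illustrative model, not JGroups code: the class `FlushRound`, its methods, and the representation of a digest as a map from sender to highest sequence number seen are all assumptions made for the example. The leader sends START-FLUSH, then records each unicast FLUSH-OK together with the digest it carries, and the round is complete once every member has answered.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical model of the leader side of a flush round without FLUSH-COMPLETED.
// All names here are illustrative; none of them are part of the JGroups API.
class FlushRound {
    // Members that still owe the leader a FLUSH-OK.
    private final Set<String> pending;
    // Digest carried by each FLUSH-OK: sender -> highest seqno seen (assumed format).
    private final Map<String, Map<String, Long>> digests = new HashMap<>();

    FlushRound(Set<String> members) {
        this.pending = new HashSet<>(members);
    }

    // Called when a unicast FLUSH-OK reaches the leader.
    // Returns true once every member has answered.
    synchronized boolean onFlushOk(String from, Map<String, Long> digest) {
        if (pending.remove(from)) {
            digests.put(from, new HashMap<>(digest));
        }
        return pending.isEmpty();
    }

    // Per sender, the highest seqno reported by any member. A reconciliation
    // step could compare this against each member's own digest to find gaps.
    synchronized Map<String, Long> mergedDigest() {
        Map<String, Long> merged = new HashMap<>();
        for (Map<String, Long> d : digests.values()) {
            for (Map.Entry<String, Long> e : d.entrySet()) {
                merged.merge(e.getKey(), e.getValue(), Long::max);
            }
        }
        return merged;
    }
}
```

In this model, a member that holds a message the others have not seen shows up as a higher sequence number in the merged digest, which is the information reconciliation would need before the view is installed.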

      #2 Superfluous STOP-FLUSH-OK phase?
      -----------------------------------
      Why do we need the acks here? JGroups guarantees that the STOP-FLUSH
      message is delivered to all non-faulty members, so why wait for the
      STOP-FLUSH-OK?

      If we make STOP-FLUSH OOB (JGRP-337), then this should work! We can
      get rid of STOP-FLUSH-OK: just send STOP-FLUSH as an OOB multicast.
      Note: this requires the concurrent stack, so FLUSH should possibly check
      for it and throw an exception if the concurrent stack is not present
      when FLUSH is used. These changes therefore cannot be applied to
      2.4, only to 2.5 and 2.6.
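
As a rough illustration of the idea, the sketch below fails fast when the concurrent stack is missing and otherwise produces a single OOB multicast STOP-FLUSH with no acknowledgement round. `StackConfig`, `FlushMessage`, and `FlushStopper` are hypothetical stand-ins invented for this example and do not correspond to real JGroups classes.

```java
// Hypothetical stand-in for the stack configuration; not a JGroups class.
class StackConfig {
    final boolean concurrentStack;

    StackConfig(boolean concurrentStack) {
        this.concurrentStack = concurrentStack;
    }
}

// Hypothetical stand-in for a protocol message; not a JGroups class.
class FlushMessage {
    final String type;
    final boolean oob;        // out-of-band: not held back by regular message ordering
    final boolean multicast;  // sent to the whole group rather than one member

    FlushMessage(String type, boolean oob, boolean multicast) {
        this.type = type;
        this.oob = oob;
        this.multicast = multicast;
    }
}

class FlushStopper {
    FlushStopper(StackConfig config) {
        // Reject the configuration up front instead of failing later at runtime.
        if (!config.concurrentStack) {
            throw new IllegalStateException("FLUSH requires the concurrent stack");
        }
    }

    // A single fire-and-forget message; there is no STOP-FLUSH-OK to wait for.
    FlushMessage stopFlush() {
        return new FlushMessage("STOP-FLUSH", true, true);
    }
}
```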

      #3 Timeouts
      -----------

      • ClientGmsImpl has a hard-coded timeout of 4 seconds
      • CoordGmsImpl also has a timeout for block()
      • If we need timeouts, we should get them from the FLUSH protocol!

      CoordGmsImpl.startFlush() should not have a timeout; this should be
      in FLUSH itself. startFlush() should block forever, until it gets the ack.
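
One way to picture this is a small gate owned by the flush layer: callers invoke `startFlush()` without passing any timeout, and whether the call waits forever or gives up after a while is decided by a single setting held in one place. The sketch below is an assumption-laden model, not the actual FLUSH implementation; the class name `FlushGate`, the `timeoutMs` field, and the convention that a value of 0 or less means "wait forever" are all invented for the example.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical model: the flush layer owns the only timeout, callers own none.
class FlushGate {
    // Assumed convention for this sketch: 0 or less means block until acked.
    private final long timeoutMs;
    private final CountDownLatch acked = new CountDownLatch(1);

    FlushGate(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Called by the flush layer once the flush has been acknowledged.
    void onFlushAck() {
        acked.countDown();
    }

    // Called by membership code; note that it takes no timeout argument.
    // Returns true if the ack arrived, false on timeout or interruption.
    boolean startFlush() {
        try {
            if (timeoutMs <= 0) {
                acked.await();
                return true;
            }
            return acked.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to the caller
            return false;
        }
    }
}
```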

      #4 Spurious FLUSH-OK in the diagram
      -----------------------------------
      In the JOIN diagram, and probably also in the state transfer diagram.

            vblagoje Vladimir Blagojevic (Inactive)