JGRP-2684: A stack without UNICAST3 or NAKACK2 loses messages on concurrent connection establishment


    • Type: Task
    • Resolution: Done
    • Priority: Major
    • 5.2.15

      In a stack without UNICAST3 or NAKACK2, TCP must not lose any messages; otherwise there will be gaps in the delivered message sequence (see [1]).

      However, messages can get lost during concurrent connection establishment, e.g. when A connects to B while B connects to A at the same time.

      This is not an issue when UNICAST3 or NAKACK2 is present, because they retransmit missing messages. In their absence, the following can happen (see [2] for details):

      • A sends messages A1-A5 to B
      • At the same time, B sends messages B1-B5 to A
      • A establishes a connection to B and sends messages A1 and A2 to B
      • B establishes a connection to A
      • B accepts A's connection and compares the addresses
        • B's address is higher and wins: A closes its connection to B and replaces it with the accepted connection
      • A sends messages A3-A5 to B
        --> Messages A1 and A2 are lost, as there is no retransmission!
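
      The sequence above can be sketched as a small deterministic model. This is not JGroups code: BufferedConn and LostMessagesDemo are hypothetical names, and the model assumes that A1 and A2 are still sitting in the old connection's outgoing buffer when it is closed (which is what the flush in the proposed fix suggests).

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical model of a connection that buffers outgoing messages until flushed.
final class BufferedConn {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> peerInbox;
    private boolean closed;

    BufferedConn(List<String> peerInbox) {
        this.peerInbox = peerInbox;
    }

    void send(String msg) {
        if (closed) throw new IllegalStateException("connection closed");
        pending.add(msg);
    }

    // Delivers everything that is still buffered to the peer.
    void flush() {
        while (!pending.isEmpty()) peerInbox.add(pending.poll());
    }

    // Closing without a prior flush silently drops buffered messages.
    void close() {
        closed = true;
        pending.clear();
    }
}

final class LostMessagesDemo {
    // Replays the race from A's point of view and returns what B received.
    static List<String> run() {
        List<String> bInbox = new ArrayList<>();

        BufferedConn aToB = new BufferedConn(bInbox);
        aToB.send("A1");
        aToB.send("A2");

        // B's address wins: A closes its own connection and switches
        // to the connection accepted from B, without flushing first.
        aToB.close();
        BufferedConn accepted = new BufferedConn(bInbox);

        accepted.send("A3");
        accepted.send("A4");
        accepted.send("A5");
        accepted.flush();
        return bInbox;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [A3, A4, A5]
    }
}
```

      In this model B never sees A1 and A2, and without UNICAST3 or NAKACK2 nothing above TCP notices the gap.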

      A possible fix:

      • When A receives B's connection, it needs to:
        • Replace the existing connection to B with B's accepted connection (TcpConnection)
        • As the lock on 'this' is held, no other thread can use the new connection yet
        • Flush the old connection -> this sends A1 and A2 to B on the old connection
        • Once the thread returns (releasing the lock on 'this'), messages A3-A5 can be sent to B on the new connection
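
      The steps of the possible fix can be sketched as follows. This is a minimal illustration under assumptions, not the actual change made in JGroups: FlushableConn, ConnectionSlot and replaceWith() are hypothetical names, and a single monitor on the slot stands in for the lock on 'this' mentioned above.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;

// Hypothetical stand-in for a TCP connection with an outgoing buffer
// (not the JGroups TcpConnection API).
final class FlushableConn {
    private final Queue<String> pending = new ArrayDeque<>();
    private final List<String> peerInbox;

    FlushableConn(List<String> peerInbox) {
        this.peerInbox = peerInbox;
    }

    synchronized void send(String msg) {
        pending.add(msg);
    }

    // Delivers everything that is still buffered to the peer.
    synchronized void flush() {
        while (!pending.isEmpty()) peerInbox.add(pending.poll());
    }

    synchronized void close() {
        pending.clear();
    }
}

// Holds the single connection to one peer; all access takes the lock on 'this'.
final class ConnectionSlot {
    private FlushableConn current;

    ConnectionSlot(FlushableConn initial) {
        this.current = initial;
    }

    synchronized void send(String msg) {
        current.send(msg);
    }

    synchronized void flush() {
        current.flush();
    }

    // Swap in the accepted connection, then drain the old one before
    // the lock is released and senders can reach the new connection.
    synchronized void replaceWith(FlushableConn accepted) {
        FlushableConn old = current;
        current = accepted; // senders are blocked on 'this', so it is unused so far
        old.flush();        // A1 and A2 leave on the old connection
        old.close();
    }
}

final class ReplaceThenFlushDemo {
    // Replays the scenario with the fix applied and returns what B received.
    static List<String> run() {
        List<String> bInbox = new ArrayList<>();
        ConnectionSlot slot = new ConnectionSlot(new FlushableConn(bInbox));

        slot.send("A1");
        slot.send("A2");
        slot.replaceWith(new FlushableConn(bInbox)); // B's connection wins
        slot.send("A3");
        slot.send("A4");
        slot.send("A5");
        slot.flush();
        return bInbox;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [A1, A2, A3, A4, A5]
    }
}
```

      Because send() and replaceWith() share one monitor, no thread can write to the new connection until the old one has been drained, so A1 and A2 go out before A3-A5 are sent.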

      [1] https://issues.redhat.com/browse/JGRP-2566
      [2] https://issues.redhat.com/browse/JGRP-549

              rhn-engineering-bban Bela Ban