  1. JGroups
  2. JGRP-1644

NAKACK2 violates FIFO property


    • Type: Bug
    • Resolution: Duplicate
    • Priority: Major
    • 3.4
    • 3.3.1
      1. Use NAKACK2
      2. Send numbered multicast messages (with dest == null)
      3. Follow the message numbers
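One way to number the messages for steps 2 and 3 is to prefix each payload with a counter. The sketch below is only an illustration, not the reporter's actual code: the `NumberedPayload` class, its method names, and the 8-byte big-endian layout are all assumptions. It uses only the JDK, so the resulting byte array could be handed to any transport.

```java
import java.nio.ByteBuffer;

/** Hypothetical helper (not part of JGroups): prefixes a payload with an 8-byte counter. */
final class NumberedPayload {
    private NumberedPayload() {}

    /** Returns the counter as 8 big-endian bytes, followed by the body bytes. */
    static byte[] encode(long seqNo, byte[] body) {
        return ByteBuffer.allocate(Long.BYTES + body.length)
                .putLong(seqNo)
                .put(body)
                .array();
    }

    /** Reads the counter back from the first 8 bytes of a received payload. */
    static long decodeSeqNo(byte[] payload) {
        return ByteBuffer.wrap(payload).getLong();
    }
}
```

The sender would increment the counter once per message from its single sending thread; the receiver calls `decodeSeqNo` on each payload in arrival order.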

      The documentation states that:

      NAKACK provides reliable delivery and FIFO (= First In First Out) properties for messages sent to all nodes in a cluster.

      and

      NAKACK2 was introduced in 3.1 and is a successor to NAKACK (at some point it will replace NAKACK). It has the same properties as NAKACK, but its implementation is faster and uses less memory, plus it creates fewer tasks in the timer.

      I have observed that sometimes multicast messages are received out of order.
      We use the following protocol stack configuration:

      <config xmlns="urn:org:jgroups"
              xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
              xsi:schemaLocation="urn:org:jgroups http://www.jgroups.org/schema/JGroups-3.3.xsd">
          <UDP bind_addr="match-interface:$interface"
               bind_interface="$interface"
               bind_port="$unicastPort"
      
               ip_ttl="128"
      
               mcast_addr="$multicastGroup"
               mcast_port="$multicastPort"
      
               singleton_name="udp-transport"/>
      
          <PING return_entire_cache="true"
                break_on_coord_rsp="false"/>
      
          <MERGE3/>
      
          <FD_SOCK/>
      
          <FD_ALL/>
      
          <VERIFY_SUSPECT/>
      
          <BARRIER/>
      
          <pbcast.NAKACK print_stability_history_on_failed_xmit="true"/>
      
          <pbcast.STABLE/>
      
          <pbcast.GMS/>
      
          <MFC max_credits="8M"/>
      
          <FRAG2/>
      
          <RSVP/>
      </config>
      

      As you can see, we mostly use the defaults.
      The messages are being sent from a single thread using the following code:

      channel.send(new Message(null, msg))
      

      Each message is between 300 KB and 4 MB in size. The message rate is 1-5 messages per second.
      We have a sequential counter inside each message being sent. Sometimes the messages are received out of order, for instance:

      #1198
      #1199
      #1200
      #1202
      #1201
      #1203
      #1204
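
A receiver-side check for this kind of reordering could look like the sketch below. It is a minimal illustration under stated assumptions, not the reporter's code: the `FifoChecker` class is hypothetical, it assumes a single sender whose counter increases by exactly one per message, and it treats the first counter it sees as the baseline so that a receiver can join mid-stream.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical checker: records every counter that is not the expected successor. */
final class FifoChecker {
    private boolean first = true;
    private long highestSeen;
    private final List<String> violations = new ArrayList<>();

    /** Feed counters in arrival order; returns false if this one breaks FIFO order. */
    boolean accept(long seqNo) {
        if (first) {
            // The first message only establishes the baseline.
            first = false;
            highestSeen = seqNo;
            return true;
        }
        long expected = highestSeen + 1;
        boolean inOrder = seqNo == expected;
        if (!inOrder) {
            violations.add("expected #" + expected + " but received #" + seqNo);
        }
        // Track the highest counter so a late message does not move the baseline back.
        highestSeen = Math.max(highestSeen, seqNo);
        return inOrder;
    }

    /** Read-only view of the violations recorded so far. */
    List<String> violations() {
        return Collections.unmodifiableList(violations);
    }
}
```

Fed the sequence listed above, this checker records two violations: one when #1202 arrives where #1201 was expected, and one when #1201 arrives late.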
      

      If we replace NAKACK2 with NAKACK, the problem disappears: everything works as expected (FIFO).
      If we replace the JGroups-based transport with a ZeroMQ-based transport (which runs over EPGM and has been in use for a year), everything also works as expected (FIFO). I mention this only to show that there are no bugs in our message numbering logic.

            rhn-engineering-bban Bela Ban
            incubos_jira Vadim Tsesko (Inactive)