JGroups / JGRP-2463

TransferQueueBundler: Message to stopped node blocks the bundler thread


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 4.2.2, 5.0.0.Alpha4
    • Affects Version/s: 4.2.1
    • None

      TransferQueueBundler sends all messages from a single thread. When one TP.doSend() call blocks, the bundler thread makes no progress and stops sending to every destination, even if TP.doSend() is slow for only one of them.
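
      The effect can be reproduced outside JGroups with a small sketch. This is not the TransferQueueBundler code: the class, the drain() method and the sleep-based doSend() stand-in are all hypothetical, and only the single-consumer-thread structure is taken from the description above. A message for a fast destination queued behind one for a slow destination is not sent until the slow send returns.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SingleThreadBundlerSketch {

    /** Hypothetical stand-in for TP.doSend(): blocks only for the slow destination. */
    static void doSend(String dest, String slowDest, long blockMillis) throws InterruptedException {
        if (dest.equals(slowDest))
            Thread.sleep(blockMillis);
    }

    /**
     * Sends all queued destinations from one thread, in queue order.
     * Returns, per message, the milliseconds elapsed from start until its send completed.
     */
    public static List<Long> drain(List<String> dests, String slowDest, long blockMillis) {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>(dests);
        List<Long> doneAt = new ArrayList<>();
        long start = System.nanoTime();

        Thread bundler = new Thread(() -> {
            String dest;
            while ((dest = queue.poll()) != null) {
                try {
                    doSend(dest, slowDest, blockMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                doneAt.add((System.nanoTime() - start) / 1_000_000);
            }
        }, "bundler-sketch");

        bundler.start();
        try {
            bundler.join(); // join() makes doneAt safely visible to the caller
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return doneAt;
    }

    public static void main(String[] args) {
        // "slow" is queued first, so "fast" waits behind it on the same thread.
        List<Long> times = drain(List.of("slow", "fast"), "slow", 200);
        System.out.println("slow sent after " + times.get(0) + " ms");
        System.out.println("fast sent after " + times.get(1) + " ms");
    }
}
```

      In this sketch the delay seen by the fast destination is entirely due to queue order, not to anything about the fast destination itself.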

      One example is sending a message to a stopped node, e.g. the coordinator sending a LEAVE_RSP after the leaver has already stopped. The bundler thread calls TP.doSend(), the connection no longer exists, so it ends up calling BaseServer.createConnection(). If the stopped node's machine is no longer up, or it is configured to drop messages to closed ports, opening the connection blocks the bundler thread for TCP.sock_conn_timeout (default: 2s).
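
      For illustration, here is how a connect with a timeout behaves with plain java.net sockets. This is a sketch, not the BaseServer.createConnection() implementation; the class and method names are made up, and the 2000 ms value simply mirrors the default quoted above. On loopback, a closed port is normally refused right away, so the call returns quickly. The slow case described above (host down, or packets silently dropped) cannot be reproduced on loopback: there the connect() call would block the calling thread until the timeout expires.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class ConnectTimeoutSketch {

    /** Tries to connect; returns true on success, false if refused or timed out. */
    public static boolean tryConnect(String host, int port, int timeoutMillis) {
        try (Socket sock = new Socket()) {
            // Blocks the calling thread for at most timeoutMillis.
            sock.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) { // covers ConnectException and SocketTimeoutException
            return false;
        }
    }

    /** Returns { connected while a server was listening, connected after it was closed }. */
    public static boolean[] demo() {
        int port;
        boolean whileListening;
        try (ServerSocket server = new ServerSocket(0)) { // 0 = any free port
            port = server.getLocalPort();
            whileListening = tryConnect("127.0.0.1", port, 2000);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        // The server is closed now; on loopback the attempt is normally refused at once.
        boolean afterClose = tryConnect("127.0.0.1", port, 2000);
        return new boolean[] { whileListening, afterClose };
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("while listening: " + r[0]);
        System.out.println("after close:     " + r[1]);
    }
}
```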

      UNICAST3 also retransmits the highest sent message every UNICAST3.xmit_interval (default: 500ms), for UNICAST3.max_retransmit_time (default: 1 min), so the bundler thread will block more than once for the same message.
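
      A rough back-of-the-envelope estimate with the defaults quoted above, written as a small sketch. It assumes that every retransmission reaches the bundler thread and waits for the full connect timeout; whether JGroups drops or coalesces some of them is not covered here, so treat the result as an upper bound under that assumption. The class and method names are hypothetical.

```java
public class RetransmitBlockingEstimate {

    /** Number of retransmissions that fit into the retransmit window. */
    public static long retransmissions(long maxRetransmitMillis, long xmitIntervalMillis) {
        return maxRetransmitMillis / xmitIntervalMillis;
    }

    /** Total blocking time if each retransmission waits for the full connect timeout. */
    public static long worstCaseBlockMillis(long retransmissions, long connTimeoutMillis) {
        return retransmissions * connTimeoutMillis;
    }

    public static void main(String[] args) {
        long n = retransmissions(60_000, 500);         // 1 min / 500 ms
        long blocked = worstCaseBlockMillis(n, 2_000); // 2 s per connect attempt
        // prints "120 retransmissions, up to 240 s blocked"
        System.out.println(n + " retransmissions, up to " + blocked / 1000 + " s blocked");
    }
}
```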

      I assume the bundler thread will also block if the transport is TCP, one of the destinations is overloaded, and the TCP connection's send buffer is full. Normally applications try to spread the workload evenly among members, but e.g. with RELAY2 not all the members will be site masters.

      Assignee: Bela Ban
      Reporter: Dan Berindei (Inactive)