JGroups / JGRP-2504

Poor throughput over high latency TCP connection when recv_buf_size is configured


Details

    • Type: Bug
    • Resolution: Done
    • Priority: Minor
    • Fix Version/s: 5.1
    • Affects Version/s: 5.0.0.Final
    • Component/s: None

      I included a test program based on the SimpleChat JGroups example. (I am not a Java developer, so please excuse any idiosyncrasies in the code.)

      • Create two physically distant Linux servers. I used two newly built CentOS 8 Linodes, one in Fremont, CA, and the other in Newark, NJ. Ping time between the servers is ~65 milliseconds.
      • Configure net.core.rmem_max and net.core.wmem_max to something large such as 32MB:
        sudo sysctl -w net.core.rmem_max=33554432 net.core.wmem_max=33554432
      • Copy the following files to both servers:
        • SpeedTest.class
        • jgroups-5.0.0.Final.jar
        • tcp-5.0.0.xml (a copy of tcp.xml in the jgroups-5.0.0.Final.jar)
      • Configure send_buf_size and recv_buf_size in tcp-5.0.0.xml:
         <TCP...
            send_buf_size="33554432"
            recv_buf_size="33554432"/>
      • Run SpeedTest on both machines and wait for them to connect:
        java -Djgroups.tcpping.initial_hosts=jgroups-west[7800],jgroups-east[7800] -cp jgroups-5.0.0.Final.jar:. SpeedTest
      • On either machine, enter the command "send" or "recv" to have that machine send (or receive) 16MB and output the estimated throughput in bytes/sec. One direction will be significantly slower than the other; the slow direction corresponds to data sent from the client (connect side) to the server (listen side).
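
      As a sanity check on the 32 MB buffer size chosen above: the buffer must cover at least the bandwidth-delay product of the path for TCP to keep the pipe full. A quick sketch of the arithmetic, assuming (hypothetically) a 1 Gbit/s path at the measured ~65 ms round-trip time:

```java
import java.util.Locale;

public class Bdp {
    public static void main(String[] args) {
        double rttSec = 0.065;        // ~65 ms ping time, from the steps above
        double linkBitsPerSec = 1e9;  // hypothetical 1 Gbit/s path capacity

        // Bandwidth-delay product: bytes that can be "in flight" at once.
        double bdpBytes = linkBitsPerSec / 8 * rttSec;
        System.out.printf(Locale.ROOT, "BDP ~= %.1f MB%n", bdpBytes / (1024 * 1024));
    }
}
```

      At ~8 MB of in-flight data, the 32 MB buffers leave comfortable headroom; the kernel also typically reserves part of SO_RCVBUF for bookkeeping rather than payload.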

      Example:

      [jgroups@jgroups-west ~]$ java -Djgroups.tcpping.initial_hosts=jgroups-west[7800],jgroups-east[7800] -cp jgroups-5.0.0.Final.jar:. SpeedTest
      Sep 29, 2020 6:16:59 PM org.jgroups.JChannel setAddress
      INFO: local_addr: d1845247-6de6-d80a-14cd-78524a0925fe, name: jgroups-west-24449
      
      -------------------------------------------------------------------
      GMS: address=jgroups-west-24449, cluster=SpeedTestCluster, physical address=45.79.68.10:7800
      -------------------------------------------------------------------
      ** view: [jgroups-east-34095|1] (2) [jgroups-east-34095, jgroups-west-24449]
      
      === NOTE - This instance is currently the connect() side ===
      
      > send
      Sending...
      > Sent 16777216 bytes at 2699498 bytes/sec
      recv
      Receiving...
      > Received 16777216 bytes at 15127927 bytes/sec
      
      === NOTE - Stopped and restarted the remote side ===
      
      > ** view: [jgroups-west-24449|2] (1) [jgroups-west-24449]
      ** view: [jgroups-west-24449|3] (2) [jgroups-west-24449, jgroups-east-47558]
      
      === NOTE - This instance is now the listen() side ===
      
      > send
      Sending...
      > Sent 16777216 bytes at 14863557 bytes/sec
      recv
      Receiving...
      > Received 16777216 bytes at 2626508 bytes/sec
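
      For a window-limited TCP flow, throughput is roughly window / RTT, so the slow-direction figure in the transcript above implies the effective receive window. A back-of-the-envelope check, using only numbers taken from the output:

```java
import java.util.Locale;

public class ImpliedWindow {
    public static void main(String[] args) {
        // Numbers from the transcript: the slow direction moved
        // ~2.6 MB/s over a ~65 ms round-trip time.
        double throughputBytesPerSec = 2_626_508;
        double rttSec = 0.065;

        // Window-limited flow: throughput ~= window / RTT, so
        // the effective receive window is roughly throughput * RTT.
        double windowBytes = throughputBytesPerSec * rttSec;
        System.out.printf(Locale.ROOT, "implied window ~= %.0f KB%n", windowBytes / 1024);
    }
}
```

      An effective window in the low hundreds of kilobytes, far below the 33554432 bytes requested, is consistent with the configured recv_buf_size never taking effect on the accepted socket.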

    Description

      I recently finished troubleshooting a unidirectional throughput bottleneck involving a JGroups application (Infinispan) communicating over a high-latency (~45 milliseconds) TCP connection.

      The root cause was JGroups improperly configuring the receive/send buffers on the listening socket. According to the tcp(7) man page:

      On individual connections, the socket buffer size must be set prior to
      the listen(2) or connect(2) calls in order to have it take effect.
      

      However, JGroups does not set the buffer size on the listening side until after accept(). By then the connection's window scale has already been negotiated during the handshake, so the larger buffer cannot fully take effect.

      The result is poor throughput when sending data from the client (connecting side) to the server (listening side). Because the underlying problem is a too-small TCP receive window, throughput is ultimately latency-bound: roughly window size divided by round-trip time.
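
      Per the tcp(7) rule quoted above, the buffer must be sized on the listening socket before it is bound, so that accepted connections inherit it. A minimal sketch of the correct ordering in plain java.net (not the actual JGroups code):

```java
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BufBeforeBind {
    public static void main(String[] args) throws Exception {
        // Create the server socket UNBOUND, set the receive buffer first,
        // and only then bind/listen. The kernel can now negotiate a window
        // scale based on the requested size during the handshake of every
        // connection later returned by accept().
        ServerSocket srv = new ServerSocket();
        srv.setReceiveBufferSize(32 * 1024 * 1024); // clamped to net.core.rmem_max
        srv.bind(new InetSocketAddress(0));         // ephemeral port for the sketch

        System.out.println(srv.isBound() && srv.getReceiveBufferSize() > 0);
        srv.close();
    }
}
```

      Sockets returned by accept() inherit the listening socket's buffer size; calling setReceiveBufferSize() only on the accepted socket, as the affected JGroups versions do, arrives too late to influence window-scale negotiation.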

      Attachments

        1. bla5.java
          1 kB
        2. bla6.java
          2 kB
        3. bla7.java
          2 kB
        4. delay-ip.sh
          0.6 kB
        5. rcvbuf.png
          72 kB
        6. SpeedTest.java
          4 kB

        Activity

          People

            rhn-engineering-bban Bela Ban
            g-41394b97-fafa-4748-a281-6f88e12c80fa Andrew Skalski (Inactive)
            Votes: 0
            Watchers: 2

            Dates

              Created:
              Updated:
              Resolved: