JGroups / JGRP-1497

FD_SOCK server socket is never closed after network interruption


    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Fix Version/s: 3.0.12, 3.2
    • Affects Version/s: 2.6.21
    • None

      Start two clustered instances.
      Interrupt the communication between them (pull the network plug, or simulate it using a firewall) for long enough to split the cluster apart.
      After TCP keepalive kicks in, the existing FD_SOCK TCP sockets between the instances should be closed on both sides.
      In EAP 5.1 and later, the default FD_SOCK ports are in the ranges 5420x, 5320x, 5790x.

      Before the patch, the server side of the FD_SOCK TCP connections is never closed until the JGroups channel is shut down, and new connections are created every time the cluster splits and rejoins.

      The default TCP keepalive time is a little over 2 hours on most operating systems. Reducing it in an OS-dependent way can speed up this test.
      On Linux, it's controlled by
      /proc/sys/net/ipv4/tcp_keepalive_time
      /proc/sys/net/ipv4/tcp_keepalive_intvl
      /proc/sys/net/ipv4/tcp_keepalive_probes
      The socket should be closed (keepalive_time + keepalive_intvl * keepalive_probes) seconds after its last activity, if the connection is no longer valid. For a socket that carries no traffic, that is measured from the point the socket is created.
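As a rough worked example of the formula above, the following shell sketch computes the expected close time in seconds. The function name `keepalive_deadline` is hypothetical, and the values 7200, 75 and 9 are assumed to be the usual Linux defaults; the /proc paths are the ones listed above.

```shell
#!/bin/sh
# Hypothetical helper: seconds until an idle, dead connection is torn down.
# Arguments: keepalive time (s), probe interval (s), probe count.
keepalive_deadline() {
    echo $(( $1 + $2 * $3 ))
}

# Usual Linux defaults (assumed here): 7200 s idle, 75 s interval, 9 probes.
keepalive_deadline 7200 75 9

# Current values on this machine, if the files are readable.
p=/proc/sys/net/ipv4
if [ -r "$p/tcp_keepalive_time" ]; then
    keepalive_deadline "$(cat "$p/tcp_keepalive_time")" \
                       "$(cat "$p/tcp_keepalive_intvl")" \
                       "$(cat "$p/tcp_keepalive_probes")"
fi
```

With the assumed defaults this gives 7875 seconds, which matches "a little over 2 hours". Writing smaller values to the three files as root (for example 60, 10 and 3, giving 90 seconds) shortens the wait accordingly.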

      On Linux with instances running on 127.0.0.1 and 127.0.0.2 with -u 230.1.2.3, the network split can be simulated with:
      iptables -I INPUT -s 127.0.0.1 -d 127.0.0.2 -j DROP
      iptables -I INPUT -s 127.0.0.2 -d 127.0.0.1 -j DROP
      iptables -I INPUT -d 230.1.2.3 -j DROP

      And restored with (assuming no other rules were inserted at the top of the INPUT chain in the meantime):
      iptables -D INPUT 1
      iptables -D INPUT 1
      iptables -D INPUT 1
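To check whether the server-side sockets are still open after the split, one option is to filter the socket list by local port. This is only a sketch: the function name `fd_sock_lines` is hypothetical, the port pattern is taken from the default ranges mentioned above (5420x, 5320x, 5790x), and it assumes the `ss` tool is installed and that the local address is the fourth column of `ss -tan` output.

```shell
#!/bin/sh
# Hypothetical filter: keep only lines of `ss -tan` output whose local port
# falls in the default FD_SOCK ranges (5420x, 5320x, 5790x).
fd_sock_lines() {
    awk '{ n = split($4, a, ":"); if (a[n] ~ /^(5420|5320|5790)[0-9]$/) print }'
}

# Count established connections still held on those ports, if ss is available.
if command -v ss >/dev/null 2>&1; then
    ss -tan | fd_sock_lines | grep -c ESTAB || true
fi
```

If the count keeps growing across repeated split and rejoin cycles, the server side is leaking connections as described in this issue.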

      The server side of the FD_SOCK socket is never closed after a network interruption during which the client side is closed.

      JGRP-195 added TCP keepalive to FD_SOCK, but only to the client side.

      Assignee: Dennis Reed (rhn-support-dereed)
      Reporter: Dennis Reed (rhn-support-dereed)