  1. JGroups
  2. JGRP-1116

TCPPING used with port_range can cause random OutOfMemoryError


    • Type: Bug
    • Resolution: Won't Do
    • Priority: Major
    • 2.4.8
    • 2.4.1 SP4
    • None
    • Workaround Exists

      Do not use port_range with TCPPING.

      Configure the TCP ports outside of the operating system's ephemeral range,
      and ensure TCPPING is not pinging any ports in the ephemeral range.

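One way to check the second part of the workaround is to compare the ports TCPPING will ping against the operating system's ephemeral range. The following is a minimal sketch, not part of JGroups. The class and method names are hypothetical, the pinged ports are assumed to be supplied by the caller as an inclusive interval, and the proc file read in main exists only on Linux.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not a JGroups class.
class EphemeralRangeCheck {
    /** Parses "low high" separated by whitespace, the format of the Linux proc file. */
    static int[] parseRange(String text) {
        String[] parts = text.trim().split("\\s+");
        if (parts.length != 2)
            throw new IllegalArgumentException("expected two numbers, got: " + text);
        return new int[] { Integer.parseInt(parts[0]), Integer.parseInt(parts[1]) };
    }

    /** True if the inclusive interval [firstPort, lastPort] shares any port with [low, high]. */
    static boolean overlaps(int firstPort, int lastPort, int low, int high) {
        return firstPort <= high && lastPort >= low;
    }

    public static void main(String[] args) throws IOException {
        int firstPort = Integer.parseInt(args[0]); // lowest port TCPPING will ping
        int lastPort = Integer.parseInt(args[1]);  // highest port TCPPING will ping
        // Linux only: the kernel publishes the ephemeral range in this file.
        byte[] raw = Files.readAllBytes(Paths.get("/proc/sys/net/ipv4/ip_local_port_range"));
        int[] range = parseRange(new String(raw, StandardCharsets.US_ASCII));
        boolean clash = overlaps(firstPort, lastPort, range[0], range[1]);
        System.out.println("ephemeral range " + range[0] + "-" + range[1]
                + (clash ? " overlaps" : " does not overlap")
                + " pinged ports " + firstPort + "-" + lastPort);
    }
}
```

On other operating systems the ephemeral range has to be obtained differently and passed to overlaps directly.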

      When TCPPING is used with port_range, i.e. there is more than one possible port to bind to, the JVM may run into random OutOfMemoryErrors.

      The issue is with the local port bind in ConnectionTable.java in JGroups-2.4.1-sp4.src/src/org/jgroups/blocks. Ping requests are sent to all ports in the port range on all boxes in the cluster, including the local box. When these requests try to connect to unused ports in the range on the local box, a local bind is done in the getConnection method before connect is called. Because the bind asks for port 0, the operating system chooses the local port, and it may choose the very port in the port range that the connection is being established to.

      This in turn allows the connect to go through even though there is no accept thread waiting on it: the socket ends up connected to itself.

         Connection getConnection(Address dest) throws Exception {
             Connection conn=null;
             Socket sock;
      
             synchronized(conns) {
                 conn=(Connection)conns.get(dest);
                 if(conn == null) {
                     // changed by bela Jan 18 2004: use the bind address for the client sockets as well
                     SocketAddress tmpBindAddr=new InetSocketAddress(bind_addr, 0);
                     InetAddress tmpDest=((IpAddress)dest).getIpAddress();
                     SocketAddress destAddr=new InetSocketAddress(tmpDest, ((IpAddress)dest).getPort());
                     sock=new Socket();
                     sock.bind(tmpBindAddr);
                     sock.setKeepAlive(true);
                     sock.setTcpNoDelay(tcp_nodelay);
                     if(linger > 0)
                         sock.setSoLinger(true, linger);
                     else
                         sock.setSoLinger(false, -1);
                     sock.connect(destAddr, sock_conn_timeout);
      

      This results in a connection where the local address is sent to the other side, but there is no accept thread to read it out.

                     conn=new Connection(sock, dest);
                     conn.sendLocalAddress(local_addr);
                     notifyConnectionOpened(dest);
                     // conns.put(dest, conn);
                     addConnection(dest, conn);
                     conn.init();
                     if(log.isInfoEnabled()) log.info("created socket to " + dest);
                 }
                 return conn;
             }
      

      Because nothing reads this value out before the reader thread is started, the reader thread eventually reads it in as the length to allocate for the next packet, in BasicConnectionTable.java in JGroups-2.4.1-sp4.src/src/org/jgroups/blocks.

            len=in.readInt();
            if(len > buf.length)
                buf=new byte[len];             
            in.readFully(buf, 0, len);
            updateLastAccessed();
            receive(peer_addr, buf, 0, len); // calls receiver.receive(msg)
      

      In our case this allocated about 1.6 GB of memory. Depending on how much memory was in use at that time, the OutOfMemoryError would sometimes surface in other parts of the program instead.
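The unbounded allocation can be illustrated, and guarded against, by checking the length prefix before allocating. The following is a minimal sketch and not the JGroups code. The class name, the readFrame helper, and the maxLen limit are hypothetical, and the stray bytes in main are made up to show how four arbitrary bytes turn into a huge length.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

// Hypothetical helper, not a JGroups class.
class BoundedFrameReader {
    /** Reads a 4-byte big-endian length followed by that many bytes, rejecting implausible lengths. */
    static byte[] readFrame(DataInputStream in, int maxLen) throws IOException {
        int len = in.readInt();
        if (len < 0 || len > maxLen)
            throw new IOException("implausible frame length " + len + " (max " + maxLen + ")");
        byte[] buf = new byte[len]; // allocated only after the length has been validated
        in.readFully(buf, 0, len);
        return buf;
    }

    public static void main(String[] args) {
        // Made-up stray bytes: 0x5F5E1000 read as an int is 1,600,000,000.
        byte[] stray = { 0x5F, 0x5E, 0x10, 0x00, 1, 2, 3 };
        try {
            readFrame(new DataInputStream(new ByteArrayInputStream(stray)), 1 << 20);
        } catch (IOException e) {
            // prints "rejected: implausible frame length 1600000000 (max 1048576)"
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A limit like this only bounds the damage from a misread length. It does not prevent the self-connection that puts the stray bytes on the stream in the first place.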

      A test program that reproduces the port collision is attached. Sample invocation below.

      bash-2.05$ java Test vlinux101
      Connected : 6789 to 6789 on try 46267
      bash-2.05$
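The attached test program is not reproduced here. The following is a minimal sketch of the same idea, assuming the same bind-then-connect pattern as getConnection. The class name, method names, timeout, and retry limit are hypothetical. The host argument is assumed to resolve to a local address, and the port is assumed to be unused and inside the operating system's ephemeral range. Whether and how quickly a collision occurs depends on the operating system.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical probe, not the attached Test program.
class SelfConnectProbe {
    /** A socket is connected to itself when both endpoints share the same address and port. */
    static boolean isSelfConnected(Socket sock) {
        return sock.isConnected()
            && sock.getLocalPort() == sock.getPort()
            && sock.getLocalAddress().equals(sock.getInetAddress());
    }

    /** One attempt: bind to an OS-chosen local port (port 0), then connect to host:port. */
    static boolean attempt(InetAddress host, int port, int timeoutMs) {
        try (Socket sock = new Socket()) {
            sock.bind(new InetSocketAddress(host, 0));
            sock.connect(new InetSocketAddress(host, port), timeoutMs);
            return isSelfConnected(sock);
        } catch (IOException refusedOrTimedOut) {
            return false; // the usual outcome when nothing listens on the port
        }
    }

    public static void main(String[] args) throws Exception {
        InetAddress host = InetAddress.getByName(args[0]);
        int port = Integer.parseInt(args[1]);
        int maxTries = 1_000_000; // arbitrary upper bound for the sketch
        for (int i = 1; i <= maxTries; i++) {
            if (attempt(host, port, 1000)) {
                System.out.println("Self-connect on port " + port + " on try " + i);
                return;
            }
        }
        System.out.println("No self-connect in " + maxTries + " tries");
    }
}
```

Each failed attempt releases its local port, so the operating system keeps handing out new ephemeral ports until one happens to equal the target port.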

              rhn-engineering-bban Bela Ban
              e2open_jira Sanjay Prasad (Inactive)