  1. JGroups
  2. JGRP-364

When using TCP_NIO, starting two nodes at the same time causes one of the nodes not to join the group

Details

    • Bug
    • Resolution: Cannot Reproduce
    • Major
    • 2.6
    • 2.4
    • None

    Description

      I am testing a JGroups TCP_NIO configuration using the Draw demo. If I start my 3 nodes one by one, everything works fine. However, if I start node 1 and then start nodes 2 and 3 in parallel, only node 2 joins the group. Node 3 is isolated, does not see the other nodes, and logs the following message:

      org.jgroups.protocols.pbcast.ClientGmsImpl join
      WARNING: join(192.158.70.200:7802) sent to 192.158.70.200:7800 timed out, retrying

      I am starting the Draw demo like this:

      java -cp jgroups-all.jar:commons-logging.jar:concurrent.jar:jmxri.jar org.jgroups.demos.Draw -props test.xml

      Here is the configuration for one of my nodes:

      <config>
      <TCP_NIO
      bind_addr="192.158.70.200"
      recv_buf_size="20000000"
      send_buf_size="640000"
      loopback="false"
      discard_incompatible_packets="true"
      max_bundle_size="64000"
      max_bundle_timeout="30"
      use_incoming_packet_handler="true"
      use_outgoing_packet_handler="true"
      down_thread="false" up_thread="false"
      enable_bundling="true"
      start_port="7800"
      end_port="7800"
      use_send_queues="false"
      sock_conn_timeout="300"
      skip_suspected_members="true"/>

      <MPING timeout="2000" num_initial_members="3" mcast_addr="229.6.7.8"
      bind_addr="192.158.70.200" down_thread="false" up_thread="false"/>

      <MERGE2 max_interval="100000"
      down_thread="false" up_thread="false" min_interval="20000"/>
      <FD_SOCK down_thread="false" up_thread="false"/>

      <VERIFY_SUSPECT timeout="1500" down_thread="false" up_thread="false"/>
      <pbcast.NAKACK max_xmit_size="60000"
      use_mcast_xmit="false" gc_lag="0"
      retransmit_timeout="300,600,1200,2400,4800"
      down_thread="true" up_thread="true"
      discard_delivered_msgs="true"/>
      <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
      down_thread="false" up_thread="false"
      max_bytes="400000"/>
      <pbcast.GMS print_local_addr="true" join_timeout="3000"
      down_thread="true" up_thread="true"
      join_retry_timeout="2000" shun="true"
      view_bundling="true"/>
      <!-- <FC max_credits="2000000" down_thread="false" up_thread="false"
      min_threshold="0.10"/>
      <FRAG2 frag_size="60000" down_thread="false" up_thread="false"/> -->
      <pbcast.STATE_TRANSFER/>
      <!-- <pbcast.FLUSH down_thread="false" up_thread="false"/>-->
      </config>

      Nodes 2 and 3 use the same configuration, except that the port they bind to is different.
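
      As an illustration only, the transport element for node 3 might look like the fragment below. It assumes that the bind port is changed through start_port and end_port and that node 3 uses 7802, which is consistent with the join(192.158.70.200:7802) address in the warning above. The attached test3.xml is not reproduced here, so treat the values as assumptions rather than the actual file contents.

```xml
<!-- Hypothetical sketch of node 3's transport, not the attached test3.xml.
     Only the attributes assumed to differ from node 1 are shown; the other
     TCP_NIO attributes would stay as in the configuration above. -->
<TCP_NIO
    bind_addr="192.158.70.200"
    start_port="7802"
    end_port="7802"/>
```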

      Attachments

        1. test.xml (2 kB)
        2. test1.bat (0.8 kB)
        3. test2.bat (0.8 kB)
        4. test2.xml (2 kB)
        5. test3.bat (0.8 kB)
        6. test3.xml (2 kB)

          People

            smarlow1@redhat.com Scott Marlow
            mjtodd_jira Matthew Todd (Inactive)