  1. JGroups
  2. JGRP-979

TCP DataOutputStream.flush() hang

    • Bug
    • Resolution: Done
    • Major
    • 2.6.10.merge
    • 2.6.10.merge
    • None

      The JGroups cluster consists of 2 nodes. It uses 2 JChannels: one for configuration, the other for data transfer. At random intervals a server hangs while trying to send a message to the cluster. I think the main cause is this thread:

      "Timer-3,tcp,81.19.94.71:7800" daemon prio=10 tid=0x00007f155c7a1400 nid=0x3eec runnable [0x0000000042f92000..0x0000000042f92c80]
      java.lang.Thread.State: RUNNABLE
      at java.net.SocketOutputStream.socketWrite0(Native Method)
      at java.net.SocketOutputStream.socketWrite(SocketOutputStream.java:109)
      at java.net.SocketOutputStream.write(SocketOutputStream.java:153)
      at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
      at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
      - locked <0x00007f15688678f0> (a java.io.BufferedOutputStream)
      at java.io.DataOutputStream.flush(DataOutputStream.java:123)
      at org.jgroups.blocks.BasicConnectionTable$Connection.doSend(BasicConnectionTable.java:546)
      at org.jgroups.blocks.BasicConnectionTable$Connection._send(BasicConnectionTable.java:522)
      at org.jgroups.blocks.BasicConnectionTable$Connection.send(BasicConnectionTable.java:506)
      at org.jgroups.blocks.BasicConnectionTable.send(BasicConnectionTable.java:322)
      at org.jgroups.protocols.TCP.send(TCP.java:55)
      at org.jgroups.protocols.BasicTCP.sendToSingleMember(BasicTCP.java:219)
      at org.jgroups.protocols.BasicTCP.sendToAllMembers(BasicTCP.java:204)
      at org.jgroups.protocols.TP.doSend(TP.java:1486)
      at org.jgroups.protocols.TP.access$2500(TP.java:49)
      at org.jgroups.protocols.TP$Bundler.sendBundledMessages(TP.java:2059)
      at org.jgroups.protocols.TP$Bundler.access$2900(TP.java:1951)
      at org.jgroups.protocols.TP$Bundler$BundlingTimer.run(TP.java:2088)
      at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
      at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
      at java.util.concurrent.FutureTask.run(FutureTask.java:166)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$101(ScheduledThreadPoolExecutor.java:165)
      at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:266)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
      at java.lang.Thread.run(Thread.java:636)
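
      The trace above ends in SocketOutputStream.socketWrite0, a blocking socket write. One general way such a write can stay blocked is that the peer has stopped reading, so the TCP buffers fill up and the writer waits for space. Whether that is what happened in this cluster is not established by this report. The sketch below reproduces only the general mechanism on the loopback interface; the class and method names (BlockedWriteDemo, writerStalls) are made up for illustration and are not part of JGroups.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.concurrent.atomic.AtomicLong;

public class BlockedWriteDemo {

    /**
     * Connects a client to a local server that accepts the connection but
     * never reads from it. Returns true if the writer thread stops making
     * progress while still alive, i.e. it is blocked inside write().
     */
    static boolean writerStalls() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) { // accepted, but never read from

            final AtomicLong written = new AtomicLong();
            final OutputStream out = client.getOutputStream();

            Thread writer = new Thread(() -> {
                byte[] chunk = new byte[64 * 1024];
                try {
                    while (true) {
                        out.write(chunk); // blocks once the TCP buffers are full
                        written.addAndGet(chunk.length);
                    }
                } catch (IOException closed) {
                    // Expected once the socket is closed at the end of the demo.
                }
            }, "demo-writer");
            writer.setDaemon(true);
            writer.start();

            // Poll the byte counter once per second; no change between two
            // polls (after some bytes were written) means the writer stalled.
            long last = -1;
            boolean stalled = false;
            long deadline = System.nanoTime() + 30_000_000_000L; // give up after 30 s
            while (System.nanoTime() < deadline) {
                Thread.sleep(1000);
                long now = written.get();
                if (now > 0 && now == last) {
                    stalled = true;
                    break;
                }
                last = now;
            }
            return stalled && writer.isAlive();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("writer stalled: " + writerStalls());
    }
}
```

      A thread dump taken while the demo writer is stalled would be expected to show it inside a socket write frame, much like the trace above.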

      I took several thread dumps over 30 minutes, and this thread was in the same state in every dump.
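
      Comparing several dumps by hand is error-prone, so the check described above can be sketched as a small helper that reads the text of each dump and reports the threads whose topmost stack frame is identical in all of them. This is a minimal sketch: the class name StuckThreadFinder and its methods are hypothetical, and the parsing assumes the HotSpot dump layout shown above (a quoted thread name on the header line, followed by "at ..." frame lines).

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class StuckThreadFinder {

    // Header line, e.g.: "Timer-3,tcp,81.19.94.71:7800" daemon prio=10 ...
    private static final Pattern HEADER = Pattern.compile("^\\s*\"([^\"]+)\".*$");
    // Frame line, e.g.: at java.net.SocketOutputStream.socketWrite0(Native Method)
    private static final Pattern FRAME = Pattern.compile("^\\s*at\\s+([^\\s(]+)\\(.*\\)\\s*$");

    /** Maps each thread name in one dump to its topmost stack frame. */
    static Map<String, String> topFrames(String dump) {
        Map<String, String> result = new LinkedHashMap<>();
        String current = null;
        for (String line : dump.split("\\R")) {
            Matcher header = HEADER.matcher(line);
            if (header.matches()) {
                current = header.group(1);
                continue;
            }
            Matcher frame = FRAME.matcher(line);
            if (current != null && frame.matches()) {
                result.put(current, frame.group(1));
                current = null; // keep only the first frame of each thread
            }
        }
        return result;
    }

    /** Returns the threads whose top frame is the same in every dump. */
    static Map<String, String> stuckThreads(List<String> dumps) {
        Map<String, String> stuck = null;
        for (String dump : dumps) {
            Map<String, String> tops = topFrames(dump);
            if (stuck == null) {
                stuck = new LinkedHashMap<>(tops);
            } else {
                // Drop threads that are missing or have moved to another frame.
                stuck.entrySet().removeIf(e -> !e.getValue().equals(tops.get(e.getKey())));
            }
        }
        return stuck == null ? new LinkedHashMap<>() : stuck;
    }
}
```

      Note that idle pool threads also keep the same top frame between dumps, so the result is a list of candidates to inspect, not proof of a hang.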

      See the full thread dump in the attachment.

      P.S. The log files are from a run on OpenJDK, but switching to the Sun JDK results in the same errors.
      $ java -version
      java version "1.6.0_0"
      IcedTea6 1.3.1 (6b12-0ubuntu6.4) Runtime Environment (build 1.6.0_0-b12)
      OpenJDK 64-Bit Server VM (build 1.6.0_0-b12, mixed mode)

        1. flush-tcp.xml
          2 kB
          Bulat Nigmatullin
        2. nohup_2_node_cluster.tar.gz
          80 kB
          Bulat Nigmatullin
        3. nohup.out
          579 kB
          Bulat Nigmatullin
        4. nohup1.out
          685 kB
          Bulat Nigmatullin

              rhn-engineering-bban Bela Ban
              bulat_jira Bulat Nigmatullin (Inactive)