Uploaded image for project: 'Infinispan'
  1. Infinispan
  2. ISPN-9801

ClusterTopologyManagerImpl hangs when restarting a node with FORK

    Details

      Description

      When a server is restarted with `kill -9` or similar, both the old node and the new one can be in the JGroups view for a while. Normally this shouldn't be a problem, but sometimes the new node doesn't receive the HeartBeatCommand and the coordinator cannot process any new view updates.

      14:29:19,981 INFO  (jgroups-12,Test-NodeA:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|4] (5) [Test-NodeA, Test-NodeB, Test-NodeC, Test-NodeD, Test-NodeE]
      14:29:19,982 TRACE (transport-thread-Test-NodeA-p4-t14:[ViewHandling]) [ClusterTopologyManagerImpl] Updating cluster members for all the caches. New list is [Test-NodeA, Test-NodeB, Test-NodeC, Test-NodeD, Test-NodeE]
      14:29:19,982 TRACE (transport-thread-Test-NodeA-p4-t14:[ViewHandling]) [JGroupsTransport] Test-NodeA sending request 9 to all: org.infinispan.topology.HeartBeatCommand@1163beb6
      14:29:19,986 TRACE (jgroups-6,Test-NodeA:[]) [JGroupsTransport] Test-NodeA received response for request 9 from Test-NodeC: SuccessfulResponse(null)
      14:29:19,987 TRACE (jgroups-9,Test-NodeA:[]) [JGroupsTransport] Test-NodeA received response for request 9 from Test-NodeD: SuccessfulResponse(null)
      14:29:20,032 TRACE (jgroups-6,Test-NodeE:[]) [TCP_NIO2] Test-NodeE: received message batch of 1 messages from Test-NodeA
      14:29:20,032 TRACE (jgroups-6,Test-NodeE:[]) [NAKACK2] Test-NodeE: message Test-NodeA::39 was added to queue (not yet server)
      14:29:20,054 TRACE (jgroups-6,Test-NodeE:[]) [NAKACK2] Test-NodeE: received Test-NodeA#38
      14:29:20,054 TRACE (jgroups-6,Test-NodeE:[]) [NAKACK2] Test-NodeE: delivering Test-NodeA#38
      # not actually delivered :)
      14:29:20,054 TRACE (jgroups-6,Test-NodeE:[]) [MFC] Test-NodeA used 5 credits, 1999995 remaining
      14:29:20,149 INFO  (ForkThread-1,ForkChannelRestartTest:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|4] (5) [Test-NodeA, Test-NodeB, Test-NodeC, Test-NodeD, Test-NodeE]
      14:29:21,119 DEBUG (testng-Test-1:[]) [ForkChannelRestartTest] Stopping channel Test-NodeB
      14:29:23,319 INFO  (VERIFY_SUSPECT.TimerThread-32,Test-NodeA:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|5] (4) [Test-NodeA, Test-NodeC, Test-NodeD, Test-NodeE]
      14:29:23,320 TRACE (remote-thread-Test-NodeA-p2-t1:[]) [MultiTargetRequest] Target Test-NodeB of request 9 left the cluster view
      

      So far, it looks like it's a JGroups bug similar to JGRP-2294, but we need to investigate further.

        Gliffy Diagrams

          Attachments

            Activity

              People

              • Assignee:
                dan.berindei Dan Berindei
                Reporter:
                dan.berindei Dan Berindei
              • Votes:
                0 Vote for this issue
                Watchers:
                3 Start watching this issue

                Dates

                • Created:
                  Updated:
                  Resolved: