Details
- Bug
- Resolution: Done
- Major
- 10.0.0.Alpha1, 9.4.3.Final
- None
- Sprint 10.0.0.Alpha2, Sprint 10.0.0.Beta1, DataGrid Sprint #31, DataGrid Sprint #32, DataGrid Sprint #33, DataGrid Sprint #34, DataGrid Sprint #35, DataGrid Sprint #36, DataGrid Sprint #37, DataGrid Sprint #38, DataGrid Sprint #39
Description
When a server is killed with `kill -9` (or similar) and then restarted, both the old node and the new one can be in the JGroups view for a while. Normally this isn't a problem, but sometimes the new node never receives the HeartBeatCommand, and the coordinator then cannot process any further view updates.
14:29:19,981 INFO (jgroups-12,Test-NodeA:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|4] (5) [Test-NodeA, Test-NodeB, Test-NodeC, Test-NodeD, Test-NodeE]
14:29:19,982 TRACE (transport-thread-Test-NodeA-p4-t14:[ViewHandling]) [ClusterTopologyManagerImpl] Updating cluster members for all the caches. New list is [Test-NodeA, Test-NodeB, Test-NodeC, Test-NodeD, Test-NodeE]
14:29:19,982 TRACE (transport-thread-Test-NodeA-p4-t14:[ViewHandling]) [JGroupsTransport] Test-NodeA sending request 9 to all: org.infinispan.topology.HeartBeatCommand@1163beb6
14:29:19,986 TRACE (jgroups-6,Test-NodeA:[]) [JGroupsTransport] Test-NodeA received response for request 9 from Test-NodeC: SuccessfulResponse(null)
14:29:19,987 TRACE (jgroups-9,Test-NodeA:[]) [JGroupsTransport] Test-NodeA received response for request 9 from Test-NodeD: SuccessfulResponse(null)
14:29:20,032 TRACE (jgroups-6,Test-NodeE:[]) [TCP_NIO2] Test-NodeE: received message batch of 1 messages from Test-NodeA
14:29:20,032 TRACE (jgroups-6,Test-NodeE:[]) [NAKACK2] Test-NodeE: message Test-NodeA::39 was added to queue (not yet server)
14:29:20,054 TRACE (jgroups-6,Test-NodeE:[]) [NAKACK2] Test-NodeE: received Test-NodeA#38
14:29:20,054 TRACE (jgroups-6,Test-NodeE:[]) [NAKACK2] Test-NodeE: delivering Test-NodeA#38 # not actually delivered :)
14:29:20,054 TRACE (jgroups-6,Test-NodeE:[]) [MFC] Test-NodeA used 5 credits, 1999995 remaining
14:29:20,149 INFO (ForkThread-1,ForkChannelRestartTest:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|4] (5) [Test-NodeA, Test-NodeB, Test-NodeC, Test-NodeD, Test-NodeE]
14:29:21,119 DEBUG (testng-Test-1:[]) [ForkChannelRestartTest] Stopping channel Test-NodeB
14:29:23,319 INFO (VERIFY_SUSPECT.TimerThread-32,Test-NodeA:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|5] (4) [Test-NodeA, Test-NodeC, Test-NodeD, Test-NodeE]
14:29:23,320 TRACE (remote-thread-Test-NodeA-p2-t1:[]) [MultiTargetRequest] Target Test-NodeB of request 9 left the cluster view
So far this looks like a JGroups bug similar to JGRP-2294, but it needs further investigation.
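To make it easier to see which target request 9 is still waiting on, here is a minimal Python sketch that replays the coordinator-side entries of the trace above. It is not part of Infinispan or JGroups. The function name `outstanding_targets` and the regular expressions are hypothetical helpers written only for this log format, and the sketch assumes that "to all" means every member of the most recently logged view except the sender.

```python
import re

# Coordinator-side entries copied from the trace in this report.
LOG = """\
14:29:19,981 INFO (jgroups-12,Test-NodeA:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|4] (5) [Test-NodeA, Test-NodeB, Test-NodeC, Test-NodeD, Test-NodeE]
14:29:19,982 TRACE (transport-thread-Test-NodeA-p4-t14:[ViewHandling]) [JGroupsTransport] Test-NodeA sending request 9 to all: org.infinispan.topology.HeartBeatCommand@1163beb6
14:29:19,986 TRACE (jgroups-6,Test-NodeA:[]) [JGroupsTransport] Test-NodeA received response for request 9 from Test-NodeC: SuccessfulResponse(null)
14:29:19,987 TRACE (jgroups-9,Test-NodeA:[]) [JGroupsTransport] Test-NodeA received response for request 9 from Test-NodeD: SuccessfulResponse(null)
14:29:23,319 INFO (VERIFY_SUSPECT.TimerThread-32,Test-NodeA:[]) [CLUSTER] ISPN000094: Received new cluster view for channel FORKISPN: [Test-NodeA|5] (4) [Test-NodeA, Test-NodeC, Test-NodeD, Test-NodeE]
14:29:23,320 TRACE (remote-thread-Test-NodeA-p2-t1:[]) [MultiTargetRequest] Target Test-NodeB of request 9 left the cluster view
"""

# Hypothetical patterns, matched against the message formats seen in this trace only.
VIEW = re.compile(r"Received new cluster view for channel \S+: \[[^\]]+\] \(\d+\) \[([^\]]+)\]")
SENT = re.compile(r"(\S+) sending request (\d+) to all")
RESP = re.compile(r"received response for request (\d+) from ([\w-]+)")
LEFT = re.compile(r"Target ([\w-]+) of request (\d+) left the cluster view")


def outstanding_targets(log):
    """Return {request id: sorted list of targets that neither replied nor left}."""
    members = []   # members of the most recently logged view
    pending = {}   # request id -> set of targets still expected to answer
    for line in log.splitlines():
        if m := VIEW.search(line):
            members = [name.strip() for name in m.group(1).split(",")]
        elif m := SENT.search(line):
            sender, request = m.group(1), m.group(2)
            # Assumption: a broadcast targets every view member except the sender.
            pending[request] = set(members) - {sender}
        elif m := RESP.search(line):
            pending.get(m.group(1), set()).discard(m.group(2))
        elif m := LEFT.search(line):
            pending.get(m.group(2), set()).discard(m.group(1))
    return {request: sorted(nodes) for request, nodes in pending.items()}


print(outstanding_targets(LOG))  # {'9': ['Test-NodeE']}
```

Under these assumptions only Test-NodeE is left outstanding for request 9, which is consistent with the NAKACK2 entries showing that node queuing the coordinator's messages without delivering them.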