Bug - Resolution: Done - Major - 5.2.0.Final
Some tests fail randomly with a timeout waiting for a new view after stopping the coordinator:
01:08:11,695 ERROR (testng-CacheClusterJoinTest:) [UnitTestTestNGListener] Test testIsCoordinator(org.infinispan.api.CacheClusterJoinTest) failed.
java.lang.RuntimeException: Timed out before caches had complete views. Expected 1 members in each view. Views are as follows: [[NodeC-27739, NodeD-5092]]
    at org.infinispan.test.TestingUtil.viewsTimedOut(TestingUtil.java:249)
    at org.infinispan.test.TestingUtil.blockUntilViewsReceived(TestingUtil.java:311)
    at org.infinispan.api.CacheClusterJoinTest.testIsCoordinator(CacheClusterJoinTest.java:87)
This happens because the old coordinator tries to install a new view that excludes itself before stopping, but the installation fails:
01:07:21,616 WARN (ViewHandler,ISPN,NodeC-27739:) [GMS] NodeC-27739: failed to collect all ACKs (expected=1) for view [NodeD-5092|2] after 2000ms, missing ACKs from [NodeD-5092]
The survivor never received the view installation message, so it didn't install view [NodeD-5092|2]. Because it had no failure detection configured, it couldn't detect that the current coordinator was dead, so it never installed a new view on its own.
It's not clear why the survivor didn't receive the view message at all in the test suite, but a lost view message is clearly possible, so we should enable FD_SOCK in the test suite.
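To make the reasoning concrete, here is a minimal toy model of the scenario in plain Java. It is not JGroups or Infinispan code: the class ViewSimulation, the Member type, and the method stopCoordinator are hypothetical names invented for illustration, and the "failure detection" here is just a reachability check standing in for a protocol such as FD_SOCK. It only shows that a lost view message leaves the survivor with a stale view unless something independently detects the dead coordinator.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the failure scenario; hypothetical names, not JGroups code. */
final class ViewSimulation {

    /** A cluster member that tracks the view (member list) it currently believes in. */
    static final class Member {
        final String name;
        final boolean failureDetection;
        List<String> view;

        Member(String name, boolean failureDetection, List<String> initialView) {
            this.name = name;
            this.failureDetection = failureDetection;
            this.view = new ArrayList<>(initialView);
        }

        /** Normal path: the coordinator's view message arrives and is installed. */
        void receiveView(List<String> newView) {
            view = new ArrayList<>(newView);
        }

        /** Fallback path: drop members that are no longer reachable. */
        void tick(List<String> aliveNodes) {
            if (!failureDetection) {
                return; // nothing ever suspects the dead coordinator
            }
            view.removeIf(member -> !aliveNodes.contains(member));
        }
    }

    /** Stops the coordinator (NodeC) and returns the view the survivor (NodeD) ends up with. */
    static List<String> stopCoordinator(boolean viewMessageDelivered, boolean failureDetection) {
        Member survivor = new Member("NodeD", failureDetection, List.of("NodeC", "NodeD"));

        // Before stopping, the coordinator sends a view that excludes itself.
        if (viewMessageDelivered) {
            survivor.receiveView(List.of("NodeD"));
        }

        // The coordinator is now gone; only the survivor is still alive.
        survivor.tick(List.of("NodeD"));
        return survivor.view;
    }
}
```

In this model, stopCoordinator(false, false) leaves the survivor with both members in its view, which mirrors the timed-out test, while stopCoordinator(false, true) converges on a single-member view even though the view message was lost.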