- Bug
- Resolution: Done
- Major
- 9.4.0.Final
- Sprint 9.4.0.CR3
Test org.infinispan.xsite.statetransfer.failures.RetryMechanismTest.clearContent has been running for more than 300 seconds. Interrupting the test thread and dumping thread stacks of the test suite process and its children.
Test org.infinispan.xsite.CacheOperationsTest.destroy has been running for more than 300 seconds. Interrupting the test thread and dumping thread stacks of the test suite process and its children.
...
Killed processes 16913
The forked VM terminated without properly saying goodbye. VM crash or System.exit called?
Error occurred in starting fork, check output in log
Process Exit Code: 143
Crashed tests:
org.infinispan.eviction.impl.ExceptionEvictionTest
org.infinispan.statetransfer.ClusterTopologyManagerTest
org.infinispan.stream.LocalStreamOffHeapTest
The timeouts are very likely caused by the JGRP-2277 changes. Most of our tests run without any FD* protocol to avoid creating an extra socket and thread. As a result, when the coordinator leaves, the 2nd node must receive the leave message from the coordinator; otherwise it will never install a view with itself as the coordinator.
This dependency already existed before JGRP-2277, but it appears that the view message the coordinator used to send before leaving was somehow more likely to reach the 2nd node than the new leave message.
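To make the dependency concrete, here is a minimal toy model of the scenario, assuming a two-node cluster [A, B] in which A is the coordinator and leaves. It does not use the JGroups API: `LeaveSimulation`, `Node`, and `secondNodeBecomesCoordinator` are hypothetical names made up for illustration, and failure detection is reduced to "B eventually removes A from its view".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model only: hypothetical names, not the JGroups API.
class LeaveSimulation {

    /** A cluster member that tracks the last view it installed. */
    static final class Node {
        final String name;
        List<String> view;

        Node(String name, List<String> initialView) {
            this.name = name;
            this.view = new ArrayList<>(initialView);
        }

        /** By convention, the first member of the view is the coordinator. */
        boolean isCoordinator() {
            return !view.isEmpty() && view.get(0).equals(name);
        }

        /** Install a new view that no longer contains the given member. */
        void removeMember(String member) {
            List<String> next = new ArrayList<>(view);
            next.remove(member);
            view = next;
        }
    }

    /**
     * Coordinator A leaves the cluster [A, B].
     *
     * @param leaveDelivered   whether A's leave message reaches B
     * @param failureDetection whether B runs some failure detector
     *                         (assumption: it eventually suspects A)
     * @return true if B ends up with a view in which it is the coordinator
     */
    static boolean secondNodeBecomesCoordinator(boolean leaveDelivered,
                                                boolean failureDetection) {
        Node b = new Node("B", Arrays.asList("A", "B"));
        // B drops A either because it received the leave message, or because
        // a failure detector notices that A is gone. With neither, B keeps
        // the stale view [A, B] forever and any test waiting on it times out.
        if (leaveDelivered || failureDetection) {
            b.removeMember("A");
        }
        return b.isCoordinator();
    }

    public static void main(String[] args) {
        System.out.println("leave delivered, no FD: "
                + secondNodeBecomesCoordinator(true, false));
        System.out.println("leave lost, no FD:      "
                + secondNodeBecomesCoordinator(false, false));
        System.out.println("leave lost, with FD:    "
                + secondNodeBecomesCoordinator(false, true));
    }
}
```

Only the combination "leave message lost, no failure detection" leaves B without a new view, which matches the configuration most of our tests use.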
The "crashed tests" list only includes tests that we know take a very long time to run, so I am assuming they're not relevant. Unfortunately, the mechanism to interrupt long-running tests still isn't working as it should: the thread dumps are not included in the artifacts.
- is related to
- JGRP-2277 GMS: change the way a coordinator leaves gracefully
- Resolved