- Type: Bug
- Resolution: Done
- Priority: Major
- Affects Version: 5.0.0.CR4
- Component: None
Sometimes during a restart of 3 or more HotRod nodes from a 25-node cluster, I receive a replication timeout exception, after which the node is unusable.
The timeout comes from replacing the view in HotRodServer.addSelfToTopologyView. If 3 nodes try to replace the same element in the cache at the same time, it is not a big surprise that they fall into some kind of deadlock, which is properly recognized and broken after the timeout. Unfortunately, the breaking exception is not handled and stops the HotRodServer start procedure. I suggest catching it in addSelfToTopologyView like this:
var updated = false
try {
  // existing code that replaces the view in the topology cache (body elided in the original report)
} catch {
  case e: TimeoutException => logUnableToReplaceView
}

This way the exception will not be thrown from the containing closure, and the updateTopologyView method will have the chance to replace the view again.
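For context, here is a minimal, self-contained sketch of the retry pattern this change enables. Only the method names addSelfToTopologyView and updateTopologyView come from the report; tryReplaceViewInCache, logUnableToReplaceView, and the enclosing object are placeholders standing in for Infinispan's actual cache API, not the real implementation:

import java.util.concurrent.TimeoutException

object TopologyViewRetrySketch {
  // Placeholder for the real compare-and-swap on the topology cache;
  // simulated here so the sketch compiles on its own.
  def tryReplaceViewInCache(): Boolean = true

  def logUnableToReplaceView(): Unit =
    println("Unable to replace topology view, will retry")

  // Mirrors the suggestion above: swallow the TimeoutException and
  // report failure to the caller instead of aborting server start.
  def addSelfToTopologyView(): Boolean =
    try tryReplaceViewInCache()
    catch { case e: TimeoutException => logUnableToReplaceView(); false }

  // The caller keeps retrying until the replace succeeds, so a broken
  // deadlock on one attempt no longer kills the start procedure.
  def updateTopologyView(): Unit = {
    var updated = false
    while (!updated) updated = addSelfToTopologyView()
  }
}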
- is related to: ISPN-448 Consider all topology cache updates to be done by coordinator in Hot Rod (Closed)