-
Enhancement
-
Resolution: Unresolved
-
Major
-
None
-
6.0.1.Final
If a Hot Rod client accesses a clustered replicated cache, it uses a round-robin policy to select the server for each request.
If a node becomes unresponsive, e.g. because of a long garbage collection pause or network issues, a cache operation sent to it fails with a network timeout.
This 'suspend' phase can last a while before the node either comes back to work or gets dropped from the cluster and the client's cluster view is updated. Until then the round-robin keeps selecting the unresponsive node, at a rate that depends on the total number of cluster nodes.
In a two-node cluster the effect is that every second call waits for the configured timeout before the data is provided by the remaining node, which is a huge performance drawback.
To improve performance in such cases, a failed connection should be expelled from the round-robin for a shunning period or until a new cluster view is provided from the server side.
So that the behaviour improves automatically, the shunning period should have a default value in line with the default JGroups failure detection (i.e. 10 sec).
It should be possible to customize this period through a Hot Rod client property.
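
The requested behaviour could look roughly like the following sketch. It is not the Hot Rod client API: the class `ShunningRoundRobin`, its methods, and the injectable clock are hypothetical names chosen for illustration only. The shunning period is passed in as a constructor argument; how it would be wired to a client property is left open.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.LongSupplier;

/** Hypothetical round-robin balancer that skips a server for a shunning period after a failure. */
class ShunningRoundRobin {
    private final long shunMillis;      // shunning period, e.g. 10_000 for the suggested default
    private final LongSupplier clock;   // injectable time source (assumption, eases testing)
    private final Map<String, Long> shunnedUntil = new HashMap<>();
    private List<String> servers;
    private int index;

    ShunningRoundRobin(Collection<String> servers, long shunMillis, LongSupplier clock) {
        this.servers = new ArrayList<>(servers);
        this.shunMillis = shunMillis;
        this.clock = clock;
    }

    /** To be called when an operation against the given server failed, e.g. with a timeout. */
    synchronized void markFailed(String server) {
        shunnedUntil.put(server, clock.getAsLong() + shunMillis);
    }

    /** To be called when the server side provides a new cluster view; it supersedes local shunning. */
    synchronized void updateServers(Collection<String> newServers) {
        servers = new ArrayList<>(newServers);
        shunnedUntil.clear();
        index = 0;
    }

    /** Returns the next server that is not shunned; if all are shunned, falls back to plain round-robin. */
    synchronized String next() {
        if (servers.isEmpty()) {
            throw new IllegalStateException("no servers known");
        }
        long now = clock.getAsLong();
        for (int i = 0; i < servers.size(); i++) {
            String candidate = advance();
            Long until = shunnedUntil.get(candidate);
            if (until == null || until <= now) {
                shunnedUntil.remove(candidate); // shunning period is over
                return candidate;
            }
        }
        return advance(); // every server is shunned, so try one anyway
    }

    private String advance() {
        String server = servers.get(index);
        index = (index + 1) % servers.size();
        return server;
    }
}
```

With two servers and one of them marked as failed, every call goes to the healthy server until the shunning period expires or a new cluster view arrives, so the timeout is paid once rather than on every second call.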