-
Bug
-
Resolution: Done
-
Major
-
9.4.23.Final, 11.0.11.Final, 12.1.7.Final, 13.0.0.Final
-
None
When ChannelFactory switches to an alternative cluster after all the servers are marked as failed (or, in older versions, once max-retries attempts have failed with a transport error), it increments topologyAge to prevent concurrent switching:
- Other threads deciding to switch clusters have an older age and give up.
- Topology updates from the old cluster also have an older age (the age sent in the Hot Rod request header), so they are ignored.
Switching to the initial server list is very similar to switching to another cluster, but it does not currently increment topologyAge. As a result, multiple threads can attempt the switch in parallel, and a late switch can revert the latest topology update received from the servers.
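The age check described above can be illustrated with a minimal sketch. This is not the actual ChannelFactory code: the class TopologyAgeSketch and its methods switchServers and applyTopologyUpdate are hypothetical names, and the server list is reduced to plain strings. The sketch assumes that a reset to the initial server list goes through the same guarded path as a cluster switch, so it also bumps the age.

```java
import java.util.List;

// Hypothetical stand-in for the client's topology state.
// It is NOT Infinispan's ChannelFactory; names and structure are assumptions.
class TopologyAgeSketch {
    private int topologyAge = 0;
    private List<String> servers;

    TopologyAgeSketch(List<String> initialServers) {
        this.servers = initialServers;
    }

    synchronized int currentAge() {
        return topologyAge;
    }

    synchronized List<String> servers() {
        return servers;
    }

    // Used for both a cluster switch and a reset to the initial server list.
    // The caller passes the age it observed when it decided to switch.
    // Only a caller whose observed age is still current wins; the age is
    // then incremented so that concurrent callers with the old age give up.
    synchronized boolean switchServers(int observedAge, List<String> newServers) {
        if (observedAge != topologyAge) {
            return false; // another thread already switched
        }
        topologyAge++;
        servers = newServers;
        return true;
    }

    // A topology update carries the age that was sent in the request header.
    // Updates answering requests sent before the last switch are ignored.
    synchronized boolean applyTopologyUpdate(int requestAge, List<String> newServers) {
        if (requestAge != topologyAge) {
            return false; // stale update from before the switch
        }
        servers = newServers;
        return true;
    }
}
```

With this guard, two threads that both observed age 0 and both try to switch cannot both succeed: the second call returns false and leaves the server list untouched. A switch that skips the increment has no such protection.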
The impact is higher in versions that do not include the ISPN-12598 fix, because the repeated switching and closing of connections is enough to get one operation to exhaust max-retries and switch to the initial server list again.
- is incorporated by
  - ISPN-13264 Hot Rod client cluster switch never happens if max-retries < cluster size (Closed)
- is related to
  - ISPN-13216 Client should not close server connection after timeout (Resolved)
  - ISPN-12598 Hot Rod java client retries too many times (Closed)