- Bug
- Resolution: Done
- Major
- 7.1.1.Final
- None
When we retry RPCs on a SuspectException, we wait for a new cache topology before resending the RPC. Without the delay, the sender would enter a very tight loop of invoking the RPC and immediately receiving a SuspectException.
But in certain edge cases, it's possible to receive a SuspectException without the suspected node ever being eliminated from the JGroups view (and without installing a new cache topology). That means the thread waiting to retry the RPC will block forever.
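To make the hang concrete, here is a minimal Java sketch of a retry path that waits for a newer topology before resending. It is not the actual Infinispan code: `TopologyGate`, `installTopology` and `awaitNewerThan` are hypothetical names, and the timeout exists only so the demonstration terminates. The wait described above has no such bound.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical stand-in for "wait for a new cache topology before retrying".
public class TopologyGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition changed = lock.newCondition();
    private int topologyId;

    // Called when a new cache topology is installed; wakes up all waiters.
    public void installTopology(int newId) {
        lock.lock();
        try {
            topologyId = newId;
            changed.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Returns true once a topology newer than seenId is installed, or false
    // if the timeout expires first (or the thread is interrupted).
    // With an unbounded wait, the "false" case is the thread that never returns.
    public boolean awaitNewerThan(int seenId, long timeout, TimeUnit unit) {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (topologyId <= seenId) {
                if (nanos <= 0) {
                    return false;
                }
                nanos = changed.awaitNanos(nanos);
            }
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            lock.unlock();
        }
    }
}
```

If the suspected node is never removed from the view, nothing calls `installTopology`, so a retrying thread in this sketch only gets out through the timeout.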
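The problem is that RequestCorrelator listens to SUSPECT events directly, so a pending request can fail with a SuspectException as soon as the target is suspected. If the FD protocol then raises an UNSUSPECT event, GMS will not install a new view, and no new cache topology follows.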
The solution should be to ignore SUSPECT events in RequestCorrelator, and only act on view changes.
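The proposed behavior could look roughly like the following Java sketch. It does not use the JGroups API: `ViewDrivenCorrelator`, `register`, `onSuspect` and `onViewChange` are hypothetical names, members are plain strings, and an `IllegalStateException` stands in for the real SuspectException.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical correlator that fails pending requests only on view changes.
public class ViewDrivenCorrelator {
    private static final class Pending {
        final String target;
        final CompletableFuture<String> response = new CompletableFuture<>();

        Pending(String target) {
            this.target = target;
        }
    }

    private final Map<Long, Pending> pending = new ConcurrentHashMap<>();

    // Track an outstanding request to the given target member.
    public CompletableFuture<String> register(long requestId, String target) {
        Pending p = new Pending(target);
        pending.put(requestId, p);
        return p.response;
    }

    // SUSPECT is deliberately ignored: the member may be unsuspected later,
    // in which case no new view (and no new topology) would ever arrive.
    public void onSuspect(String member) {
        // no-op
    }

    // Fail and drop only the requests whose target is missing from the new view.
    public void onViewChange(Set<String> members) {
        pending.entrySet().removeIf(entry -> {
            Pending p = entry.getValue();
            if (members.contains(p.target)) {
                return false;
            }
            p.response.completeExceptionally(
                    new IllegalStateException(p.target + " left the view"));
            return true;
        });
    }
}
```

In this sketch a failure is always tied to a view change, so a caller that retries after waiting for a new topology can expect one to follow.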