- Bug
- Resolution: Done
- Major
- 9.2.0.Final
- None
During AvailabilityStrategyContext#updateTopologiesAfterMerge, a topology update must be sent to the cluster via ClusterTopologyManagerImpl#executeOnClusterAsync, which runs on the ASYNC_TRANSPORT_EXECUTOR, before ConflictManager#resolveConflicts is called. This topology update is vital because it carries the topologyId that all of the conflict resolution RPCs depend on. If it is not sent, ConflictManager#resolveConflicts cannot make progress and eventually times out.
The problem is that an ASYNC_TRANSPORT_EXECUTOR thread is occupied for the entire execution of AvailabilityStrategyContext#doMergePartitions. Therefore, when AvailabilityStrategyContext#updateTopologiesAfterMerge is called prior to conflict resolution, it's possible that ALL of the executor's threads are running runnables that are blocked waiting on ConflictManager#resolveConflicts, leaving no thread free to send the topology update.
As the number of caches increases, so does the number of doMergePartitions runnables on the ASYNC_TRANSPORT_EXECUTOR, and with it the likelihood of the executor's threads being exhausted.
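The starvation described above can be reproduced outside Infinispan with a plain bounded thread pool. The following is a minimal sketch under stated assumptions, not the actual Infinispan code: the class `ExecutorStarvationSketch`, the method `countCompletedMerges`, and the latch standing in for the topology update are all hypothetical names introduced for illustration. Each simulated merge holds a pool thread, submits its "topology update" to the same pool, and then waits for it with a timeout, roughly as the merge waits on conflict resolution.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

public class ExecutorStarvationSketch {

    /**
     * Simplified model (hypothetical, not Infinispan code): returns how many
     * simulated merges saw their topology update run before the timeout.
     */
    public static int countCompletedMerges(int poolSize, int caches, long timeoutMillis)
            throws Exception {
        // Stand-in for the shared, bounded ASYNC_TRANSPORT_EXECUTOR.
        ExecutorService executor = Executors.newFixedThreadPool(poolSize);
        // Holds every merge at the start until all of them have been submitted.
        CountDownLatch startGate = new CountDownLatch(1);
        List<Future<Boolean>> merges = new ArrayList<>();
        try {
            for (int i = 0; i < caches; i++) {
                // One "merge" runnable per cache; it occupies a pool thread throughout.
                merges.add(executor.submit(() -> {
                    startGate.await();
                    CountDownLatch topologySent = new CountDownLatch(1);
                    // The "topology update" is dispatched on the SAME executor.
                    executor.execute(topologySent::countDown);
                    // Stand-in for conflict resolution: it can only proceed once
                    // the update has actually run on some pool thread.
                    return topologySent.await(timeoutMillis, TimeUnit.MILLISECONDS);
                }));
            }
            startGate.countDown();
            int completed = 0;
            for (Future<Boolean> merge : merges) {
                if (merge.get()) {
                    completed++;
                }
            }
            return completed;
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // As many merges as threads: every thread is blocked, updates stay queued.
        System.out.println("2 threads, 2 caches: " + countCompletedMerges(2, 2, 300));
        // Spare threads available: the updates can run and the merges proceed.
        System.out.println("4 threads, 2 caches: " + countCompletedMerges(4, 2, 300));
    }
}
```

In this model, when the number of concurrent merges equals the pool size, every update stays queued behind the blocked merges and each wait times out; with spare threads the updates run immediately. Dispatching the update on a separate executor, or not holding a pool thread while waiting, would avoid the cycle in the sketch, though whether either matches the actual fix is not stated in this issue.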
- causes
  - ISPN-8706 HotRodMergeTest Timed out waiting for rebalancing to complete (Closed)
- relates to
  - ISPN-11291 MultipleCachesDuringConflictResolutionTest.testPartitionMergePolicy random failures (Closed)