During the execution of AvailabilityStrategyContext#updateTopologiesAfterMerge, it's necessary for a topology update to be sent to the cluster via ClusterTopologyManagerImpl#executeOnClusterAsync, which utilises the ASYNC_TRANSPORT_EXECUTOR, before a call is made to ConflictManager#resolveConflicts. This topology update is vital as it contains the topologyId which all of the conflict resolution RPCs depend on. If this topology update is not sent, then ConflictManager#resolveConflicts will eventually timeout as no progress can be made.
The problem is that during the entire execution of AvailabilityStrategyContext#doMergePartitions, an ASYNC_TRANSPORT_EXECUTOR thread is occupied. Therefore, when AvailabilityStrategyContext#updateTopologiesAfterMerge is called prior to conflict resolution it's possible that ALL threads are executing runnables that are waiting indefinitely on ConflictManager#resolveConflicts and therefore it's not possible to send the topology update.
As the number of caches increase the number of doMergePartition runnables on the ASYNC_TRANSPORT_EXECUTOR increases, consequently so does the likelihood of the executor's resources becoming exhausted.