If a site is down, forwarding a message to it makes the calling thread wait (retrying, not returning) for max_forward_attempts * forward_sleep milliseconds.
If we have many incoming messages (from the local site) to be forwarded to the remote site, then having every thread sleep for (say) 10 seconds will cause the thread pool to grow, because the waiting threads cannot be reused for other messages.
We should therefore add batching to RELAY2: messages to be forwarded would be queued until a certain time has elapsed or the accumulated total size of all queued messages exceeds a certain threshold, and would then be forwarded together.
Incoming messages to be forwarded would be added to the queue (allowing the sending thread to be returned to the thread pool). A separate thread (or task) would monitor the queue and do the forwarding, and it would also notice that a site is down and send unreachable messages back to the original senders.
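
The queueing described above could look roughly like the sketch below. It is only an illustration, not RELAY2 code: the class name ForwardBatcher, its parameters maxBytes and maxDelayMs, and the forwarder callback are all hypothetical, and messages are represented as plain byte arrays. Detecting a down site and sending unreachable messages back to the original senders is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Hypothetical sketch of a size/time-bounded forward queue; not part of JGroups. */
class ForwardBatcher {
    private final long maxBytes;    // flush once the queued payload exceeds this many bytes
    private final long maxDelayMs;  // flush once the oldest queued message is this old
    private final Consumer<List<byte[]>> forwarder; // stands in for forwarding to the remote site
    private final List<byte[]> queue = new ArrayList<>();
    private long queuedBytes;
    private long oldestEnqueuedAt;

    ForwardBatcher(long maxBytes, long maxDelayMs, Consumer<List<byte[]>> forwarder) {
        this.maxBytes = maxBytes;
        this.maxDelayMs = maxDelayMs;
        this.forwarder = forwarder;
    }

    /** Called by the sending thread: only enqueues, so the thread returns to the pool at once. */
    synchronized void add(byte[] msg, long nowMs) {
        if (queue.isEmpty())
            oldestEnqueuedAt = nowMs;
        queue.add(msg);
        queuedBytes += msg.length;
    }

    /** Called periodically by the separate forwarding thread or task. Returns true if a batch was forwarded. */
    boolean poll(long nowMs) {
        List<byte[]> batch;
        synchronized (this) {
            boolean sizeReached = queuedBytes > maxBytes;
            boolean timeReached = !queue.isEmpty() && nowMs - oldestEnqueuedAt >= maxDelayMs;
            if (!sizeReached && !timeReached)
                return false;
            batch = new ArrayList<>(queue);
            queue.clear();
            queuedBytes = 0;
        }
        // Forward outside the lock so that add() is never held up by a slow or unreachable site.
        forwarder.accept(batch);
        return true;
    }
}
```

The monitoring thread (or task) would call poll(System.currentTimeMillis()) at a fixed interval, for example from a ScheduledExecutorService. Only that one thread would then wait on a site that is down, rather than every sender.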