With TCPPING, if we have 10 servers defined in the list, we send a GET_MBRS_REQ to each of them sequentially. However, if a server is not reachable, we block on the socket connect call until it times out. DNS lookups might also take some time, so the overall discovery might time out before we have contacted all servers. Example: servers 1-10. 1-9 are down or not reachable and DNS is slow, while 10 is running. Discovery times out before we get to 10, and we become a singleton node.
SOLUTION: use threads from the common (global) thread pool in JGroups to parallelize the sending of requests to all 10 servers.
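The idea can be illustrated with a minimal sketch. This is not the JGroups implementation: it uses a plain `java.util.concurrent` pool instead of the JGroups thread pool, and a bare TCP connect stands in for sending GET_MBRS_REQ. The class name `ParallelDiscoverySketch`, the method `probe` and its parameters are hypothetical names chosen for illustration.

```java
import java.net.InetSocketAddress;
import java.net.Socket;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch, not JGroups code: probe all servers in parallel so that
// one slow DNS lookup or connect timeout does not delay the other servers.
final class ParallelDiscoverySketch {

    // Returns the servers that accepted a TCP connection within the deadlines.
    static List<InetSocketAddress> probe(List<InetSocketAddress> servers,
                                         int connectTimeoutMs,
                                         long discoveryTimeoutMs) {
        List<InetSocketAddress> reachable = new ArrayList<>();
        if (servers.isEmpty())
            return reachable;

        // One thread per server (assumption: the server list is short).
        ExecutorService pool = Executors.newFixedThreadPool(servers.size());
        try {
            List<Callable<InetSocketAddress>> tasks = new ArrayList<>();
            for (InetSocketAddress server : servers) {
                tasks.add(() -> {
                    // The DNS lookup happens here, inside the worker thread.
                    InetSocketAddress resolved =
                        new InetSocketAddress(server.getHostString(), server.getPort());
                    try (Socket sock = new Socket()) {
                        // Stand-in for sending GET_MBRS_REQ to this server.
                        sock.connect(resolved, connectTimeoutMs);
                        return server;
                    }
                });
            }
            // Waits until all tasks finish or the overall discovery timeout
            // expires; tasks still running at that point are cancelled.
            List<Future<InetSocketAddress>> results =
                pool.invokeAll(tasks, discoveryTimeoutMs, TimeUnit.MILLISECONDS);
            for (Future<InetSocketAddress> f : results) {
                if (f.isCancelled())
                    continue; // did not answer before the discovery timeout
                try {
                    reachable.add(f.get());
                } catch (ExecutionException e) {
                    // DNS failure, connection refused, or connect timeout: skip.
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            pool.shutdownNow();
        }
        return reachable;
    }
}
```

With this shape, the time spent on servers 1-9 overlaps with the request to server 10, so total discovery time is bounded by the slowest single server (or the discovery timeout) rather than by the sum of all connect timeouts.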
Relates to: JGRP-703 TCPGOSSIP: parallelize discovery for multiple GossipRouters (Resolved)