Feature Request
Resolution: Done
Major
4.0.19
None
We want to discuss a change in the TCPPing module.
We plan to run WildFly instances in a container orchestration system. In the system we want to use, node discovery over multicast does not work.
The alternative is to use TCPPing with initial_hosts set, but that leaves the following problems:
- the initial_hosts property is static and cannot follow a changing set of nodes
- the IP addresses can change whenever a container is restarted
- the host names are dynamically generated
At this point it seems that node discovery cannot be done with TCPPing, at least not in an easy way.
The main problem: How to find out all running nodes for a server group?
We investigated our orchestration system and found a way to solve the problem. Our orchestration system (and we think others will have this too) has an internal DNS service.
Through this service, all containers behind a DNS name can be resolved with an nslookup request.
Example:
We have a scalable WildFly service named "wildfly-server". When a container is started under this service, it gets a host name like "wildfly-server-0" and a dynamic IP address.
After starting one or more containers we can run nslookup with the service name:
>nslookup wildfly-server
Name: wildfly-server
Address 1: 10.42.2.139 wildfly-server-1.wildfly-server
Address 2: 10.42.1.198 wildfly-server-0.wildfly-server
Address 3: 10.42.0.161 wildfly-server-2.wildfly-server
The service name has multiple A records registered. Whenever an instance is started or stopped, the DNS records are updated. We then tried to use this service name for the initial_hosts property:
initial_hosts=wildfly-server[7600]
Sometimes it worked and sometimes it didn't. The reason was that the method parseCommaDelimitedHosts in the org.jgroups.util.Util class used only the first InetAddress entry. After we changed it a bit (see https://github.com/Sternwald-Systems/JGroups/commit/db0b899f9c67348a0cb073783aad34c2ab3bfb40 ) it worked as expected. We now call InetAddress.getAllByName(host) and loop over the whole result array, instead of using just the first array element.
There is one limitation when domain mode is used with more than one server group: all servers of one server group must be set to the same port offset, because the single port given for the service name in initial_hosts is applied to every resolved address.
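To illustrate the idea behind the change, here is a minimal, self-contained sketch. It is not the code from the commit linked above and not the JGroups implementation; the class name InitialHostsSketch and the method resolveAll are hypothetical, and the parsing of the host[port] syntax is simplified. It only shows how every address returned by InetAddress.getAllByName can be paired with the configured port.

```java
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of JGroups.
final class InitialHostsSketch {

    // Parses a list such as "wildfly-server[7600],other[7601]" and returns one
    // socket address per resolved IP address of each host name.
    static List<InetSocketAddress> resolveAll(String hosts) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String rawEntry : hosts.split(",")) {
            String entry = rawEntry.trim();
            if (entry.isEmpty()) {
                continue;
            }
            int open = entry.indexOf('[');
            int close = entry.indexOf(']');
            if (open < 0 || close < open) {
                throw new IllegalArgumentException("expected host[port], got: " + entry);
            }
            String host = entry.substring(0, open).trim();
            int port = Integer.parseInt(entry.substring(open + 1, close).trim());
            try {
                // Use ALL addresses behind the name, not only the first one.
                for (InetAddress address : InetAddress.getAllByName(host)) {
                    InetSocketAddress candidate = new InetSocketAddress(address, port);
                    if (!result.contains(candidate)) {
                        result.add(candidate);
                    }
                }
            } catch (UnknownHostException e) {
                throw new UncheckedIOException(e);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        String hosts = args.length > 0 ? args[0] : "localhost[7600]";
        for (InetSocketAddress address : resolveAll(hosts)) {
            System.out.println(address);
        }
    }
}
```

Called with "wildfly-server[7600]" inside the orchestration network, such a method would return one entry per registered A record, each with port 7600. This is also why the same port offset is needed for all servers of one server group.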
Conclusion
There are different orchestration systems available on the market. In the worst case, a custom JGroups discovery service would have to be written for each of them.
For Kubernetes, for instance, such a service already exists (jgroups-kubernetes).
But if an orchestration system already has an internal DNS service that resolves a DNS name to all running containers, TCPPing (with our changes) could be used out of the box.
Additionally, there is a second method in the org.jgroups.util.Util class called parseCommaDelimitedHosts2, which does nearly the same thing but for the TCPGossip protocol.
We think it would make sense to change this method too; otherwise the two would behave differently. If you don't mind, we would apply the changes to this method as well before creating a pull request.
It is also important to document this well so other people can find this information if they have the same problem.