### Service discovery with dynamic port mapping
When running an ECS cluster with jboss/keycloak:latest containers, dynamic port mapping of all ports is required to allow more than one container to run per EC2 instance. SRV-based service discovery records allow each node to find the rest of the nodes, but when a discovery request is sent, the receiving node sees the sender as IP:7600 instead of the IP with its dynamic port. It then treats this address as a "new" node and tries to send discovery requests to it. It is also somehow picking up node IDs and trying to send requests to those!
See the following log. There are only 4 actual nodes, and each has a different 5-digit port number:
2018-10-10 20:17:44,178 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) Performing discovery of the following hosts [10.42.3.44:7600, 10.42.3.56:32949, 10.42.3.56:32951, 10.42.3.44:32954, c5b479b7b6d5, 10.42.3.44:32952, 10.42.3.56:7600, 17081c624290, 63976b7fae70, 557cbd7891a2]
2018-10-10 20:17:44,178 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:7600
2018-10-10 20:17:44,179 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:32949
2018-10-10 20:17:44,179 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:32951
2018-10-10 20:17:44,180 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:32954
2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to c5b479b7b6d5
2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:32952
2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:7600
2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 17081c624290, IP: 10.42.3.56:7600
2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 17081c624290
2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-238,ejb,17081c624290) Received discovery from: 17081c624290, IP: 10.42.3.56:7600
2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 63976b7fae70
2018-10-10 20:17:44,183 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 557cbd7891a2
2018-10-10 20:17:44,187 WARN [org.jgroups.protocols.TCP] (TQ-Bundler-7,ejb,17081c624290) JGRP000032: 17081c624290: no physical address for c5b479b7b6d5, dropping message
This code seems to be part of the problem in this case: https://github.com/belaban/JGroups/blob/87d15ec848aa3d482ae792ef152f7e36e1ab625c/src/org/jgroups/protocols/dns/DNS_PING.java#L109
That code takes the address of the incoming discovery request and adds it to discovered_hosts, but with dynamic port mapping those addresses are ALWAYS inaccurate.
This is what the recipient of a discovery request sees (i.e. every port is the default 7600):
2018-10-10 20:35:15,229 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:15,231 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:15,232 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-397,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:15,233 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:17,234 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:17,236 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:17,238 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-397,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:17,238 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:19,239 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:19,240 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:19,242 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:19,243 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:21,246 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:21,247 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:21,253 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:21,253 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:23,247 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:23,249 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:23,251 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:23,251 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:25,252 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:25,253 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:25,255 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
2018-10-10 20:35:25,256 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
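To make the effect easier to reason about, here is a toy model of how the discovery target list grows. It is not JGroups code: the class name `DiscoveryPollutionSketch` and the method `nextRoundTargets` are made up for illustration, and it assumes that every address reported on an incoming discovery request is added to the target list without being checked against the SRV data. The node IDs that also show up in the logs are not modelled here.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Toy model of the discovery target list; hypothetical, not JGroups code. */
class DiscoveryPollutionSketch {

    /** Targets resolved from the SRV records: IP plus dynamic port (taken from the log). */
    static final List<String> SRV_TARGETS = List.of(
            "10.42.3.56:32949", "10.42.3.56:32951",
            "10.42.3.44:32954", "10.42.3.44:32952");

    /**
     * Returns the targets for the next discovery round after the given
     * sender addresses were seen on incoming discovery requests.
     * Assumption: each reported address is added as-is, with no
     * validation against the SRV records.
     */
    static Set<String> nextRoundTargets(List<String> reportedSenderAddresses) {
        Set<String> targets = new LinkedHashSet<>(SRV_TARGETS);
        targets.addAll(reportedSenderAddresses);
        return targets;
    }

    public static void main(String[] args) {
        // Senders are seen with the default port 7600, not the dynamic port.
        List<String> reported = List.of(
                "10.42.3.44:7600", "10.42.3.56:7600", "10.42.3.44:7600");
        for (String target : nextRoundTargets(reported)) {
            String origin = SRV_TARGETS.contains(target) ? "SRV record" : "incoming request";
            System.out.println(target + "  <- " + origin);
        }
    }
}
```

In this model the four SRV-derived targets are joined by two IP:7600 entries, which matches the two extra :7600 hosts in the "Performing discovery" line above.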
In this state the cluster never seems to work properly and the Keycloak interface breaks in many frustrating ways.
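To check whether a cluster is in this state, the host list from a "Performing discovery of the following hosts" log line can be split into three groups. The helper below is only a diagnostic sketch: the class name `DiscoveryHostListCheck` and the group labels are made up, it handles IPv4 `host:port` entries only, and it treats any entry without a port as a node ID.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for reading DNS_PING debug logs; not part of JGroups. */
class DiscoveryHostListCheck {

    static final int DEFAULT_PORT = 7600;

    /** Splits a "[a, b, c]" host list into dynamic-port, default-port and node-id groups. */
    static Map<String, List<String>> classify(String hostList) {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        groups.put("dynamic-port", new ArrayList<>());
        groups.put("default-port", new ArrayList<>());
        groups.put("node-id", new ArrayList<>());

        // Strip the surrounding brackets, then walk the comma-separated entries.
        String inner = hostList.trim().replaceAll("^\\[|\\]$", "");
        for (String entry : inner.split(",")) {
            String host = entry.trim();
            if (host.isEmpty()) {
                continue;
            }
            int colon = host.lastIndexOf(':');
            if (colon < 0) {
                groups.get("node-id").add(host);          // no port: a logical name
            } else if (Integer.parseInt(host.substring(colon + 1)) == DEFAULT_PORT) {
                groups.get("default-port").add(host);     // the default 7600
            } else {
                groups.get("dynamic-port").add(host);     // port from the SRV record
            }
        }
        return groups;
    }

    public static void main(String[] args) {
        String fromLog = "[10.42.3.44:7600, 10.42.3.56:32949, 10.42.3.56:32951, "
                + "10.42.3.44:32954, c5b479b7b6d5, 10.42.3.44:32952, 10.42.3.56:7600, "
                + "17081c624290, 63976b7fae70, 557cbd7891a2]";
        classify(fromLog).forEach((group, hosts) -> System.out.println(group + ": " + hosts));
    }
}
```

For the host list in the first log this gives four dynamic-port entries (the 4 actual nodes), two default-port entries and four node IDs.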
- duplicates JGRP-2316 DNS_PING is not using correct ports with SRV based service discovery (Resolved)