Uploaded image for project: 'JGroups'
  1. JGroups
  2. JGRP-2300

DNS_PING in AWS ECS cannot cluster with dynamic port mappings

    Details

    • Type: Bug
    • Status: Closed (View Workflow)
    • Priority: Critical
    • Resolution: Duplicate Issue
    • Affects Version/s: 4.0.16
    • Fix Version/s: 4.0.16
    • Labels:
      None
    • Steps to Reproduce:
      Hide

      Set up an AWS ECS cluster with dynamic port mapping and service discovery.
      Run the latest jboss/keycloak container and allow the clustering port (7600) to be dynamically mapped.
      Turn on DEBUG logging.
      Watch the mayhem.

      Show
      Set up an AWS ECS cluster with dynamic port mapping and service discovery. Run the latest jboss/keycloak container and allow the clustering port (7600) to be dynamically mapped. Turn on DEBUG logging. Watch the mayhem.

      Description

      When running an ECS cluster with jboss/keycloak:latest containers dynamic port mapping of all ports is required to allow more than one container to run per EC2 instance. Using SRV based service discovery records will allow each node to find the rest of the nodes, but when a discovery request is sent the receiving node sees the sender as IP:7600 instead of the dynamic port. It then sees this as a "new" node and tries to send discovery requests to it. And somehow it is also getting node IDs and trying to send requests to those!

      See the following log, there are only 4 actual nodes and the each have a different 5 digit port number:

      ### Service discovery with dynamic port mapping
      
      2018-10-10 20:17:44,178 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) Performing discovery of the following hosts [10.42.3.44:7600, 10.42.3.56:32949, 10.42.3.56:32951, 10.42.3.44:32954, c5b479b7b6d5, 10.42.3.44:32952, 10.42.3.56:7600, 17081c624290, 63976b7fae70, 557cbd7891a2]
      2018-10-10 20:17:44,178 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:7600
      2018-10-10 20:17:44,179 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:32949
      2018-10-10 20:17:44,179 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:32951
      2018-10-10 20:17:44,180 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:32954
      2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to c5b479b7b6d5
      2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.44:32952
      2018-10-10 20:17:44,181 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 10.42.3.56:7600
      2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 17081c624290, IP: 10.42.3.56:7600
      2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 17081c624290
      2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-238,ejb,17081c624290) Received discovery from: 17081c624290, IP: 10.42.3.56:7600
      2018-10-10 20:17:44,182 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 63976b7fae70
      2018-10-10 20:17:44,183 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-240,ejb,17081c624290) 17081c624290: sending discovery request to 557cbd7891a2
      2018-10-10 20:17:44,187 WARN  [org.jgroups.protocols.TCP] (TQ-Bundler-7,ejb,17081c624290) JGRP000032: 17081c624290: no physical address for c5b479b7b6d5, dropping message
      

      This code seems to be part of the problem in this case: https://github.com/belaban/JGroups/blob/87d15ec848aa3d482ae792ef152f7e36e1ab625c/src/org/jgroups/protocols/dns/DNS_PING.java#L109

      See that code uses the incoming address and adds it to the discocvered_hosts, but those addresses are ALWAYS inaccurate in this case.

      Because this is what the recipient of the service discovery request sees (ie: all the ports are the default 7600):

      2018-10-10 20:35:15,229 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:15,231 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:15,232 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-397,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:15,233 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:17,234 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:17,236 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:17,238 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-397,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:17,238 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:19,239 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:19,240 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:19,242 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:19,243 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:21,246 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:21,247 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:21,253 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:21,253 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:23,247 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:23,249 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:23,251 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:23,251 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-350,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:25,252 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:25,253 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:25,255 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      2018-10-10 20:35:25,256 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-237,ejb,17081c624290) Received discovery from: 63976b7fae70, IP: 10.42.3.44:7600
      

      In this state the cluster never seems to work properly and the Keycloak interface breaks in many frustrating ways.

        Gliffy Diagrams

          Attachments

            Issue Links

              Activity

                People

                • Assignee:
                  sebastian.laskawiec Sebastian Laskawiec
                  Reporter:
                  ethompson Eric Thompson
                • Votes:
                  0 Vote for this issue
                  Watchers:
                  3 Start watching this issue

                  Dates

                  • Created:
                    Updated:
                    Resolved: