-
Bug
-
Resolution: Done
-
Blocker
-
4.0.11
-
None
-
DNS_PING based service discovery now captures port values from SRV records
-
When using DNS_PING with SRV records in JGroups 4.0.11, the port from each SRV record is dropped (set to zero) and the default transport port (7600) is used instead.
I am using this JGroups config:
<subsystem xmlns="urn:jboss:domain:jgroups:6.0">
    <channels default="ee">
        <channel name="ee" stack="tcp" cluster="ejb"/>
    </channels>
    <stacks>
        <stack name="tcp">
            <transport type="TCP" socket-binding="jgroups-tcp">
                <property name="external_addr">${env.EXTERNAL_ADDR}</property>
            </transport>
            <protocol type="dns.DNS_PING">
                <property name="dns_query">
                    jgroups.${env.DNS_NAME}.svc.cluster.local
                </property>
                <property name="dns_record_type">
                    SRV
                </property>
            </protocol>
            <protocol type="MERGE3"/>
            <protocol type="FD_SOCK"/>
            <protocol type="FD_ALL"/>
            <protocol type="VERIFY_SUSPECT"/>
            <protocol type="pbcast.NAKACK2"/>
            <protocol type="UNICAST3"/>
            <protocol type="pbcast.STABLE"/>
            <protocol type="pbcast.GMS"/>
            <protocol type="MFC"/>
            <protocol type="FRAG3"/>
        </stack>
    </stacks>
</subsystem>
I have these service discovery DNS entries:
$ dig jgroups.dev.auth.example.com.svc.cluster.local SRV
; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.68.rc1.58.amzn1 <<>> jgroups.dev.auth.example.com.svc.cluster.local SRV
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 16690
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 0
;; QUESTION SECTION:
;jgroups.dev.auth.example.com.svc.cluster.local. IN SRV
;; ANSWER SECTION:
jgroups.dev.auth.example.com.svc.cluster.local. 10 IN SRV 1 1 32921 9ec82e3f-3a0e-4e30-b785-17879c63cd7d.jgroups.dev.auth.example.com.svc.cluster.local.
jgroups.dev.auth.example.com.svc.cluster.local. 10 IN SRV 1 1 32923 60b5a820-9678-4bd2-84c6-00061a52bde0.jgroups.dev.auth.example.com.svc.cluster.local.
jgroups.dev.auth.example.com.svc.cluster.local. 10 IN SRV 1 1 32915 9d9d78d0-8919-4b91-9df8-2e4e65afedae.jgroups.dev.auth.example.com.svc.cluster.local.
jgroups.dev.auth.example.com.svc.cluster.local. 10 IN SRV 1 1 32917 161f3d66-f1e3-46f4-a44f-ebda925a25c6.jgroups.dev.auth.example.com.svc.cluster.local.
;; Query time: 2 msec
;; SERVER: 10.42.3.2#53(10.42.3.2)
;; WHEN: Fri Sep 21 01:45:44 2018
;; MSG SIZE rcvd: 481
But I get this in the logs when running Keycloak in a standalone cluster:
17:45:10,121 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Performing initial discovery
17:45:10,154 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Entries collected from DNS: [10.42.3.56:0, 10.42.3.56:0, 10.42.3.44:0, 10.42.3.44:0]
17:45:10,155 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Discovered IP Address with port 0 (10.42.3.56:0). Replacing with default Transport port: 7600
17:45:10,159 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Discovered IP Address with port 0 (10.42.3.56:0). Replacing with default Transport port: 7600
17:45:10,159 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Discovered IP Address with port 0 (10.42.3.44:0). Replacing with default Transport port: 7600
17:45:10,159 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Discovered IP Address with port 0 (10.42.3.44:0). Replacing with default Transport port: 7600
17:45:10,159 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) Performing discovery of the following hosts [10.42.3.56:7600, 10.42.3.44:7600, e200a617bf7a]
17:45:10,159 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) e200a617bf7a: sending discovery request to 10.42.3.56:7600
17:45:10,160 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) e200a617bf7a: sending discovery request to 10.42.3.44:7600
17:45:10,160 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-10,ejb,e200a617bf7a) Received discovery from: e200a617bf7a, IP: 10.42.3.44:7600
17:45:10,161 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-3,null,null) e200a617bf7a: sending discovery request to e200a617bf7a
17:45:10,162 DEBUG [org.jgroups.protocols.dns.DNS_PING] (thread-11,ejb,e200a617bf7a) Received discovery from: e200a617bf7a, IP: 10.42.3.44:7600
As you can see, it is resolving the DNS addresses but discarding the ports.
To be clear, in this example 32923 is the port (e.g.
1 1 32923 60b5a820-9678-4bd2-84c6-00061a52bde0.jgroups.dev.auth.example.com.svc.cluster.local).
These are dynamic host ports mapped to container port 7600 so that several Keycloak containers can run on each instance.
$ docker ps
CONTAINER ID   IMAGE                                                              COMMAND                  CREATED       STATUS                 PORTS                                              NAMES
f67e39f8f403   datadog/agent:latest-jmx                                           "/init"                  8 hours ago   Up 8 hours (healthy)   8125/udp, 8126/tcp                                 ecs-auth-service-dev-26-datadog-agent-a2b7f783ddd0ba9cf601
bbb12f0c43a5   233747045000.dkr.ecr.us-east-2.amazonaws.com/ops/keycloak:latest   "/opt/jboss/tools/do…"   8 hours ago   Up 8 hours             0.0.0.0:32923->7600/tcp, 0.0.0.0:32922->8080/tcp   ecs-auth-service-dev-26-keycloak-f4bd8f8dca9fd4cd4f00
932cad7c4fb9   datadog/agent:latest-jmx                                           "/init"                  8 hours ago   Up 8 hours (healthy)   8125/udp, 8126/tcp                                 ecs-auth-service-dev-26-datadog-agent-baa38a98ccaddea6f501
e200a617bf7a   233747045000.dkr.ecr.us-east-2.amazonaws.com/ops/keycloak:latest   "/opt/jboss/tools/do…"   8 hours ago   Up 8 hours             0.0.0.0:32921->7600/tcp, 0.0.0.0:32920->8080/tcp   ecs-auth-service-dev-26-keycloak-e6f398e6cc8db5b5f101
73bc0b863c73   amazon/amazon-ecs-agent:latest                                     "/agent"                 2 days ago    Up 2 days                                                                 ecs-agent
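To illustrate why losing the port matters with several containers per instance, here is a minimal sketch of the behaviour the log suggests: entries with port 0 get the default port 7600, and entries that then become identical collapse into one. This is not JGroups code. The class and method names are hypothetical, and the pairing of SRV ports to IP addresses is assumed for illustration only, since the DNS output does not show which port belongs to which address.

```java
import java.net.InetSocketAddress;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class PortFallbackSketch {
    // Default transport port named in the log output.
    static final int DEFAULT_TRANSPORT_PORT = 7600;

    // Hypothetical helper (not the JGroups implementation): replaces port 0
    // with the default port and drops entries that become duplicates.
    public static Set<InetSocketAddress> applyDefaultPort(List<InetSocketAddress> discovered) {
        Set<InetSocketAddress> hosts = new LinkedHashSet<>();
        for (InetSocketAddress entry : discovered) {
            int port = entry.getPort() == 0 ? DEFAULT_TRANSPORT_PORT : entry.getPort();
            hosts.add(InetSocketAddress.createUnresolved(entry.getHostString(), port));
        }
        return hosts;
    }

    // The four entries from the "Entries collected from DNS" log line.
    public static List<InetSocketAddress> entriesWithoutPorts() {
        return Arrays.asList(
            InetSocketAddress.createUnresolved("10.42.3.56", 0),
            InetSocketAddress.createUnresolved("10.42.3.56", 0),
            InetSocketAddress.createUnresolved("10.42.3.44", 0),
            InetSocketAddress.createUnresolved("10.42.3.44", 0));
    }

    // The same addresses with the SRV ports kept.
    // ASSUMPTION: which port belongs to which IP is made up for this example.
    public static List<InetSocketAddress> entriesWithSrvPorts() {
        return Arrays.asList(
            InetSocketAddress.createUnresolved("10.42.3.56", 32921),
            InetSocketAddress.createUnresolved("10.42.3.56", 32923),
            InetSocketAddress.createUnresolved("10.42.3.44", 32915),
            InetSocketAddress.createUnresolved("10.42.3.44", 32917));
    }

    public static void main(String[] args) {
        // Ports dropped: four members collapse into two host:7600 targets.
        System.out.println(applyDefaultPort(entriesWithoutPorts()).size()); // prints 2
        // Ports kept: all four members stay distinct.
        System.out.println(applyDefaultPort(entriesWithSrvPorts()).size()); // prints 4
    }
}
```

With the ports dropped, two of the four cluster members can never be addressed, and the remaining two targets point at a port that is not published on the host.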
This seems like it might be where ports are getting lost:
https://github.com/belaban/JGroups/blob/07060c3ba6e52ad4aad3ac799c2bc95ffd2fe7ff/src/org/jgroups/protocols/dns/DefaultDNSResolver.java#L84
I don't see the port number being extracted from the SRV entry and appended to the IP returned from resolveAEntries.
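For reference, here is a rough sketch of the step that appears to be missing: reading the port field out of an SRV record so it can be paired with the resolved address. It assumes the record data is available as text in the "priority weight port target" order shown in the dig output above. The class and method names are hypothetical and are not part of JGroups, so this is an illustration of the idea rather than a proposed patch.

```java
public class SrvRecordSketch {
    // Holds the two SRV fields needed to contact a member.
    public static final class SrvTarget {
        public final String host;
        public final int port;

        public SrvTarget(String host, int port) {
            this.host = host;
            this.port = port;
        }
    }

    // Parses SRV record data of the form "priority weight port target".
    // Hypothetical helper; assumes the record is given as whitespace-separated text.
    public static SrvTarget parse(String recordData) {
        String[] fields = recordData.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("Unexpected SRV record: " + recordData);
        }
        int port = Integer.parseInt(fields[2]); // third field is the port
        String host = fields[3];                // fourth field is the target host
        if (host.endsWith(".")) {
            // Strip the trailing dot of a fully qualified name.
            host = host.substring(0, host.length() - 1);
        }
        return new SrvTarget(host, port);
    }

    public static void main(String[] args) {
        SrvTarget target = parse(
            "1 1 32923 60b5a820-9678-4bd2-84c6-00061a52bde0.jgroups.dev.auth.example.com.svc.cluster.local.");
        // The host would then be resolved to an IP, and the port kept alongside it
        // instead of being left at 0.
        System.out.println(target.host + ":" + target.port);
    }
}
```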
Let me know if I am missing any details. This is a major blocker for development.