JGroups / JGRP-2363

DNS Ping cannot look up SRV record for service port


    • Type: Bug
    • Resolution: Obsolete
    • Priority: Major
    • None
    • 4.0.20
    • None

      I have a problem getting the service port through the DNS_PING DNS lookup.
      In my OpenShift environment, the JNDI DNS lookup cannot retrieve the
      correct SRV record from the OpenShift DNS server. Ref:

      https://github.com/jboss-openshift/openshift-ping/blob/1.2.1.Final/dns/src/main/java/org/openshift/ping/dns/GetServicePort.java#L48
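      The lookup referenced above goes through JNDI's DNS service provider. A minimal sketch of such an SRV query follows, assuming the JDK's built-in `com.sun.jndi.dns.DnsContextFactory` and the name servers the JVM picks up from the system configuration. The class and method names are hypothetical and this is not the GetServicePort code. When the response carries no SRV data, `Attributes.get("SRV")` returns null, so the sketch returns an empty list instead of dereferencing it.

```java
import java.util.ArrayList;
import java.util.Hashtable;
import java.util.List;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.Attribute;
import javax.naming.directory.Attributes;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;

// Hypothetical helper sketching an SRV lookup through JNDI's DNS provider.
// It is NOT the openshift-ping GetServicePort implementation.
final class SrvLookup {
    private SrvLookup() {}

    // With no provider URL set, the DNS provider uses the system's name servers.
    static List<String> lookupSrv(String name) throws NamingException {
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.dns.DnsContextFactory");
        DirContext ctx = new InitialDirContext(env);
        try {
            return srvValues(ctx.getAttributes(name, new String[] {"SRV"}));
        } finally {
            ctx.close();
        }
    }

    // Extracts the SRV values; returns an empty list (rather than throwing a
    // NullPointerException) when the response contained no SRV attribute.
    static List<String> srvValues(Attributes attrs) throws NamingException {
        List<String> records = new ArrayList<>();
        Attribute srv = attrs.get("SRV");
        if (srv == null) {
            return records;
        }
        NamingEnumeration<?> values = srv.getAll();
        while (values.hasMore()) {
            records.add(String.valueOf(values.next()));
        }
        return records;
    }
}
```

      Calling `SrvLookup.lookupSrv(...)` inside the pod, first with the short name and then with the fully qualified name, is one way to compare the two cases directly from the JVM.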

      For example, here is the ping service:

      apiVersion: v1
      kind: Service
      metadata:
        annotations:
          description: The JGroups ping port for clustering.
          service.alpha.kubernetes.io/tolerate-unready-endpoints: 'true'
        labels:
          application: application0
          template: amq-broker-73-persistence-clustered
          xpaas: 1.4.16
        name: application0-ping
      spec:
        clusterIP: None
        publishNotReadyAddresses: true
        ports:
        - port: 8888
          protocol: TCP
          name: jgroup-port
          targetPort: 8888
        selector:
          deploymentConfig: application0-amq

      After deploying the service, I deployed an application pod
      with the JGroups DNS_PING protocol loaded. The relevant
      part of the JGroups XML looks like this:

      <config>
          ...
          <openshift.DNS_PING timeout="3000" serviceName="application0-ping" />
          ...
      </config>

      Once the application pod was running, I checked the log
      and found a warning message from DNS_PING:

      2019-07-22 04:16:59,600 INFO [org.openshift.ping.common.Utils] 3 attempt(s) with a 1000ms sleep to execute [GetServicePort] failed. Last failure was [java.lang.NullPointerException: null]
      2019-07-22 04:16:59,601 WARNING [org.jgroups.protocols.openshift.DNS_PING] No DNS SRV record found for service [application0-ping]
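      The INFO line suggests the lookup is wrapped in a retry loop (three attempts, 1000 ms apart) that gives up and reports the last exception. A minimal sketch of such a loop follows; the class and method names are hypothetical and this is not the actual org.openshift.ping.common.Utils code.

```java
import java.util.concurrent.Callable;

// Hypothetical retry helper modelled on the INFO message above.
// It is NOT the org.openshift.ping.common.Utils implementation.
final class Retry {
    private Retry() {}

    static <T> T execute(Callable<T> task, int attempts, long sleepMillis, String taskName)
            throws Exception {
        if (attempts < 1) {
            throw new IllegalArgumentException("attempts must be >= 1");
        }
        Exception last = null;
        for (int i = 1; i <= attempts; i++) {
            try {
                return task.call();
            } catch (Exception e) {
                last = e;                      // remember the most recent failure
                if (i < attempts) {
                    Thread.sleep(sleepMillis); // wait before the next attempt
                }
            }
        }
        // Mirrors the log wording; an NPE created without a message has a
        // null message, which prints as "NullPointerException: null".
        throw new Exception(String.format(
                "%d attempt(s) with a %dms sleep to execute [%s] failed. Last failure was [%s: %s]",
                attempts, sleepMillis, taskName, last.getClass().getName(), last.getMessage()),
                last);
    }
}
```

      With `attempts = 3`, `sleepMillis = 1000` and a task that always throws `new NullPointerException()`, the resulting message matches the INFO line above.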

      After some debugging, it turned out that the DNS lookup for the record named
      "_tcp.application0-ping" returned null.

      However, if I log into the application pod and run nslookup, it returns the correct record:

      sh-5.0# nslookup -type=srv _tcp.application0-ping
      Server: 10.74.177.77
      Address: 10.74.177.77#53

      _tcp.application0-ping.default.svc.cluster.local service = 10 100 8888 44c84e52.application0-ping.default.svc.cluster.local.
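      The data after `service =` follows the SRV layout from RFC 2782: priority (10), weight (100), port (8888) and target host; the port is the value DNS_PING is after. As far as can be told, the JDK's JNDI DNS provider returns SRV values as a string with the same four space-separated fields. A minimal parsing sketch (the class name is hypothetical):

```java
// Hypothetical helper: splits SRV record data of the form
// "priority weight port target" (RFC 2782 field order) into its parts.
final class SrvRecord {
    final int priority;
    final int weight;
    final int port;
    final String target;

    private SrvRecord(int priority, int weight, int port, String target) {
        this.priority = priority;
        this.weight = weight;
        this.port = port;
        this.target = target;
    }

    static SrvRecord parse(String data) {
        String[] fields = data.trim().split("\\s+");
        if (fields.length != 4) {
            throw new IllegalArgumentException("expected 4 SRV fields, got: " + data);
        }
        return new SrvRecord(
                Integer.parseInt(fields[0]),
                Integer.parseInt(fields[1]),
                Integer.parseInt(fields[2]),
                fields[3]);
    }
}
```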

      The fully qualified name can be read from the record:

      _tcp.application0-ping.default.svc.cluster.local

      If I then pass the fully qualified name to the application, it queries the SRV
      record successfully.
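      One possible explanation, not confirmed in this issue: nslookup goes through the system resolver, which expands a short name using the `search` and `ndots` settings in the pod's /etc/resolv.conf, whereas a JNDI DNS lookup may send the name exactly as given. The sketch below mimics the resolver's expansion order as described in resolv.conf(5), so the fully qualified candidates could be tried explicitly. The helper is hypothetical, and the search list and `ndots` value in the usage note are assumptions based on a typical Kubernetes pod; the pod's own /etc/resolv.conf is authoritative.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: lists the names a resolver would try for `name`,
// following the search/ndots rules described in resolv.conf(5).
final class SearchDomains {
    private SearchDomains() {}

    static List<String> candidates(String name, List<String> searchList, int ndots) {
        List<String> out = new ArrayList<>();
        if (name.endsWith(".")) {          // absolute name: no search-list expansion
            out.add(name);
            return out;
        }
        long dots = name.chars().filter(c -> c == '.').count();
        boolean tryAsIsFirst = dots >= ndots;
        if (tryAsIsFirst) {
            out.add(name);
        }
        for (String domain : searchList) {
            out.add(name + "." + domain);  // e.g. <name>.default.svc.cluster.local
        }
        if (!tryAsIsFirst) {
            out.add(name);                 // fall back to the name as given
        }
        return out;
    }
}
```

      With an assumed search list of `default.svc.cluster.local svc.cluster.local cluster.local` and `ndots:5`, the first candidate for `_tcp.application0-ping` is `_tcp.application0-ping.default.svc.cluster.local`, the name nslookup reported.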

      I don't know why the application can't query the record using the short name (i.e. _tcp.application0-ping). Could this be a configuration issue with DNS_PING?

      My OpenShift environment details are:

      oc v3.11.117
      kubernetes v1.11.0+d4cacc0
      features: Basic-Auth GSSAPI Kerberos SPNEGO

      The Java version used in the pod:

      sh-5.0# java -version
      openjdk version "1.8.0_212"
      OpenJDK Runtime Environment (build 1.8.0_212-b04)
      OpenJDK 64-Bit Server VM (build 25.212-b04, mixed mode)

      The base OS is Fedora 30.

              Assignee: Bela Ban
              Reporter: Howard Gao
              Votes: 0
              Watchers: 4
