With version 2.13 installed on OCP 4.12, the default APIcast resolv.conf is as follows:
search atra-3scale.svc.cluster.local svc.cluster.local cluster.local shrocp4upi413ovn.lab.upshift.rdu2.redhat.com
nameserver 172.30.0.10
options ndots:5
From the log, I can see that APIcast parsed the file correctly:
2023/10/04 05:08:24 [debug] 15#15: *2 [lua] resolver.lua:136: parse_nameservers(): search atra-3scale.svc.cluster.local svc.cluster.local cluster.local shrocp4upi413ovn.lab.upshift.rdu2.redhat.com
2023/10/04 05:08:24 [debug] 15#15: *2 [lua] resolver.lua:140: parse_nameservers(): search domain: atra-3scale.svc.cluster.local
2023/10/04 05:08:24 [debug] 15#15: *2 [lua] resolver.lua:140: parse_nameservers(): search domain: svc.cluster.local
2023/10/04 05:08:24 [debug] 15#15: *2 [lua] resolver.lua:140: parse_nameservers(): search domain: cluster.local
2023/10/04 05:08:24 [debug] 15#15: *2 [lua] resolver.lua:140: parse_nameservers(): search domain: shrocp4upi413ovn.lab.upshift.rdu2.redhat.com
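For reference, the parsing step can be sketched in a few lines of Python. This is only an illustration of how the three resolv.conf directives are usually interpreted, not APIcast's Lua implementation; the function name parse_resolv_conf and the returned dictionary layout are hypothetical.

```python
def parse_resolv_conf(text):
    """Parse resolv.conf text into search domains, nameservers and options.

    Illustrative only: parse_resolv_conf is a hypothetical helper,
    not a function that exists in APIcast.
    """
    config = {"search": [], "nameservers": [], "options": {}}
    for raw in text.splitlines():
        # Drop comments introduced by '#' or ';' and surrounding whitespace.
        line = raw.split("#", 1)[0].split(";", 1)[0].strip()
        if not line:
            continue
        keyword, *values = line.split()
        if keyword == "search":
            # If several search lines are present, the last one wins.
            config["search"] = values
        elif keyword == "nameserver" and values:
            config["nameservers"].append(values[0])
        elif keyword == "options":
            for opt in values:
                name, _, val = opt.partition(":")
                config["options"][name] = int(val) if val.isdigit() else True
    return config


RESOLV_CONF = """\
search atra-3scale.svc.cluster.local svc.cluster.local cluster.local shrocp4upi413ovn.lab.upshift.rdu2.redhat.com
nameserver 172.30.0.10
options ndots:5
"""

parsed = parse_resolv_conf(RESOLV_CONF)
print(parsed["search"])       # the four search domains, in file order
print(parsed["nameservers"])  # ['172.30.0.10']
print(parsed["options"])      # {'ndots': 5}
```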
Based on this resolv.conf, I would expect APIcast to query in the following order:
system-master.atra-3scale.svc.cluster.local
system-master.svc.cluster.local
system-master.cluster.local
system-master.shrocp4upi413ovn.lab.upshift.rdu2.redhat.com
system-master.
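The expected order follows from the usual resolv.conf semantics: a name with fewer dots than ndots goes through the search list first and is tried as an absolute name last. A minimal Python sketch of that rule, assuming the conventional stub-resolver behaviour (the helper expected_candidates is hypothetical and is not part of APIcast):

```python
SEARCH = [
    "atra-3scale.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "shrocp4upi413ovn.lab.upshift.rdu2.redhat.com",
]
NDOTS = 5


def expected_candidates(name, search, ndots):
    """Return the query order a conventional stub resolver would use.

    Hypothetical helper for illustration; it models the search/ndots
    rules of resolv.conf, not APIcast's resolver.
    """
    if name.endswith("."):
        # Already fully qualified: the search list is not consulted.
        return [name]
    searched = [f"{name}.{domain}" for domain in search]
    absolute = name + "."
    if name.count(".") >= ndots:
        # Enough dots: try the name as-is before the search list.
        return [absolute] + searched
    # Too few dots: walk the search list first, absolute name last.
    return searched + [absolute]


for candidate in expected_candidates("system-master", SEARCH, NDOTS):
    print(candidate)
```

With ndots:5 and a name that contains no dots, the absolute query system-master. comes last, which matches the list above.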
However, APIcast automatically appends a . (root zone identifier) to the first query, turning the name into an FQDN without even going through the search path first. In my case, the first query is for system-master. (with the trailing dot):
2023/10/04 05:08:25 [debug] 26#26: *2 [lua] resolver.lua:321: search_dns(): resolver query: system-master search: query: system-master.
2023/10/04 05:08:25 [debug] 26#26: *2 [lua] cache.lua:122: fetch_answers(): resolver cache miss system-master.
2023/10/04 05:08:25 [debug] 26#26: *2 [lua] cache.lua:188: get(): resolver cache miss: system-master.
2023/10/04 05:08:25 [debug] 26#26: *2 [lua] dns_client.lua:50: init_resolvers(): initializing 1 nameservers
2023/10/04 05:08:25 [debug] 26#26: *2 [lua] dns_client.lua:63: init_resolvers(): nameserver 172.30.0.10:53 initialized
2023/10/04 05:08:25 [debug] 26#26: *2 [lua] dns_client.lua:75: query(): resolver query: system-master. nameserver: 172.30.0.10:53
2023/10/04 05:08:25 [debug] 26#26: *2 [lua] resolver.lua:321: search_dns(): resolver query: system-master search: atra-3scale.svc.cluster.local query: system-master.atra-3scale.svc.cluster.local
Digging through the code, I can see that the first entry in the search scope is an empty string. As a result, the first query always has a (.) appended.
Because of this, the first query fails to resolve in the OCP DNS, and APIcast always has to perform an additional query:
2023/10/04 05:08:25 [debug] 26#26: *2 [lua] resolver.lua:321: search_dns(): resolver query: system-master search: atra-3scale.svc.cluster.local query: system-master.atra-3scale.svc.cluster.local
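The behaviour seen in the logs can be modelled with a short Python sketch. It assumes, based only on the two search_dns() log lines above, that the search scope is the empty string followed by the resolv.conf search domains and that each query is built as name + "." + domain. The helper observed_queries is hypothetical and is not APIcast's Lua code, and only the first two entries it produces are confirmed by the logs.

```python
SEARCH = [
    "atra-3scale.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "shrocp4upi413ovn.lab.upshift.rdu2.redhat.com",
]


def observed_queries(name, search):
    """Model the query order inferred from the debug logs.

    Assumption: the scope starts with an empty string, so the first
    query is always the bare name with a trailing dot.
    """
    scope = [""] + list(search)
    return [f"{name}.{domain}" for domain in scope]


for query in observed_queries("system-master", SEARCH):
    print(query)
```

Under this model the absolute name system-master. is always queried first, regardless of ndots, before any search domain is tried.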
Depending on the current cluster DNS configuration, the query is then forwarded to the next DNS server, which sometimes causes a significant delay (10-20 seconds).
The current workaround is to replace the shortname with FQDN:
Can someone on the engineering team confirm whether adding (.) to the first search query is the desired behavior?
Reference:
The empty string was added to search scope in this commit:
https://github.com/3scale/APIcast/commit/1e47e2d5bbdb1ee186dab0ea5da4fbf144092bd9
Related tickets (relates to):
- THREESCALE-10218 Allow internal service name in PROXY_CONFIGS_ENDPOINT and BACKEND_ENDPOINT_OVERRIDE (status: Defined)
- THREESCALE-9301 fix dns cache miss (status: To Test (QE))