Feature Request
Resolution: Unresolved
OpenShift 4.0, 4.16, 4.18, 4.17
1. Proposed title of this feature request
Plugin enhancement for CoreDNS
2. What is the nature and description of the request?
Reference: https://coredns.io/plugins/rewrite/

Example configuration:

. {
    rewrite {
        name com.extradomain.com NXDOMAIN
    }
    forward . 8.8.8.8
    cache
    log
}

It might be possible to capture invalid requests that match the searched strings and return an NXDOMAIN for them. When a pod looks up an external domain (for example, google.com), ndots:5 and the search list, which injects the cluster name as a search suffix, mean that EVERY external request that is not fully qualified (no trailing dot) is first run through the search list, and the resulting invalid names are shipped (as noise) to the upstream nameservers. At scale, this becomes a very large problem that can hammer nameservers. The Rewrite plugin might be able to return an NXDOMAIN immediately, skipping the upstream lookup so that the resolver moves on to the actual FQDN sooner (see below for details).
3. Why does the customer need this? (List the business requirements here)
The cluster name is automatically injected into the search list by default and cannot be modified.
NDOTS is set to 5 and cannot be modified globally (it can only be modified at the deployment level):
https://access.redhat.com/solutions/2518321
DNS search rules dictate that the auto-appended internal domain strings are always searched (namespace.svc.cluster.local, svc.cluster.local, cluster.local, <cluster-name>.<domain>.com, <any-other-additional-search-strings-added-to-host>).
Therefore, an external URL like "myapp.myexternaldomain.com" will always be run through the search list before it is sent upstream as an FQDN:
sh-4.4# cat /etc/resolv.conf
search test-namespace.svc.cluster.local svc.cluster.local cluster.local myclustername.mydomain.com #<- automatically injected search strings at pod layer
nameserver 172.30.0.10 #<-- internal nameserver listed as service IP for dns-default service in openshift-dns namespace
options ndots:5 #<- injected ndot value

sh-4.4# dig myapp.myexternaldomain.com +showsearch
##results below truncated for visibility on relevant parts:

;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 42834 ##<-- search 1 (NXDOMAIN) handled internally at coredns pod
;myapp.myexternaldomain.com.test-namespace.svc.cluster.local. IN A
;; Query time: 1 msec

;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 48415 ##<-- search 2 (NXDOMAIN) handled internally at coredns pod
;myapp.myexternaldomain.com.svc.cluster.local. IN A
;; Query time: 0 msec

;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 26378 ##<-- search 3 (NXDOMAIN) handled internally at coredns pod
;myapp.myexternaldomain.com.cluster.local. IN A
;; Query time: 0 msec

;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 9070 ##<-- search 4 (NXDOMAIN)
;myapp.myexternaldomain.com.myclustername.mydomain.com. IN A ##<-- request + cluster domain (is shipped upstream to nameserver)
;; Query time: 1 msec

;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 6027 ##<-- search 5 (NOERROR) - all search options exhausted, ship FQDN upstream (got result)
;myapp.myexternaldomain.com. IN A ##<-- observe the original query is shipped now that all search is exhausted, and trailing dot is appended as FQDN.
;; ANSWER SECTION:
myapp.myexternaldomain.com. 5 IN A 10.0.0.101
;; Query time: 0 msec
We have now made 2 requests to the upstream nameserver:
- myapp.myexternaldomain.com.myclustername.mydomain.com (invalid/noise) - NXDOMAIN returned.
- myapp.myexternaldomain.com. (fqdn, A-record returned)
Details here: https://access.redhat.com/articles/7068383#what-are-ndots-and-how-does-this-option-shape-lookup-behavior-2
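The lookup order shown above can be reproduced with a small simulation. This is a minimal sketch, assuming glibc-style stub resolver behavior (names with fewer dots than ndots go through the search list first) and assuming that only names under cluster.local are answered inside the cluster, as the dig output suggests. The function names `search_candidates` and `goes_upstream` are hypothetical and exist only for illustration.

```python
# Search list and ndots value taken from the /etc/resolv.conf shown above.
SEARCH = [
    "test-namespace.svc.cluster.local",
    "svc.cluster.local",
    "cluster.local",
    "myclustername.mydomain.com",
]
NDOTS = 5


def search_candidates(name, search, ndots):
    """Return the names a stub resolver would try, in order, until one resolves."""
    if name.endswith("."):
        # Already fully qualified: the search list is skipped entirely.
        return [name]
    expanded = [f"{name}.{suffix}." for suffix in search]
    absolute = f"{name}."
    if name.count(".") >= ndots:
        # Enough dots: the name is tried as-is before the search list.
        return [absolute] + expanded
    # Too few dots: every search suffix is tried before the name itself.
    return expanded + [absolute]


def goes_upstream(fqdn, internal_zone="cluster.local."):
    """Assumption: only names under the internal zone are answered in-cluster."""
    return not fqdn.endswith("." + internal_zone)


if __name__ == "__main__":
    for candidate in search_candidates("myapp.myexternaldomain.com", SEARCH, NDOTS):
        target = "upstream" if goes_upstream(candidate) else "in-cluster"
        print(f"{candidate:70s} {target}")
```

With the values above, the unqualified name produces five candidates, two of which leave the cluster; the same name with a trailing dot produces exactly one.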
We can currently modify the behavior at the deployment or application layer. However, at scale this problem becomes VERY difficult to manage: thousands of application teams, each with their own deployment/application stacks, flows, and service mesh implementations. One of our customers is currently serving 14K NXDOMAIN results per second, 80% of which are noise (auto-appended search strings).
If we can disable (or drop/prevent) these strings from being searched entirely, we would save time and reduce load on the nameservers at scale. Current workarounds are suboptimal on clusters hitting this problem:
- Create a DNS forwarder rule that filters this traffic. Risky: if the forwarder endpoints/service go offline, every call is subject to a timeout while waiting for a result, which adds latency and performance overhead.
- Mitigate at every deployment on the cluster. Enormously time-consuming and impactful, as app dev teams need to scour code looking for URLs and be retrained to add a trailing dot to every external URL (an anti-pattern for traditional platform deployments).
- Detection of which applications need to be modified is also challenging, as there are no built-in tools that can easily identify which applications are making requests to external domains incorrectly (not forcing FQDN).
- Adopting a template change to reduce the ndots value per deployment or remove search strings from all deployments can be hugely impactful if app teams are reliant on this functionality.
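To illustrate the detection gap, the sketch below flags query names that look like an external name with the cluster domain appended, then counts them per client. It is only a rough heuristic under several assumptions: the (client, query name) pairs are assumed to have been extracted from DNS query logs by some other means, the set of public TLDs is a hypothetical placeholder, and the helper names `is_search_noise` and `noisy_clients` are invented for this example. An environment that legitimately uses names of the form com.<cluster>.<domain>.com would be falsely flagged.

```python
from collections import Counter

# Hypothetical placeholder: labels treated as "public TLDs" for this heuristic.
PUBLIC_TLDS = frozenset({"com", "net", "org", "io"})


def is_search_noise(qname, cluster_domain, public_tlds=PUBLIC_TLDS):
    """True if qname looks like '<external-name>.<tld>.<cluster_domain>.'."""
    name = qname.rstrip(".").lower()
    suffix = "." + cluster_domain.rstrip(".").lower()
    if not name.endswith(suffix):
        return False
    # Labels in front of the cluster domain, e.g. ['myapp', 'myexternaldomain', 'com'].
    labels = name[: -len(suffix)].split(".")
    # A public TLD sitting directly in front of the cluster domain suggests
    # the cluster domain was appended by the search list.
    return len(labels) >= 2 and labels[-1] in public_tlds


def noisy_clients(queries, cluster_domain):
    """Count noise queries per client; queries is an iterable of (client, qname)."""
    counts = Counter(
        client for client, qname in queries if is_search_noise(qname, cluster_domain)
    )
    return counts.most_common()


if __name__ == "__main__":
    sample = [
        ("10.128.0.5", "myapp.myexternaldomain.com.myclustername.mydomain.com."),
        ("10.128.0.5", "myapp.myexternaldomain.com."),
        ("10.128.0.9", "api.myclustername.mydomain.com."),
    ]
    print(noisy_clients(sample, "myclustername.mydomain.com"))
```

Mapping the flagged client addresses back to pods and owning teams is a separate step that this sketch does not attempt.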
Exposing the Rewrite parameter as a modifiable option in the DNS operator, so that it updates the dns-default configmap, would alleviate these concerns. It may be worth reviewing whether we should inject the option automatically, but this has some considerations in case the customer is using a `com.<cluster>.<domain>.com` URL schema in their environment (rare but not impossible). Making it optional is more feasible and less impactful, assuming my understanding of the function is correct.
Tested the forward plugin's "except" option and observed that it blocks the search flow because it returns a "SERVFAIL" response, so it is not suitable; another plugin is needed.
This might also be addressable with the ACL plugin, but I am not certain how "filter" versus "drop" would work with the existing config options and search rules, and the plugin seems more explicitly oriented towards source IP rulesets than global domain rules.
The Rewrite plugin seems easiest to implement.
4. List any affected packages or components.
CoreDNS
Rewrite Plugin