Type: Bug
Resolution: Done-Errata
Priority: Critical
Affects Versions: 4.13, 4.14, 4.15
Severity: Critical
Sprint: Sprint 248
Release Note Type: Bug Fix
Status: Done
This is a manual "clone" of issue OCPBUGS-27397. The following is the description of the original issue:
Description of problem:
After the update to OpenShift Container Platform 4.13, it was reported that the SRV query for _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net is failing. The query sent to CoreDNS does not match any configured forwardPlugin, so the default forwarder (". /etc/resolv.conf") is applied. Reverting the dns-default pod image to the OpenShift Container Platform 4.12 version makes the query work again; this has been put in place as a workaround because production applications were affected. Testing shows that the problem is present in OpenShift Container Platform 4.13, 4.14 and even 4.15. Forcing TCP at the pod level does not change the behavior and the query still fails, but configuring a specific forwardPlugin for the domain and enforcing DNS over TCP makes it work again.
- Adjusting bufsize did not help; the result was still the same (this was suspected because of https://issues.redhat.com/browse/OCPBUGS-21901, but it had no effect).
- The only way to make it work is to force_tcp, either in the default ". /etc/resolv.conf" section or by configuring a forwardPlugin and forcing TCP.
Checking upstream, I found https://github.com/coredns/coredns/issues/5953 and https://github.com/coredns/coredns/pull/6277, which I suspect are related. When building from the CoreDNS master branch it indeed starts to work again, and resolving the SRV entry is possible again.
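The forwardPlugin workaround described above can be sketched as a single patch against the default DNS operator CR. The zone (example.net) and upstream (10.0.0.9) are the ones appearing in the outputs in this report and are environment-specific:

```shell
# Add a dedicated server block for the affected zone and force TCP towards the
# upstream, so oversized SRV answers never take the failing UDP path.
# Note: a merge patch replaces the entire spec.servers list.
oc patch dns.operator/default --type=merge -p \
  '{"spec":{"servers":[{"name":"example","zones":["example.net"],"forwardPlugin":{"policy":"Random","protocolStrategy":"TCP","upstreams":["10.0.0.9"]}}]}}'
```

The resulting spec matches the `protocolStrategy: TCP` configuration shown in the `oc get dns.operator default -o yaml` output further down.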
--- $ oc get clusterversion NAME VERSION AVAILABLE PROGRESSING SINCE STATUS version 4.13.27 True False 24h Cluster version is 4.13.27 $ oc get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES dns-default-626td 2/2 Running 0 3m15s 10.128.2.49 aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh <none> <none> dns-default-74nnw 2/2 Running 0 87s 10.131.0.47 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> dns-default-8mggz 2/2 Running 0 2m31s 10.128.1.121 aro-cluster-h78zv-h94mh-master-0 <none> <none> dns-default-clgkg 2/2 Running 0 109s 10.129.2.187 aro-cluster-h78zv-h94mh-worker-eastus3-jhvff <none> <none> dns-default-htdw2 2/2 Running 0 2m10s 10.129.0.43 aro-cluster-h78zv-h94mh-master-2 <none> <none> dns-default-wprln 2/2 Running 0 2m52s 10.130.1.70 aro-cluster-h78zv-h94mh-master-1 <none> <none> node-resolver-4dmgj 1/1 Running 0 17h 10.0.2.4 aro-cluster-h78zv-h94mh-worker-eastus3-jhvff <none> <none> node-resolver-5c6tj 1/1 Running 0 17h 10.0.0.10 aro-cluster-h78zv-h94mh-master-0 <none> <none> node-resolver-chfr6 1/1 Running 0 17h 10.0.0.7 aro-cluster-h78zv-h94mh-master-2 <none> <none> node-resolver-mnhsp 1/1 Running 0 17h 10.0.2.6 aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh <none> <none> node-resolver-snxsb 1/1 Running 0 17h 10.0.0.9 aro-cluster-h78zv-h94mh-master-1 <none> <none> node-resolver-sp7h8 1/1 Running 0 17h 10.0.2.5 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> $ oc get pod -o wide -n project-100 NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES tools-54f4d6844b-lr6z9 1/1 Running 0 17h 10.131.0.40 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> $ oc get dns.operator default -o yaml apiVersion: operator.openshift.io/v1 kind: DNS metadata: creationTimestamp: "2024-01-11T09:14:03Z" finalizers: - dns.operator.openshift.io/dns-controller generation: 4 name: default resourceVersion: "4216641" uid: c8f5c627-2010-4c4a-a5fe-ed87f320e427 spec: logLevel: Normal nodePlacement: {} 
operatorLogLevel: Normal servers: - forwardPlugin: policy: Random protocolStrategy: "" upstreams: - 10.0.0.9 name: example zones: - example.xyz upstreamResolvers: policy: Sequential transportConfig: {} upstreams: - port: 53 type: SystemResolvConf status: clusterDomain: cluster.local clusterIP: 172.30.0.10 conditions: - lastTransitionTime: "2024-01-19T07:54:18Z" message: Enough DNS pods are available, and the DNS service has a cluster IP address. reason: AsExpected status: "False" type: Degraded - lastTransitionTime: "2024-01-19T07:55:02Z" message: All DNS and node-resolver pods are available, and the DNS service has a cluster IP address. reason: AsExpected status: "False" type: Progressing - lastTransitionTime: "2024-01-18T13:29:59Z" message: The DNS daemonset has available pods, and the DNS service has a cluster IP address. reason: AsExpected status: "True" type: Available - lastTransitionTime: "2024-01-11T09:14:04Z" message: DNS Operator can be upgraded reason: AsExpected status: "True" type: Upgradeable $ oc rsh -n project-100 tools-54f4d6844b-lr6z9 sh-4.4$ host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net Host _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net not found: 2(SERVFAIL) $ oc logs dns-default-74nnw Defaulted container "dns" out of: dns, kube-rbac-proxy .:5353 hostname.bind.:5353 example.xyz.:5353 [INFO] plugin/reload: Running configuration SHA512 = 88c7c194d29d0a23b322aeee1eaa654ef385e6bd1affae3715028aba1d33cc8340e33184ba183f87e6c66a2014261c3e02edaea8e42ad01ec6a7c5edb34dfc6a CoreDNS-1.10.1 linux/amd64, go1.19.13 X:strictfipsruntime, [INFO] 10.131.0.40:39333 - 54228 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.001868103s [ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size [INFO] 10.131.0.40:39333 - 54228 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. 
udp 76 false 512" - - 0 5.003223099s [ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size --- https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.12.47/release.txt - using quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:3c0de49c0e76f2ee23a107fc9397f2fd32e7a6a8a458906afd6df04ff5bb0f7b $ oc get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES dns-default-8vrwd 2/2 Running 0 6m22s 10.129.0.45 aro-cluster-h78zv-h94mh-master-2 <none> <none> dns-default-fm59d 2/2 Running 0 7m4s 10.129.2.190 aro-cluster-h78zv-h94mh-worker-eastus3-jhvff <none> <none> dns-default-grtqs 2/2 Running 0 7m48s 10.130.1.73 aro-cluster-h78zv-h94mh-master-1 <none> <none> dns-default-l8mp2 2/2 Running 0 6m43s 10.131.0.49 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> dns-default-slc4n 2/2 Running 0 8m11s 10.128.1.126 aro-cluster-h78zv-h94mh-master-0 <none> <none> dns-default-xgr7c 2/2 Running 0 7m25s 10.128.2.51 aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh <none> <none> node-resolver-2nmpx 1/1 Running 0 10m 10.0.2.4 aro-cluster-h78zv-h94mh-worker-eastus3-jhvff <none> <none> node-resolver-689j7 1/1 Running 0 10m 10.0.2.5 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> node-resolver-8qhls 1/1 Running 0 10m 10.0.0.7 aro-cluster-h78zv-h94mh-master-2 <none> <none> node-resolver-nv8mq 1/1 Running 0 10m 10.0.2.6 aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh <none> <none> node-resolver-r52v7 1/1 Running 0 10m 10.0.0.10 aro-cluster-h78zv-h94mh-master-0 <none> <none> node-resolver-z8d4n 1/1 Running 0 10m 10.0.0.9 aro-cluster-h78zv-h94mh-master-1 <none> <none> $ oc get pod -n project-100 -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES tools-54f4d6844b-lr6z9 1/1 Running 0 18h 10.131.0.40 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> $ oc rsh -n project-100 tools-54f4d6844b-lr6z9 sh-4.4$ host -t srv 
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1032 x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1039 x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1043 x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1048 x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1049 x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1050 x1-9-foobar.bla.example.net. --- https://mirror.openshift.com/pub/openshift-v4/x86_64/clients/ocp/4.15.0-rc.2/release.txt - using quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:9e8ffba7854f3f02e8940ddcb2636ceb4773db77872ff639a447c4bab3a69ecc $ oc get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES dns-default-gcs7s 2/2 Running 0 5m 10.128.2.52 aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh <none> <none> dns-default-mnbh4 2/2 Running 0 4m37s 10.129.0.46 aro-cluster-h78zv-h94mh-master-2 <none> <none> dns-default-p2s6v 2/2 Running 0 3m55s 10.130.1.77 aro-cluster-h78zv-h94mh-master-1 <none> <none> dns-default-svccn 2/2 Running 0 3m13s 10.128.1.128 aro-cluster-h78zv-h94mh-master-0 <none> <none> dns-default-tgktg 2/2 Running 0 3m34s 10.131.0.50 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> dns-default-xd5vq 2/2 Running 0 4m16s 10.129.2.191 aro-cluster-h78zv-h94mh-worker-eastus3-jhvff <none> <none> node-resolver-2nmpx 1/1 Running 0 18m 10.0.2.4 aro-cluster-h78zv-h94mh-worker-eastus3-jhvff <none> <none> node-resolver-689j7 1/1 Running 0 18m 10.0.2.5 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> node-resolver-8qhls 1/1 Running 0 18m 10.0.0.7 aro-cluster-h78zv-h94mh-master-2 <none> <none> 
node-resolver-nv8mq 1/1 Running 0 18m 10.0.2.6 aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh <none> <none> node-resolver-r52v7 1/1 Running 0 18m 10.0.0.10 aro-cluster-h78zv-h94mh-master-0 <none> <none> node-resolver-z8d4n 1/1 Running 0 18m 10.0.0.9 aro-cluster-h78zv-h94mh-master-1 <none> <none> $ oc get pod -n project-100 -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES tools-54f4d6844b-lr6z9 1/1 Running 0 18h 10.131.0.40 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> $ oc rsh -n project-100 tools-54f4d6844b-lr6z9 sh-4.4$ host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net Host _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net not found: 2(SERVFAIL) $ oc logs dns-default-tgktg Defaulted container "dns" out of: dns, kube-rbac-proxy .:5353 hostname.bind.:5353 example.net.:5353 [INFO] plugin/reload: Running configuration SHA512 = 8efa6675505d17551d17ca1e2ca45506a731dbab1f53dd687d37cb98dbaf4987a90622b6b030fe1643ba2cd17198a813ba9302b84ad729de4848f8998e768605 CoreDNS-1.11.1 linux/amd64, go1.20.10 X:strictfipsruntime, [INFO] 10.131.0.40:35246 - 61734 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.003577431s [ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size [INFO] 10.131.0.40:35246 - 61734 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.000969251s [ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. 
SRV: dns: overflowing header size --- quay.io/rhn_support_sreber/coredns:latest - based on https://github.com/coredns/coredns master branch build on January 19th 2024 (suspecting https://github.com/coredns/coredns/pull/6277 to be the fix) $ oc get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES dns-default-bpjpn 2/2 Running 0 2m22s 10.130.1.78 aro-cluster-h78zv-h94mh-master-1 <none> <none> dns-default-c7wcz 2/2 Running 0 99s 10.131.0.51 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> dns-default-d7qjz 2/2 Running 0 3m6s 10.129.2.193 aro-cluster-h78zv-h94mh-worker-eastus3-jhvff <none> <none> dns-default-dkvtp 2/2 Running 0 78s 10.128.1.131 aro-cluster-h78zv-h94mh-master-0 <none> <none> dns-default-t6sv7 2/2 Running 0 2m44s 10.129.0.47 aro-cluster-h78zv-h94mh-master-2 <none> <none> dns-default-vf9f6 2/2 Running 0 2m 10.128.2.53 aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh <none> <none> node-resolver-2nmpx 1/1 Running 0 24m 10.0.2.4 aro-cluster-h78zv-h94mh-worker-eastus3-jhvff <none> <none> node-resolver-689j7 1/1 Running 0 24m 10.0.2.5 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> node-resolver-8qhls 1/1 Running 0 24m 10.0.0.7 aro-cluster-h78zv-h94mh-master-2 <none> <none> node-resolver-nv8mq 1/1 Running 0 24m 10.0.2.6 aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh <none> <none> node-resolver-r52v7 1/1 Running 0 24m 10.0.0.10 aro-cluster-h78zv-h94mh-master-0 <none> <none> node-resolver-z8d4n 1/1 Running 0 24m 10.0.0.9 aro-cluster-h78zv-h94mh-master-1 <none> <none> $ oc get pod -n project-100 -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES tools-54f4d6844b-lr6z9 1/1 Running 0 18h 10.131.0.40 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> $ oc rsh -n project-100 tools-54f4d6844b-lr6z9 sh-4.4$ host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1032 
x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1039 x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1043 x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1048 x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1049 x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1050 x1-9-foobar.bla.example.net. --- Back with OpenShift Container Platform 4.13.27, but adjusting the `CoreDNS` configuration: defining a specific forwardPlugin and enforcing TCP $ oc get dns.operator default -o yaml apiVersion: operator.openshift.io/v1 kind: DNS metadata: creationTimestamp: "2024-01-11T09:14:03Z" finalizers: - dns.operator.openshift.io/dns-controller generation: 7 name: default resourceVersion: "4230436" uid: c8f5c627-2010-4c4a-a5fe-ed87f320e427 spec: logLevel: Normal nodePlacement: {} operatorLogLevel: Normal servers: - forwardPlugin: policy: Random protocolStrategy: TCP upstreams: - 10.0.0.9 name: example zones: - example.net upstreamResolvers: policy: Sequential transportConfig: {} upstreams: - port: 53 type: SystemResolvConf status: clusterDomain: cluster.local clusterIP: 172.30.0.10 conditions: - lastTransitionTime: "2024-01-19T08:27:21Z" message: Enough DNS pods are available, and the DNS service has a cluster IP address. reason: AsExpected status: "False" type: Degraded - lastTransitionTime: "2024-01-19T08:28:03Z" message: All DNS and node-resolver pods are available, and the DNS service has a cluster IP address. reason: AsExpected status: "False" type: Progressing - lastTransitionTime: "2024-01-19T08:00:02Z" message: The DNS daemonset has available pods, and the DNS service has a cluster IP address. 
reason: AsExpected status: "True" type: Available - lastTransitionTime: "2024-01-11T09:14:04Z" message: DNS Operator can be upgraded reason: AsExpected status: "True" type: Upgradeable $ oc get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES dns-default-frdkm 2/2 Running 0 3m5s 10.131.0.52 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> dns-default-jsfkb 2/2 Running 0 99s 10.129.0.49 aro-cluster-h78zv-h94mh-master-2 <none> <none> dns-default-jzzqc 2/2 Running 0 2m21s 10.128.2.54 aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh <none> <none> dns-default-sgf4h 2/2 Running 0 2m 10.130.1.79 aro-cluster-h78zv-h94mh-master-1 <none> <none> dns-default-t8nn7 2/2 Running 0 2m44s 10.129.2.194 aro-cluster-h78zv-h94mh-worker-eastus3-jhvff <none> <none> dns-default-xmvqg 2/2 Running 0 3m27s 10.128.1.133 aro-cluster-h78zv-h94mh-master-0 <none> <none> node-resolver-2nmpx 1/1 Running 0 29m 10.0.2.4 aro-cluster-h78zv-h94mh-worker-eastus3-jhvff <none> <none> node-resolver-689j7 1/1 Running 0 29m 10.0.2.5 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> node-resolver-8qhls 1/1 Running 0 29m 10.0.0.7 aro-cluster-h78zv-h94mh-master-2 <none> <none> node-resolver-nv8mq 1/1 Running 0 29m 10.0.2.6 aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh <none> <none> node-resolver-r52v7 1/1 Running 0 29m 10.0.0.10 aro-cluster-h78zv-h94mh-master-0 <none> <none> node-resolver-z8d4n 1/1 Running 0 29m 10.0.0.9 aro-cluster-h78zv-h94mh-master-1 <none> <none> $ oc get pod -n project-100 -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES tools-54f4d6844b-lr6z9 1/1 Running 0 18h 10.131.0.40 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> $ oc rsh -n project-100 tools-54f4d6844b-lr6z9 sh-4.4$ host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1032 x1-9-foobar.bla.example.net. 
_example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1039 x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1043 x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1048 x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1049 x1-9-foobar.bla.example.net. _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net has SRV record 0 0 1050 x1-9-foobar.bla.example.net. --- Back with OpenShift Container Platform 4.13.27, but now forcing TCP at the pod level $ oc get deployment tools -n project-100 -o yaml apiVersion: apps/v1 kind: Deployment metadata: annotations: alpha.image.policy.openshift.io/resolve-names: '*' app.openshift.io/route-disabled: "false" deployment.kubernetes.io/revision: "5" image.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"tools:latest","namespace":"project-100"},"fieldPath":"spec.template.spec.containers[?(@.name==\"tools\")].image","pause":"false"}]' openshift.io/generated-by: OpenShiftWebConsole creationTimestamp: "2024-01-17T11:22:05Z" generation: 5 labels: app: tools app.kubernetes.io/component: tools app.kubernetes.io/instance: tools app.kubernetes.io/name: tools app.kubernetes.io/part-of: tools app.openshift.io/runtime: other-linux app.openshift.io/runtime-namespace: project-100 name: tools namespace: project-100 resourceVersion: "4232839" uid: a8157243-71e1-4597-9aa5-497afed5f722 spec: progressDeadlineSeconds: 600 replicas: 1 revisionHistoryLimit: 10 selector: matchLabels: app: tools strategy: rollingUpdate: maxSurge: 25% maxUnavailable: 25% type: RollingUpdate template: metadata: annotations: openshift.io/generated-by: OpenShiftWebConsole creationTimestamp: null labels: app: tools deployment: tools spec: containers: - command: - /bin/bash - -c - while true; do sleep 1;done image: 
image-registry.openshift-image-registry.svc:5000/project-100/tools@sha256:fba289d2ff20df2bfe38aa58fa3e491bbecf09e90e96b3c9b8c38f786dc2efb8 imagePullPolicy: Always name: tools ports: - containerPort: 8080 protocol: TCP resources: {} terminationMessagePath: /dev/termination-log terminationMessagePolicy: File dnsConfig: options: - name: use-vc dnsPolicy: ClusterFirst restartPolicy: Always schedulerName: default-scheduler securityContext: {} terminationGracePeriodSeconds: 30 status: availableReplicas: 1 conditions: - lastTransitionTime: "2024-01-17T11:23:56Z" lastUpdateTime: "2024-01-17T11:23:56Z" message: Deployment has minimum availability. reason: MinimumReplicasAvailable status: "True" type: Available - lastTransitionTime: "2024-01-17T11:22:05Z" lastUpdateTime: "2024-01-19T08:33:28Z" message: ReplicaSet "tools-6749b4cf47" has successfully progressed. reason: NewReplicaSetAvailable status: "True" type: Progressing observedGeneration: 5 readyReplicas: 1 replicas: 1 updatedReplicas: 1 $ oc get pod -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES dns-default-7kfzh 2/2 Running 0 2m25s 10.129.2.196 aro-cluster-h78zv-h94mh-worker-eastus3-jhvff <none> <none> dns-default-g4mtd 2/2 Running 0 2m25s 10.128.2.55 aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh <none> <none> dns-default-l4xkg 2/2 Running 0 2m26s 10.129.0.50 aro-cluster-h78zv-h94mh-master-2 <none> <none> dns-default-l7rq8 2/2 Running 0 2m25s 10.128.1.135 aro-cluster-h78zv-h94mh-master-0 <none> <none> dns-default-lt6zx 2/2 Running 0 2m26s 10.131.0.53 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> dns-default-t6bzl 2/2 Running 0 2m25s 10.130.1.82 aro-cluster-h78zv-h94mh-master-1 <none> <none> node-resolver-279mf 1/1 Running 0 2m24s 10.0.2.6 aro-cluster-h78zv-h94mh-worker-eastus2-mlrxh <none> <none> node-resolver-2bzfc 1/1 Running 0 2m24s 10.0.2.4 aro-cluster-h78zv-h94mh-worker-eastus3-jhvff <none> <none> node-resolver-bdz4m 1/1 Running 0 2m24s 10.0.0.7 
aro-cluster-h78zv-h94mh-master-2 <none> <none> node-resolver-jrv2w 1/1 Running 0 2m24s 10.0.0.9 aro-cluster-h78zv-h94mh-master-1 <none> <none> node-resolver-lbfg5 1/1 Running 0 2m23s 10.0.0.10 aro-cluster-h78zv-h94mh-master-0 <none> <none> node-resolver-qnm92 1/1 Running 0 2m24s 10.0.2.5 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> $ oc get pod -n project-100 -o wide NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES tools-6749b4cf47-gmw9v 1/1 Running 0 50s 10.131.0.54 aro-cluster-h78zv-h94mh-worker-eastus1-99l7n <none> <none> $ oc rsh -n project-100 tools-6749b4cf47-gmw9v sh-4.4$ cat /etc/resolv.conf search project-100.svc.cluster.local svc.cluster.local cluster.local khrmlwa2zp4e1oisi1qjtoxwrc.bx.internal.cloudapp.net nameserver 172.30.0.10 options ndots:5 use-vc sh-4.4$ host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net Host _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net not found: 2(SERVFAIL) $ oc logs dns-default-lt6zx Defaulted container "dns" out of: dns, kube-rbac-proxy .:5353 hostname.bind.:5353 example.xyz.:5353 [INFO] plugin/reload: Running configuration SHA512 = 79d17b9fc0f61d2c6db13a0f7f3d0a873c4d86ab5cba90c3819a5b57a48fac2ef0fb644b55e959984cd51377bff0db04f399a341a584c466e540a0d7501340f7 CoreDNS-1.10.1 linux/amd64, go1.19.13 X:strictfipsruntime, [INFO] 10.131.0.40:51367 - 22867 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.00024781s [ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size [INFO] 10.131.0.40:51367 - 22867 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.00096551s [ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size [INFO] 10.131.0.54:44935 - 3087 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. 
udp 76 false 512" - - 0 5.000619524s [ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size [INFO] 10.131.0.54:44935 - 3087 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.000369584s [ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size
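The pattern in the logs above (UDP query, 512-byte buffer, "overflowing header size") can be narrowed down by querying the upstream resolver directly with dig, forcing each transport in turn. The upstream IP (10.0.0.9) and the record name are the ones used in this report:

```shell
# Over UDP, an answer larger than the advertised buffer comes back truncated
# (TC flag set) and the client is expected to retry over TCP; this report
# suspects CoreDNS 1.10.x mishandles that retry (coredns/coredns#6277).
# +ignore tells dig to show the truncated UDP answer instead of retrying.
dig @10.0.0.9 -t SRV _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net +notcp +ignore
# Over TCP the full six-record answer should come back.
dig @10.0.0.9 -t SRV _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net +tcp
```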
Version-Release number of selected component (if applicable):
OpenShift Container Platform 4.13, 4.14, 4.15
How reproducible:
Always
Steps to Reproduce:
1. Run "host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net" inside a pod
Actual results:
The dns-default pod reports the following error when running the query:

[INFO] 10.131.0.40:39333 - 54228 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.001868103s
[ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size
[INFO] 10.131.0.40:39333 - 54228 "SRV IN _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. udp 76 false 512" - - 0 5.003223099s
[ERROR] plugin/errors: 2 _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net. SRV: dns: overflowing header size

And the command "host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net" fails:

sh-4.4$ host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net
Host _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net not found: 2(SERVFAIL)
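The failure can be confirmed end to end by running the query from an application pod while checking all dns-default pods for the matching error. The namespace and pod name below are from this report; the label selector is the one the DNS operator sets on dns-default pods:

```shell
# Run the failing query from an application pod.
oc rsh -n project-100 tools-54f4d6844b-lr6z9 \
  host -t srv _example._tcp.foo-bar-abc-123-xyz-456-foo-000.abcde.example.net
# Check every dns-default pod for the corresponding CoreDNS error.
oc logs -n openshift-dns -l dns.operator.openshift.io/daemonset-dns=default -c dns \
  | grep 'overflowing header size'
```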
Expected results:
No error is reported in the dns-default pod, and the query returns the expected SRV records.
Additional info:
I suspect https://github.com/coredns/coredns/issues/5953 and https://github.com/coredns/coredns/pull/6277 are related. Hence I built CoreDNS from the master branch and created quay.io/rhn_support_sreber/coredns:latest. When running that image in the dns-default pod, resolving the host query works again.
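For reference, swapping the dns-default image for testing can be sketched as below. This is unsupported and assumes cluster-admin; the operators must be scaled down first, otherwise they reconcile the image straight back:

```shell
# Stop the operators that would revert a manual image change (testing only).
oc scale deployment/cluster-version-operator -n openshift-cluster-version --replicas=0
oc scale deployment/dns-operator -n openshift-dns-operator --replicas=0
# Point the dns-default daemonset at the custom CoreDNS build.
oc set image daemonset/dns-default -n openshift-dns \
  dns=quay.io/rhn_support_sreber/coredns:latest
```

Scaling the operators back up afterwards restores the shipped image.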
blocks:
  OCPBUGS-28200 "SRV lookup is failing after OpenShift Container Platform 4.13 update because of CoreDNS version 1.10.1" (Closed)
is cloned by:
  OCPBUGS-28200 "SRV lookup is failing after OpenShift Container Platform 4.13 update because of CoreDNS version 1.10.1" (Closed)
links to:
  RHSA-2023:7198 "OpenShift Container Platform 4.15 security update"