OCPBUGS-15251

While/after upgrading to OKD 4.11 2023-01-14, CoreDNS has a problem with UDP overflows

    • Bug
    • Resolution: Done-Errata
    • Major
    • 4.12.0
    • 4.11
    • Networking / DNS
    • None
    • +
    • Important
    • No
    • 2
    • Sprint 239, Sprint 240, Sprint 241
    • 3
    • Rejected
    • False
    • None
    • * Previously, a non-compliant upstream DNS server that provided a UDP response larger than the bufsize of 512 bytes specified by {product-title} caused an overflow error in CoreDNS, and no response to the DNS query was given. With this update, users can configure the `protocolStrategy` field on the `dnses.operator.openshift.io` custom resource to be "TCP". This resolves issues with non-compliant upstream DNS servers. (link:https://issues.redhat.com/browse/OCPBUGS-15251[OCPBUGS-15251])

      Before this update, a non-compliant upstream DNS server that provided a UDP response larger than OpenShift's specified bufsize (512 bytes) caused CoreDNS to throw an overflow error and not respond to the DNS query. With this update, users can configure the protocolStrategy field on the dnses.operator.openshift.io CR to be "TCP". When this field is set to TCP, CoreDNS uses TCP for upstream requests, which works around UDP overflow issues with non-compliant upstream DNS servers.
    • Bug Fix
    • See thread about this in openshift-users Slack channel.
      Vadim Rutkovsky advised me to open this issue here.
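
      The release note text above says the protocolStrategy field on the dnses.operator.openshift.io CR can be set to "TCP". As a rough sketch only, the snippet below builds a JSON merge patch for that field and prints an `oc patch` command. The nesting under `spec.upstreamResolvers`, the resource name `default`, and the helper name `build_protocol_strategy_patch` are assumptions not stated in this issue; check them against the DNS Operator documentation for your cluster version before use.

```python
import json

# Values assumed to be accepted by the field; only "TCP" is confirmed by this issue.
ASSUMED_STRATEGIES = ("TCP", "UDP", "")


def build_protocol_strategy_patch(strategy: str = "TCP") -> dict:
    """Return a merge-patch body that sets protocolStrategy (hypothetical helper).

    The path spec.upstreamResolvers.protocolStrategy is an assumption.
    """
    if strategy not in ASSUMED_STRATEGIES:
        raise ValueError(f"unsupported protocol strategy: {strategy!r}")
    return {"spec": {"upstreamResolvers": {"protocolStrategy": strategy}}}


patch = build_protocol_strategy_patch("TCP")

# Print the command instead of running it, so nothing is changed by accident.
print(
    "oc patch dns.operator.openshift.io default --type=merge -p '"
    + json.dumps(patch)
    + "'"
)
```

      Printing the command rather than executing it keeps the sketch side-effect free; the patch can be reviewed before it is applied to a cluster.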

      This is a clone of issue OCPBUGS-6829. The following is the description of the original issue:

      Description of problem:

      During or after the upgrade to 4.11 2023-01-14, CoreDNS has a problem with UDP overflows, so DNS lookups are very slow and cause the ingress operator upgrade to stall. We had to work around this with force_tcp, following https://access.redhat.com/solutions/5984291

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      100%, but it seems to depend on the network environment (exact cause unknown)

      Steps to Reproduce:

      1. Install a cluster with OKD 4.11-2022-12-02 or earlier.
      2. Initiate an upgrade to OKD 4.11-2023-01-14.
      3. The upgrade stalls after CoreDNS is upgraded.
      

      Actual results:

      CoreDNS logs: [ERROR] plugin/errors: 2 oauth-openshift.apps.okd-admin.muc.lv1871.de. AAAA: dns: overflowing header size 
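
      To check how many lookups are affected, log lines like the one above can be picked out of the CoreDNS pod logs. The following is a minimal sketch, assuming the line layout is "response code, query name, query type, error message" (in which case 2 would be SERVFAIL); the names `CoreDNSError`, `parse_errors_line`, and `is_udp_overflow` are hypothetical helpers, not part of CoreDNS.

```python
import re
from typing import NamedTuple, Optional


class CoreDNSError(NamedTuple):
    rcode: int    # assumed to be the DNS response code (2 = SERVFAIL)
    name: str     # queried name, including the trailing dot
    qtype: str    # query type, e.g. A or AAAA
    message: str  # error text reported for the upstream query


# Assumed layout: "[ERROR] plugin/errors: <rcode> <name> <qtype>: <message>"
_LINE = re.compile(
    r"\[ERROR\] plugin/errors: (?P<rcode>\d+) (?P<name>\S+) "
    r"(?P<qtype>[A-Z0-9]+): (?P<message>.*)"
)


def parse_errors_line(line: str) -> Optional[CoreDNSError]:
    """Parse one log line; return None if it does not match the assumed layout."""
    match = _LINE.search(line)
    if match is None:
        return None
    return CoreDNSError(
        rcode=int(match["rcode"]),
        name=match["name"],
        qtype=match["qtype"],
        message=match["message"].strip(),
    )


def is_udp_overflow(error: CoreDNSError) -> bool:
    """True if the message matches the overflow error quoted in this issue."""
    return "overflowing header size" in error.message


sample = (
    "[ERROR] plugin/errors: 2 oauth-openshift.apps.okd-admin.muc.lv1871.de. "
    "AAAA: dns: overflowing header size "
)
parsed = parse_errors_line(sample)
if parsed is not None and is_udp_overflow(parsed):
    print(f"UDP overflow for {parsed.name} ({parsed.qtype})")
```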

      Expected results:

       

      Additional info:

       

              gspence@redhat.com Grant Spence
              openshift-crt-jira-prow OpenShift Prow Bot
              Melvin Joseph Melvin Joseph