OpenShift Request For Enhancement / RFE-5695

CoreDNS operator - make lameduck interval configurable


    • Type: Feature Request
    • Resolution: Unresolved
    • Priority: Normal
    • Component: Network Edge

      1. Proposed title of this feature request

      CoreDNS operator - make lameduck interval configurable

      2. What is the nature and description of the request?

      Currently, the lameduck interval is hardcoded to 20 seconds (and always has been). This means that openshift-dns pods, with a terminationGracePeriodSeconds of 30 seconds, take a minimum of 20 seconds and a maximum of 30 seconds to stop.

      This in turn delays the deactivation of crio.service during graceful node shutdown, because crio.service waits for all containers either to stop gracefully or to reach terminationGracePeriodSeconds, at which point they are killed forcefully.

      Therefore, it is currently impossible to reboot an OpenShift node gracefully in under 20 seconds. Other containers may also delay shutdown, but the openshift-dns/dns-default pods set the absolute minimum at 20 seconds because of the hardcoded lameduck interval.
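      The timing argument above can be expressed as a small model. This is a simplified sketch, not how the kubelet or CRI-O are implemented: it assumes CoreDNS exits right after its lameduck interval (plus an optional extra exit time), that the container is killed once terminationGracePeriodSeconds elapses, and that crio.service stops all containers in parallel. Both function names are hypothetical, and the stop times of the non-DNS containers in the usage lines are made-up placeholders, not measurements.

```python
def dns_pod_stop_seconds(lameduck: float, grace_period: float,
                         extra_exit_time: float = 0.0) -> float:
    """Approximate seconds for a CoreDNS container to stop after SIGTERM.

    Hypothetical model: CoreDNS keeps running for the lameduck interval,
    then needs `extra_exit_time` to exit; the container is killed forcefully
    once `grace_period` (terminationGracePeriodSeconds) has elapsed.
    """
    return min(lameduck + extra_exit_time, grace_period)


def crio_drain_seconds(container_stop_times: list[float]) -> float:
    """Seconds until crio.service can deactivate.

    Assumes containers stop in parallel, so the slowest one sets the time.
    """
    return max(container_stop_times, default=0.0)


if __name__ == "__main__":
    # Hardcoded lameduck: the stop time is bounded between 20 s and 30 s.
    print(dns_pod_stop_seconds(20, 30))                      # 20
    print(dns_pod_stop_seconds(20, 30, extra_exit_time=15))  # 30
    # Placeholder stop times for two other containers (not measurements).
    print(crio_drain_seconds([dns_pod_stop_seconds(20, 30), 3.0, 8.0]))  # 20
```

      Under this model, the lameduck interval is a floor for the whole node: no other container can finish early enough to let crio.service stop sooner.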

      # oc get cm -n openshift-dns dns-default -o yaml
      apiVersion: v1
      data:
        Corefile: |
          .:5353 {
              bufsize 1232
              errors
              log . {
                  class error
              }
              health {
                  lameduck 20s
              }
              ready
              kubernetes cluster.local in-addr.arpa ip6.arpa {
                  pods insecure
                  fallthrough in-addr.arpa ip6.arpa
              }
              prometheus 127.0.0.1:9153
              forward . /etc/resolv.conf {
                  policy sequential
              }
              cache 900 {
                  denial 9984 30
              }
              reload
          }
          hostname.bind:5353 {
              chaos
          }
      kind: ConfigMap
      metadata:
        name: dns-default
        namespace: openshift-dns
      

      https://coredns.io/plugins/health/

      The health plugin documentation describes the lameduck option as follows: "Where lameduck will delay shutdown for DURATION. /health will still answer 200 OK. Note: The ready plugin will not answer OK while CoreDNS is in lame duck mode prior to shutdown."
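      The lameduck behavior quoted above can be modeled as a tiny state machine. This is an illustrative sketch only, assuming lame duck mode starts when the shutdown signal arrives and that the process exits as soon as the interval ends; the class and method names are hypothetical and do not correspond to CoreDNS code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LameduckModel:
    """Hypothetical model of CoreDNS's /health and ready endpoints around shutdown."""
    lameduck: float                      # seconds, e.g. the hardcoded 20
    shutdown_at: Optional[float] = None  # time the shutdown signal arrived

    def running(self, t: float) -> bool:
        # The process stays up until the lameduck interval has elapsed.
        return self.shutdown_at is None or t < self.shutdown_at + self.lameduck

    def health_ok(self, t: float) -> bool:
        # /health keeps answering 200 OK for as long as the process runs.
        return self.running(t)

    def ready_ok(self, t: float) -> bool:
        # The ready plugin stops answering OK once lame duck mode begins.
        return self.shutdown_at is None or t < self.shutdown_at


if __name__ == "__main__":
    m = LameduckModel(lameduck=20, shutdown_at=100)
    print(m.ready_ok(105), m.health_ok(105), m.running(121))  # False True False
```

      The gap between ready failing and the process exiting is what gives clients and load balancers time to stop sending queries; it is also the window this request wants to shorten.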

      The lameduckDuration is and always has been hardcoded to 20 seconds:
      https://github.com/openshift/cluster-dns-operator/blob/release-4.14/pkg/operator/controller/controller_dns_configmap.go#L27
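      If the interval were configurable, the operator would need to render the health stanza from the configured value and reject values that the grace period would cut short. The helper below is a hypothetical sketch of that logic, not the operator's code (which hardcodes 20 seconds at the link above); the 30-second default grace period is an assumption matching the 30-second maximum stop time described earlier.

```python
def render_health_block(lameduck_seconds: int, grace_period_seconds: int = 30,
                        indent: str = "    ") -> str:
    """Render a Corefile `health` stanza with a configurable lameduck interval.

    Hypothetical helper. The lameduck must end before
    terminationGracePeriodSeconds expires; otherwise CoreDNS is killed
    while still in lame duck mode.
    """
    if not 0 < lameduck_seconds < grace_period_seconds:
        raise ValueError(
            f"lameduck {lameduck_seconds}s must be > 0 and < "
            f"terminationGracePeriodSeconds ({grace_period_seconds}s)"
        )
    # Doubled braces are literal "{" and "}" in an f-string.
    return (
        f"{indent}health {{\n"
        f"{indent}    lameduck {lameduck_seconds}s\n"
        f"{indent}}}"
    )


if __name__ == "__main__":
    print(render_health_block(20))  # same stanza as the current Corefile
    print(render_health_block(5))   # a shorter, hypothetical setting
```

      Validating against the grace period, rather than accepting any duration, keeps a misconfiguration from silently turning the lameduck window into a forced kill.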

      3. Why does the customer need this? (List the business requirements here)

      The customer wants to speed up node shutdown. In this specific case, they use OpenStack to shut down their OCP VMs gracefully and want to keep the shutdown under 30 seconds.
      Shutting down the openshift-dns pod alone takes at least 20 seconds. IMO, 20 seconds is a long time to shut down a process, especially one that blocks node shutdown.
      Other containers may also slow this down (and those must of course be optimized too), but openshift-dns reliably sits at the hardcoded 20-second minimum, and that cannot be tuned.
      Add some OpenStack API overhead plus a few seconds for shutting down the rest of the system, and the entire operation can easily exceed 30 seconds.
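      The budget arithmetic in this paragraph can be made explicit. This sketch assumes the steps run one after another (OpenStack API call, container drain of at least the lameduck interval, then the rest of the OS shutdown); the function names are hypothetical, and the only figures used are the 30-second target and the 20-second lameduck from this request.

```python
def headroom_seconds(budget: float, lameduck: float) -> float:
    """Seconds left for OpenStack API overhead and the rest of the OS shutdown,
    assuming they run sequentially with a container drain of at least `lameduck`."""
    return budget - lameduck


def fits_budget(budget: float, lameduck: float, api_overhead: float,
                rest_of_system: float) -> bool:
    """True if the sequential shutdown estimate stays within `budget`."""
    return api_overhead + lameduck + rest_of_system <= budget


if __name__ == "__main__":
    # 30 s target minus the hardcoded 20 s lameduck leaves 10 s for everything else.
    print(headroom_seconds(30, 20))  # 10
```

      In other words, with the current hardcoded value, OpenStack overhead and the rest of the system shutdown together must fit into 10 seconds.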

      Also, what is the reason for a lameduck setting of 20 seconds? Why not 18, 19, or 21?
      Historically, the 20-second value comes from https://bugzilla.redhat.com/show_bug.cgi?id=1884053
      Originally, that bug was supposed to be solved with a 60-second lameduck period: https://github.com/openshift/cluster-dns-operator/pull/205 (the PR discussion shows how the original PR arrived at 60 seconds "for good measure").
      That change was then reverted and replaced with the solution from https://github.com/openshift/cluster-dns-operator/pull/237, which is where a lameduck of 20 seconds first appears, but I can't find a justification for exactly 20 seconds.
      From a glance at the BZ and PRs (I haven't read them thoroughly), it might be meant as a multiple of some probe interval. If so, could we make the probes configurable, or set them to a fraction of the lameduck interval (or, vice versa, set lameduck to a multiple of the probe interval)?
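      One way to read the "multiple of the probes" idea is to size the lameduck interval from the readiness probe, so that the probe has time to observe the ready plugin's failure before CoreDNS exits. This sketch assumes the kubelet marks the pod unready after failure_threshold consecutive failures spaced period_seconds apart; the function name and the example probe values are hypothetical, not the dns-default pods' actual settings.

```python
def lameduck_from_readiness_probe(period_seconds: int, failure_threshold: int,
                                  margin_seconds: int = 0,
                                  grace_period_seconds: int = 30) -> int:
    """Derive a lameduck interval as a multiple of the readiness probe period.

    Hypothetical sizing rule: in the worst case, the kubelet needs roughly
    period_seconds * failure_threshold to mark the pod unready after the
    ready plugin starts failing.
    """
    lameduck = period_seconds * failure_threshold + margin_seconds
    if lameduck >= grace_period_seconds:
        raise ValueError(
            f"derived lameduck {lameduck}s does not fit within "
            f"terminationGracePeriodSeconds ({grace_period_seconds}s)"
        )
    return lameduck


if __name__ == "__main__":
    # Hypothetical probe settings: 3 s period, 3 failures, 1 s margin.
    print(lameduck_from_readiness_probe(3, 3, margin_seconds=1))  # 10
```

      Tying the two values together this way would let a shorter probe period shorten the lameduck interval as well, instead of keeping an unexplained constant.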

      By the way, there was also https://issues.redhat.com/browse/RFE-1797, which asked for customizable CoreDNS operator configuration, but its scope was greatly reduced to cover only the upstreamResolver configuration.

      4. List any affected packages or components.
      coredns, node reboot

              mcurry@redhat.com Marc Curry
              akaris@redhat.com Andreas Karis