Feature Request
Resolution: Unresolved
Priority: Normal
1. Proposed title of this feature request
CoreDNS operator - make lameduck interval configurable
2. What is the nature and description of the request?
Currently, the lameduck interval is hardcoded (and always has been) at 20 seconds. This means that openshift-dns pods take a minimum of 20 seconds to stop, and at most their terminationGracePeriodSeconds (30 seconds).
This in turn delays the deactivation of crio.service during graceful node shutdown, as crio.service will wait for all containers to shut down gracefully or to hit terminationGracePeriodSeconds to kill them forcefully.
Therefore, it is currently impossible to gracefully reboot an OpenShift node in under 20 seconds. There may be other containers that also delay shutdown, but the absolute minimum is set by the openshift-dns/dns-default pods, at 20 seconds, due to the hardcoded lameduck interval.
# oc get cm -n openshift-dns dns-default -o yaml
apiVersion: v1
data:
  Corefile: |
    .:5353 {
        bufsize 1232
        errors
        log . {
            class error
        }
        health {
            lameduck 20s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
            pods insecure
            fallthrough in-addr.arpa ip6.arpa
        }
        prometheus 127.0.0.1:9153
        forward . /etc/resolv.conf {
            policy sequential
        }
        cache 900 {
            denial 9984 30
        }
        reload
    }
    hostname.bind:5353 {
        chaos
    }
kind: ConfigMap
metadata:
  name: dns-default
  namespace: openshift-dns
https://coredns.io/plugins/health/
Where lameduck will delay shutdown for DURATION. /health will still answer 200 OK. Note: The ready plugin will not answer OK while CoreDNS is in lame duck mode prior to shutdown.
The lameduckDuration is and always has been hardcoded to 20 seconds:
https://github.com/openshift/cluster-dns-operator/blob/release-4.14/pkg/operator/controller/controller_dns_configmap.go#L27
3. Why does the customer need this? (List the business requirements here)
The customer is interested in speeding up node shutdown. In this specific case, they use OpenStack to shut down their OCP VMs gracefully and want to keep the total shutdown under 30 seconds.
Shutdown of the openshift-dns pod alone takes a minimum of 20 seconds. That is, in my opinion, a long time to shut down a process, especially given that it blocks node shutdown.
There may be other containers that also slow this down (which of course must be optimized), but openshift-dns is reliably at the hardcoded 20 seconds minimum and it cannot be tweaked.
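To make the request concrete, the setting could be exposed on the DNS operator's custom resource, for example (the `lameduckDuration` field is hypothetical and does not exist in the current API; it is shown only to illustrate the request):

```yaml
apiVersion: operator.openshift.io/v1
kind: DNS
metadata:
  name: default
spec:
  # Hypothetical field -- not part of the current DNS operator API.
  # The operator would render this value into the Corefile health stanza,
  # defaulting to the current 20s when unset.
  lameduckDuration: 5s
```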
Throw in some OpenStack API overhead plus a few seconds for shutting down the rest of the system, and you can see that it's easy to be > 30 seconds for the entire operation.
Also, what is the reason for a lameduck setting of 20 seconds? Why not 18, 19, or 21?
Historically, the 20 seconds come from https://bugzilla.redhat.com/show_bug.cgi?id=1884053
Originally, that bug was supposed to be solved with a 60-second lameduck period: https://github.com/openshift/cluster-dns-operator/pull/205 (there is a discussion there about how the original PR arrived at 60 seconds, "for good measure").
That was then reverted and replaced by the solution in https://github.com/openshift/cluster-dns-operator/pull/237, which is where a lameduck of 20 seconds first appears, but I can't find a justification for exactly 20 seconds.
From a glance at the BZ and PRs (though I haven't read them thoroughly), the value might have been chosen to be a multiple of some of the probe periods. If so, could we make the probes configurable as well, or set them to a fraction of lameduck (or, vice versa, set lameduck to a multiple of the probes)?
By the way, there was also https://issues.redhat.com/browse/RFE-1797, which asked for a customizable configuration of the CoreDNS operator, but its scope was greatly reduced to deal only with upstreamResolver configuration.
4. List any affected packages or components.
coredns, node reboot