OpenShift Bugs / OCPBUGS-19452

DaemonSet fails to scale down during the rolling update when maxUnavailable=0


Details

    • Bug
    • Resolution: Done-Errata
    • Critical
    • 4.15.0
    • 4.13
    • None
    • Important
    • No
    • Approved
    • False
      ---------- edited for release notes ----------
      * Previously, when the `maxSurge` field was set for a daemon set and a toleration was updated, pods failed to scale down, which could result in a failed rollout due to a different set of nodes being used for scheduling. With this release, nodes are properly excluded if scheduling constraints are not met, and rollouts can complete successfully. (link:https://issues.redhat.com/browse/OCPBUGS-19452[*OCPBUGS-19452*])
      ---------- original text ----------
      Cause: A DaemonSet with maxSurge fails to scale down pods when a toleration is updated, resulting in a different set of nodes being used for scheduling.
      Consequence: Incorrect scheduling of DaemonSet pods, which can result in a failed rollout.
      Fix: Revised the logic for the DaemonSet rolling update to exclude nodes if scheduling constraints are not met.
      Result: This eliminates the problem of rolling updates to a DaemonSet getting stuck when tolerations change.
    • Bug Fix
    • Done

    Description

      Description of problem:

      The OpenShift DNS daemonset uses the rolling update strategy. The "maxSurge" parameter is set to a non-zero value, which means that the "maxUnavailable" parameter is set to zero. When the user replaces the toleration in the daemonset's template spec (via the OpenShift DNS config API), swapping the toleration that allows scheduling on master nodes for any other toleration, the new pods still try to be scheduled on the master nodes. The old pods on the nodes that are still tolerated may happen to be recreated, but only if they are processed before any pod on a node that is no longer tolerated.
      
      The new pods are not expected to be scheduled on nodes that are not tolerated by the new daemonset's template spec. The daemonset controller should simply delete the old pods from the nodes that can no longer be tolerated. The old pods on the nodes that are still tolerated should be recreated according to the rolling update parameters.
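      One way to replace the toleration through the OpenShift DNS config API is a merge patch against the cluster DNS operator resource. The command below is only a sketch, assuming the "dns.operator/default" resource and its "spec.nodePlacement.tolerations" field; the "test-taint" key is just a placeholder:

      $ oc patch dns.operator/default --type=merge \
          -p '{"spec":{"nodePlacement":{"tolerations":[{"key":"test-taint","operator":"Exists"}]}}}'

      The DNS operator then propagates the new toleration into the dns-default daemonset's pod template, which triggers the rolling update described above.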
      

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Always
      

      Steps to Reproduce:
      1. Create a daemonset which tolerates the "node-role.kubernetes.io/master" taint and has the following rolling update parameters (a minimal standalone reproducer is sketched after step 3):

      $ oc -n openshift-dns get ds dns-default -o yaml | yq .spec.updateStrategy
      rollingUpdate:
        maxSurge: 10%
        maxUnavailable: 0
      type: RollingUpdate
      
      $ oc  -n openshift-dns get ds dns-default -o yaml | yq .spec.template.spec.tolerations
      - key: node-role.kubernetes.io/master
        operator: Exists
      

      2. Let the daemonset be scheduled on all the target nodes (e.g. all masters and all workers):

      $ oc -n openshift-dns get pods  -o wide | grep dns-default
      dns-default-6bfmf     2/2     Running   0          119m    10.129.0.40   ci-ln-sb5ply2-72292-qlhc8-master-2         <none>           <none>
      dns-default-9cjdf     2/2     Running   0          2m35s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>           <none>
      dns-default-c6j9x     2/2     Running   0          119m    10.128.0.13   ci-ln-sb5ply2-72292-qlhc8-master-0         <none>           <none>
      dns-default-fhqrs     2/2     Running   0          2m12s   10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>           <none>
      dns-default-lx2nf     2/2     Running   0          119m    10.130.0.15   ci-ln-sb5ply2-72292-qlhc8-master-1         <none>           <none>
      dns-default-mmc78     2/2     Running   0          112m    10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>           <none>
      

      3. Update the daemonset's tolerations by removing "node-role.kubernetes.io/master" and adding any other toleration (a toleration for a nonexistent taint works too); see the sketch after the output below:

      $ oc -n openshift-dns get ds dns-default -o yaml | yq .spec.template.spec.tolerations
      - key: test-taint
        operator: Exists
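
      The same behavior can be reproduced without the DNS operator by using a minimal standalone daemonset for step 1 and patching its tolerations directly for step 3. The manifest and patch below are only a sketch; the "test-ds" name, the "default" namespace, and the pause image are placeholders that do not come from this report:

      $ cat test-ds.yaml
      apiVersion: apps/v1
      kind: DaemonSet
      metadata:
        name: test-ds
        namespace: default
      spec:
        selector:
          matchLabels:
            app: test-ds
        updateStrategy:
          type: RollingUpdate
          rollingUpdate:
            maxSurge: 10%
            maxUnavailable: 0
        template:
          metadata:
            labels:
              app: test-ds
          spec:
            tolerations:
            - key: node-role.kubernetes.io/master
              operator: Exists
            containers:
            - name: pause
              image: registry.k8s.io/pause:3.9

      $ oc apply -f test-ds.yaml

      # step 3 equivalent: replace the master toleration with one for a taint that does not exist
      $ oc -n default patch ds test-ds --type=json \
          -p '[{"op":"replace","path":"/spec/template/spec/tolerations","value":[{"key":"test-taint","operator":"Exists"}]}]'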
      

      Actual results:

      $ oc -n openshift-dns get pods  -o wide | grep dns-default
      dns-default-6bfmf     2/2     Running   0          124m    10.129.0.40   ci-ln-sb5ply2-72292-qlhc8-master-2         <none>           <none>
      dns-default-76vjz     0/2     Pending   0          3m2s    <none>        <none>                                     <none>           <none>
      dns-default-9cjdf     2/2     Running   0          7m24s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>           <none>
      dns-default-c6j9x     2/2     Running   0          124m    10.128.0.13   ci-ln-sb5ply2-72292-qlhc8-master-0         <none>           <none>
      dns-default-fhqrs     2/2     Running   0          7m1s    10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>           <none>
      dns-default-lx2nf     2/2     Running   0          124m    10.130.0.15   ci-ln-sb5ply2-72292-qlhc8-master-1         <none>           <none>
      dns-default-mmc78     2/2     Running   0          117m    10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>           <none>
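
      With "maxUnavailable" set to 0 and the new pod created for a master node stuck in Pending (the new template no longer tolerates the master taint), the rollout makes no progress. One way to observe this is sketched below; the timeout value is arbitrary:

      $ oc -n openshift-dns rollout status ds/dns-default --timeout=60s

      The command keeps waiting for the daemonset rollout to finish and eventually exits with a timeout error instead of completing.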
      

      Expected results:

      $ oc -n openshift-dns get pods  -o wide | grep dns-default
      dns-default-9cjdf     2/2     Running   0          7m24s   10.129.2.15   ci-ln-sb5ply2-72292-qlhc8-worker-c-m5wzq   <none>           <none>
      dns-default-fhqrs     2/2     Running   0          7m1s    10.131.0.29   ci-ln-sb5ply2-72292-qlhc8-worker-a-6q7hs   <none>           <none>
      dns-default-mmc78     2/2     Running   0          7m54s   10.128.2.7    ci-ln-sb5ply2-72292-qlhc8-worker-b-bpjdk   <none>           <none>
      

      Additional info:
      Upstream issue: https://github.com/kubernetes/kubernetes/issues/118823
      Slack discussion: https://redhat-internal.slack.com/archives/CKJR6200N/p1687455135950439

            People

              fkrepins@redhat.com Filip Krepinsky
              alebedev@redhat.com Andrey Lebedev
              ying zhou ying zhou