  1. OpenShift Bugs
  2. OCPBUGS-54865

KAS: Adjust termination grace period when the audit webhook is enabled

    • Bug
    • Resolution: Done-Errata
    • Undefined
    • 4.18.0
    • 4.15, 4.16, 4.17, 4.18
    • HyperShift
    • Quality / Stability / Reliability
    • Done
    • Release Note Not Required

      This is a clone of issue OCPBUGS-52661. The following is the description of the original issue:

      Description of problem:

      Whenever the audit webhook is enabled, we should increase the time allowed for the Kubernetes API server to terminate and lower audit-webhook-initial-backoff from its default of 10 seconds to 5 seconds.

      When the audit webhook is enabled and kube-apiserver receives a SIGTERM signal, it waits 70 seconds (determined by shutdown-delay-duration) before it starts shutting down. The audit webhook will then make up to 10 attempts, retrying after 10 seconds each time. I suggest setting audit-webhook-initial-backoff to 5 seconds, so that the webhook needs only an extra 50 seconds, and increasing the termination grace period to 130 seconds. This gives the audit webhook a 5-10 second buffer to terminate gracefully.
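The timing budget described above can be sketched as a small calculation. This only mirrors the simplified arithmetic in the description (shutdown delay plus one backoff interval per attempt); it is not the real kube-apiserver retry logic. The function and constant names are hypothetical and chosen for illustration.

```python
# Values taken from the issue description.
SHUTDOWN_DELAY_SECONDS = 70  # hold time after SIGTERM (shutdown-delay-duration)
WEBHOOK_ATTEMPTS = 10        # attempts made by the audit webhook


def estimated_shutdown_seconds(initial_backoff_seconds: int) -> int:
    """Shutdown delay plus one backoff interval per webhook attempt.

    Assumption: every retry waits exactly the initial backoff, as the
    description implies. The real backoff schedule may differ.
    """
    return SHUTDOWN_DELAY_SECONDS + WEBHOOK_ATTEMPTS * initial_backoff_seconds


def buffer_seconds(grace_period_seconds: int, initial_backoff_seconds: int) -> int:
    """Time left over before the pod would be force-killed (negative = too short)."""
    return grace_period_seconds - estimated_shutdown_seconds(initial_backoff_seconds)


if __name__ == "__main__":
    for backoff in (10, 5):
        total = estimated_shutdown_seconds(backoff)
        spare = buffer_seconds(130, backoff)
        print(f"backoff={backoff}s: estimated shutdown {total}s, "
              f"buffer under a 130s grace period: {spare}s")
```

Under this model, a 10 second backoff needs 170 seconds and would not fit in a 130 second grace period, while a 5 second backoff needs 120 seconds and leaves 10 seconds to spare.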

      Version-Release number of selected component (if applicable):

          

      How reproducible:

          

      Steps to Reproduce:

          1. Create a HyperShift cluster with https://github.com/openshift/hypershift/pull/5475 and https://github.com/openshift/hypershift/pull/5491
          2. Enable auditing and a webhook on the cluster (you'd have to figure out this part yourself)
          3. Time the deletion of a kube-apiserver pod
          

      Actual results:

          

      Expected results:

          

      Additional info:

      In my own testing, deleting a kube-apiserver pod took roughly 120-126 seconds, which is why I think 130 seconds is a good value for the termination grace period.
      # joseph.goergen@stgiks-dal10-carrier0-worker-1002:~$ time kubectl delete pod -n master-cv5k78u20ksllb9rsk1g kube-apiserver-5d7fcf5b9f-52cm8
      pod "kube-apiserver-5d7fcf5b9f-52cm8" deleted
      real    2m2.267s
      user    0m0.361s
      sys    0m0.092s
      # joseph.goergen@stgiks-dal10-carrier0-worker-1002:~$ time kubectl delete pod -n master-cv5k78u20ksllb9rsk1g kube-apiserver-5d7fcf5b9f-rtk5x
      pod "kube-apiserver-5d7fcf5b9f-rtk5x" deleted
      real    2m6.099s
      user    0m0.327s
      sys    0m0.082s
      # joseph.goergen@stgiks-dal10-carrier0-worker-1002:~$ time kubectl delete pod -n master-cv5k78u20ksllb9rsk1g kube-apiserver-5d7fcf5b9f-h4smt
      pod "kube-apiserver-5d7fcf5b9f-h4smt" deleted
      real    1m59.644s
      user    0m0.244s
      sys    0m0.137s
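To compare these timings against the proposed limit, the `real` values can be converted to seconds and checked against 130. This is a minimal sketch; the helper names are hypothetical, and the wall-clock time of `kubectl delete` includes client and API overhead, so it only approximates how long the container took to exit.

```python
import re

GRACE_PERIOD_SECONDS = 130  # value proposed in this issue

# "real" wall-clock values copied from the three runs above.
MEASURED_REAL_TIMES = ["2m2.267s", "2m6.099s", "1m59.644s"]


def parse_real_time(value: str) -> float:
    """Convert a shell `time` value such as '2m2.267s' into seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value.strip())
    if match is None:
        raise ValueError(f"unrecognized time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def headroom_seconds(real_times, grace_period_seconds):
    """Grace period minus the slowest observed deletion."""
    slowest = max(parse_real_time(v) for v in real_times)
    return grace_period_seconds - slowest


if __name__ == "__main__":
    for value in MEASURED_REAL_TIMES:
        print(f"{value} -> {parse_real_time(value):.3f}s")
    spare = headroom_seconds(MEASURED_REAL_TIMES, GRACE_PERIOD_SECONDS)
    print(f"headroom under {GRACE_PERIOD_SECONDS}s: {spare:.3f}s")
```

For the three runs above, the slowest deletion took about 126.1 seconds, which leaves roughly 3.9 seconds under a 130 second grace period.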

              Unassigned
              OpenShift Prow Bot (openshift-crt-jira-prow)
              Wen Wang