Type: Bug
Resolution: Done-Errata
Priority: Undefined
Fix Version/s: 4.18.0
Affects Version/s: 4.15, 4.16, 4.17, 4.18
Component/s: HyperShift
Labels:
- triaged

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:

None
Story Points:
None
Severity:
None
Regression:
None

Target Backport Versions:

4.18
Target Version:

4.18.z
Release Blocker:
None
Sprint:
None

Blocked by Bugzilla Bug:
https://issues.redhat.com//browse/OCPBUGS-50518

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Release Note Status:
Done
Release Note Type:
Release Note Not Required
Release Note Text:
N/A

Escape Reason:
None
Escape Impact:
None
Corrective Measures:
None
SDLC stage when should've been found:
None

This is a clone of issue ~~OCPBUGS-52661~~. The following is the description of the original issue:
—
Description of problem:

    Whenever the audit webhook is enabled, we should increase the termination grace period of the Kubernetes API server and lower audit-webhook-initial-backoff from its default of 10 seconds to 5 seconds.

When the audit webhook is enabled and kube-apiserver receives a SIGTERM signal, it waits 70 seconds (set by shutdown-delay-duration) before it starts shutting down. The audit webhook then makes up to 10 attempts, retrying after 10 seconds each time. I suggest we lower audit-webhook-initial-backoff to 5 seconds, so that the retries need only an extra 50 seconds, and increase the termination grace period to 130 seconds. This gives the audit webhook a 5-10 second buffer to terminate gracefully.
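The arithmetic behind the proposed values can be sketched as follows. This is a minimal illustration that follows the description's own simplified model (a fixed wait per retry, 10 attempts); the function and constant names are hypothetical and are not taken from the HyperShift code, and the real retry timing of kube-apiserver may differ.

```python
# Simplified shutdown budget, using the model from the description:
# total = shutdown delay + (retry attempts x wait per retry).
# All names here are illustrative assumptions, not HyperShift identifiers.

SHUTDOWN_DELAY_S = 70         # shutdown-delay-duration, per the description
RETRY_ATTEMPTS = 10           # audit webhook attempts, per the description
PROPOSED_GRACE_PERIOD_S = 130  # proposed termination grace period


def shutdown_budget(shutdown_delay_s: int, attempts: int, backoff_s: int) -> int:
    """Return the worst-case seconds needed before the pod can exit."""
    return shutdown_delay_s + attempts * backoff_s


# Default backoff of 10 seconds versus the proposed 5 seconds.
default_budget = shutdown_budget(SHUTDOWN_DELAY_S, RETRY_ATTEMPTS, 10)
proposed_budget = shutdown_budget(SHUTDOWN_DELAY_S, RETRY_ATTEMPTS, 5)
buffer_s = PROPOSED_GRACE_PERIOD_S - proposed_budget

print(f"default backoff budget:  {default_budget}s")
print(f"proposed backoff budget: {proposed_budget}s")
print(f"buffer under a {PROPOSED_GRACE_PERIOD_S}s grace period: {buffer_s}s")
```

Under this model the 5 second backoff needs 120 seconds in total, which leaves 10 seconds of headroom under a 130 second grace period.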

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1. Create a HyperShift cluster with https://github.com/openshift/hypershift/pull/5475 and https://github.com/openshift/hypershift/pull/5491
    2. Enable auditing and a webhook on the cluster (you'd have to figure out this part yourself)
    3. Time the deletion of a kube-apiserver pod

Actual results:

Expected results:

Additional info:

I tested this myself and pod deletion took between 120 and 126 seconds, which is why I think 130 seconds would be a good value for the termination grace period.

# joseph.goergen@stgiks-dal10-carrier0-worker-1002:~$ time kubectl delete pod -n master-cv5k78u20ksllb9rsk1g kube-apiserver-5d7fcf5b9f-52cm8
pod "kube-apiserver-5d7fcf5b9f-52cm8" deleted

real    2m2.267s
user    0m0.361s
sys     0m0.092s
# joseph.goergen@stgiks-dal10-carrier0-worker-1002:~$ time kubectl delete pod -n master-cv5k78u20ksllb9rsk1g kube-apiserver-5d7fcf5b9f-rtk5x
pod "kube-apiserver-5d7fcf5b9f-rtk5x" deleted

real    2m6.099s
user    0m0.327s
sys     0m0.082s
# joseph.goergen@stgiks-dal10-carrier0-worker-1002:~$ time kubectl delete pod -n master-cv5k78u20ksllb9rsk1g kube-apiserver-5d7fcf5b9f-h4smt
pod "kube-apiserver-5d7fcf5b9f-h4smt" deleted

real    1m59.644s
user    0m0.244s
sys     0m0.137s
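
To compare the three measured deletions against the proposed grace period, the `real` values above can be converted to seconds. This is a small sketch; the helper name `parse_real_time` is hypothetical, and the only inputs are the three timings reported in this issue.

```python
import re

# "real" wall-clock values copied from the three kubectl delete runs above.
OBSERVED_REAL_TIMES = ["2m2.267s", "2m6.099s", "1m59.644s"]
PROPOSED_GRACE_PERIOD_S = 130  # value proposed in this issue


def parse_real_time(value: str) -> float:
    """Convert a bash `time` value such as '2m2.267s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", value)
    if match is None:
        raise ValueError(f"unrecognised time format: {value!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


durations = [parse_real_time(v) for v in OBSERVED_REAL_TIMES]
slowest = max(durations)
headroom = PROPOSED_GRACE_PERIOD_S - slowest

for raw, secs in zip(OBSERVED_REAL_TIMES, durations):
    print(f"{raw:>10} -> {secs:.3f}s")
print(f"slowest run: {slowest:.3f}s, headroom under 130s: {headroom:.3f}s")
```

All three runs finish inside 130 seconds; the slowest one leaves roughly 3.9 seconds of headroom.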

clones

OCPBUGS-52661 KAS: Adjust termination grace period when the audit webhook is enabled

Closed

is blocked by

OCPBUGS-52661 KAS: Adjust termination grace period when the audit webhook is enabled

Closed

links to

openshift/hypershift#6020: [release-4.18] OCPBUGS-54865: KAS-Bump audit-webhook-initial-backoff and TerminationGracePeriodSeconds when audit webhook is enabled

RHSA-2025:4712 OpenShift Container Platform 4.18.z security update

Assignee:: Unassigned

Reporter:: OpenShift Prow Bot

QA Contact:: Wen Wang

Need Info From:: None

Votes:: 0

Watchers:: 6

Created:: 2025/04/10 3:26 PM

Updated:: 2025/07/14 1:16 PM

Resolved:: 2025/05/14 2:10 AM
