Bug
Resolution: Done
Undefined
None
4.18
None
None
False
Description of problem:
This PR introduces graceful shutdown functionality to the Multus daemon by adding a /readyz endpoint alongside the existing /healthz endpoint. Once the daemon receives a SIGTERM, /readyz starts returning HTTP 500 to indicate that the daemon is in shutdown mode, while CNI requests can still be processed for a short window. The daemonset configs have also been updated to increase terminationGracePeriodSeconds from 10 to 30 seconds, giving the daemon more time to shut down cleanly.

This addresses a race condition during pod transitions in which the readiness check returns true, but a subsequent CNI request fails because the daemon has shut down too quickly. By reporting not-ready through /readyz and delaying the shutdown, the daemon can handle in-flight CNI requests gracefully, reducing the risk of disruptions during critical transitions.
Version-Release number of selected component (if applicable):
How reproducible:
Difficult to reproduce, might require CI signal
- is cloned by: OCPBUGS-42238 Multus daemonset requires graceful termination [cno integration] (Closed)
- links to: RHEA-2024:6122 OpenShift Container Platform 4.18.z bug fix update