1. Proposed title of this feature request
Add a new worker latency profile LowEvictionLatency
2. What is the nature and description of the request?
Add a new worker latency profile, for example LowEvictionLatency,
with the following tuning values:
- default-unreachable-toleration-seconds: 40
- All other parameters (including node-monitor-grace-period) remain consistent with the Default latency profile.
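As a rough illustration of how the requested profile might be selected, the sketch below builds the cluster Node configuration manifest with the existing spec.workerLatencyProfile field. The value LowEvictionLatency is the hypothetical profile proposed in this request; it is not an accepted value today, so a current cluster would be expected to reject it.

```python
import json

# Hypothetical profile name proposed in this request; not a valid value today.
PROPOSED_PROFILE = "LowEvictionLatency"


def node_config_manifest(profile: str) -> dict:
    """Build a Node config manifest that selects the given worker latency profile."""
    return {
        "apiVersion": "config.openshift.io/v1",
        "kind": "Node",
        "metadata": {"name": "cluster"},
        "spec": {"workerLatencyProfile": profile},
    }


manifest = node_config_manifest(PROPOSED_PROFILE)
# JSON is printed so the result can be inspected or saved to a file.
print(json.dumps(manifest, indent=2))
```

If the profile were implemented, the printed JSON could be saved and applied in the same way as the existing profiles.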
3. Why does the customer need this? (List the business requirements here)
StatefulSet workloads are particularly sensitive to delays in pod eviction when a node becomes unreachable. In OpenShift, the current default-unreachable-toleration-seconds value of 300 seconds (5 minutes) causes significant delays in failover for StatefulSet-based applications.
For example, in high-availability configurations using ActiveMQ Broker (AMQ) with leader/follower roles, the follower does not assume leadership after a sudden node failure until the leader pod is fully evicted, and that eviction is currently held back by the long toleration period. This impacts message availability and system responsiveness.
The root cause of the delay is tied to both Kubernetes scheduling logic and storage-level resource locks (e.g., CephFS file locks remaining held due to stale sessions). While storage configuration changes may mitigate the issue, they often involve trade-offs or limitations (e.g., moving away from OpenShift Data Foundation (ODF)).
Justification / Use Case:
- Provides a tuned environment specifically for StatefulSet workloads requiring faster failover.
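- Reduces the eviction toleration from 300 seconds to 40 seconds, cutting failover time after a sudden node failure from 5+ minutes to well under 2 minutes (the unchanged node-monitor-grace-period plus the 40-second toleration).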
- Preserves the existing worker latency profiles (Default, MediumUpdateAverageReaction, LowUpdateSlowReaction) without impacting current users.
- Avoids complex workarounds such as controller type changes or storage migration.
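The expected failover improvement can be estimated with a small calculation. The sketch below assumes the documented Default profile value of 40 seconds for node-monitor-grace-period and treats time-to-eviction as the grace period plus the unreachable toleration. It ignores storage lock release and application-level leader election, so real failover will take somewhat longer. The class and field names are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyProfile:
    name: str
    node_monitor_grace_period_s: int       # time before the node is marked unreachable
    default_unreachable_toleration_s: int  # time pods tolerate the unreachable taint

    def seconds_until_eviction(self) -> int:
        """Approximate delay from node failure to the start of pod eviction."""
        return self.node_monitor_grace_period_s + self.default_unreachable_toleration_s


# Default values as assumed above; LowEvictionLatency is the profile proposed here.
DEFAULT = LatencyProfile("Default", 40, 300)
LOW_EVICTION = LatencyProfile("LowEvictionLatency", 40, 40)

for profile in (DEFAULT, LOW_EVICTION):
    print(f"{profile.name}: about {profile.seconds_until_eviction()} s until eviction")
```

Under these assumptions the delay before eviction drops from about 340 seconds to about 80 seconds.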
4. List any affected packages or components.
- OCP
- nodes.config.openshift.io (spec.workerLatencyProfile)