OpenShift Request For Enhancement: RFE-7883

Add a new worker latency profile LowEvictionLatency


    • Feature Request
    • Resolution: Unresolved
    • openshift-4.19
    • Node
    • Product / Portfolio Work

      1. Proposed title of this feature request

      Add a new worker latency profile LowEvictionLatency

      2. What is the nature and description of the request?

      Add a new worker latency profile, for example one named LowEvictionLatency.

      With the following tuning values:

      • default-unreachable-toleration-seconds: 40
      • All other parameters (including node-monitor-grace-period) keep the values they have in the Default latency profile.
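
      The proposed values can be sketched as a small comparison against the Default profile. The Default values below are taken from the OpenShift worker latency profile documentation as an assumption and should be checked against the target release; the dictionary and function names are hypothetical and exist only for illustration.

```python
# Default profile values as listed in the OpenShift worker latency profile
# documentation (an assumption here; verify against your cluster version).
DEFAULT_PROFILE = {
    "node-status-update-frequency": "10s",
    "node-monitor-grace-period": "40s",
    "default-not-ready-toleration-seconds": 300,
    "default-unreachable-toleration-seconds": 300,
}

# Proposed profile: identical to Default except for the unreachable toleration.
LOW_EVICTION_LATENCY = {
    **DEFAULT_PROFILE,
    "default-unreachable-toleration-seconds": 40,
}


def changed_keys(base, other):
    """Return the settings whose values differ, as {name: (base, other)}."""
    return {k: (base[k], other[k]) for k in base if base[k] != other[k]}


print(changed_keys(DEFAULT_PROFILE, LOW_EVICTION_LATENCY))
```

      Under these assumptions, the only setting that differs from Default is default-unreachable-toleration-seconds.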

      3. Why does the customer need this? (List the business requirements here)

      StatefulSet workloads are particularly sensitive to delays in pod eviction when a node becomes unreachable. In OpenShift, the current default-unreachable-toleration-seconds value of 300 seconds (5 minutes) causes significant delays in failover for StatefulSet-based applications.

      For example, in high-availability configurations using ActiveMQ Broker (AMQ) with leader/follower roles, the follower does not assume leadership after a sudden node failure until the leader pod has been fully evicted, and eviction is currently held back by the long toleration period. This impacts message availability and system responsiveness.

      The root cause of the delay is tied to both Kubernetes eviction and scheduling logic and storage-level resource locks (e.g., CephFS file locks remaining held due to stale sessions). While storage configuration changes may mitigate the issue, they often involve trade-offs or limitations (e.g., abandoning OpenShift Data Foundation (ODF)).

      Justification / Use Case:

      • Provides a tuned environment specifically for StatefulSet workloads requiring faster failover.
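      • Reduces the wait between a node being marked unreachable and pod eviction from 300 seconds to 40 seconds, cutting failover time after a sudden node failure from 5+ minutes to roughly the node detection time plus 40 seconds.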
      • Preserves the existing profiles (Default, MediumUpdateAverageReaction, LowUpdateSlowReaction) without impacting current users.
      • Avoids complex workarounds such as controller type changes or storage migration.
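
      As a rough illustration of the timing involved, the delay before eviction starts can be estimated as the node detection time plus the unreachable toleration. The sketch below assumes a node-monitor-grace-period of 40 seconds (the documented Default value, to be verified for the target release) and ignores rescheduling time and storage lock release. The function name is hypothetical.

```python
def seconds_until_eviction(node_monitor_grace_period_s, unreachable_toleration_s):
    """Rough estimate of the delay between a node going silent and pod eviction starting.

    Detection time (grace period) and the toleration are simply added; pod
    rescheduling and storage lock release are not included.
    """
    return node_monitor_grace_period_s + unreachable_toleration_s


GRACE_PERIOD_S = 40  # assumed Default node-monitor-grace-period

current = seconds_until_eviction(GRACE_PERIOD_S, 300)  # Default profile today
proposed = seconds_until_eviction(GRACE_PERIOD_S, 40)  # proposed LowEvictionLatency

print(current, proposed)  # 340 80
```

      With these assumed inputs the estimate drops from about 340 seconds to about 80 seconds; actual failover time also depends on the workload and storage layer.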

      4. List any affected packages or components.

      • OCP
      • nodes.config.openshift.io (the cluster Node configuration resource, where the worker latency profile is set)
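
      For context, a worker latency profile is selected through the workerLatencyProfile field of the cluster-scoped Node configuration object. The sketch below builds such an object as plain data; the helper name is hypothetical, and LowEvictionLatency is the profile name proposed in this request, not a value accepted by current releases.

```python
import json


def node_config(profile_name):
    """Build the cluster Node config object that selects a worker latency profile."""
    return {
        "apiVersion": "config.openshift.io/v1",
        "kind": "Node",
        "metadata": {"name": "cluster"},
        "spec": {"workerLatencyProfile": profile_name},
    }


# "LowEvictionLatency" is the proposed (not yet existing) profile name.
print(json.dumps(node_config("LowEvictionLatency"), indent=2))
```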

      See: Worker latency profiles

              gausingh@redhat.com Gaurav Singh
              rhn-support-algonzal Alberto Gonzalez de Dios