Feature
Resolution: Unresolved
Product / Portfolio Work
Background:
In OpenShift, the Dynamic Resource Allocation (DRA) framework manages the allocation of specialized hardware resources such as GPUs and NICs. Without this enhancement, DRA has no mechanism to keep pods from being scheduled onto devices that are in an undesirable state (for example, degraded or under maintenance). As a result, workloads that depend on these devices can suffer reduced performance or fail outright.
Enhancement Summary:
KEP-5055 introduces device taints and tolerations within the DRA framework. Administrators can mark specific devices with taints, indicating that those devices should not be used for new pods unless the pods explicitly tolerate the taints. This gives finer-grained control over device scheduling, so that workloads are assigned only to appropriate devices.
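The matching rule described above can be illustrated with a small, self-contained model. This is only a sketch, not the Kubernetes API or the scheduler's implementation: it assumes device taints follow the same key/value/effect and `Equal`/`Exists` operator semantics as node taints, and the class names, the `allocatable` helper, and the `example.com/maintenance` key are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Taint:
    key: str
    effect: str          # assumed values: "NoSchedule" or "NoExecute"
    value: str = ""


@dataclass(frozen=True)
class Toleration:
    key: Optional[str] = None     # None matches any key (only with "Exists")
    operator: str = "Equal"       # "Equal" or "Exists"
    value: str = ""
    effect: Optional[str] = None  # None matches any effect


@dataclass
class Device:
    name: str
    taints: List[Taint] = field(default_factory=list)


def tolerates(tol: Toleration, taint: Taint) -> bool:
    """Return True if a single toleration covers a single taint."""
    if tol.effect is not None and tol.effect != taint.effect:
        return False
    if tol.operator == "Exists":
        return tol.key is None or tol.key == taint.key
    return tol.key == taint.key and tol.value == taint.value


def allocatable(devices: List[Device], tolerations: List[Toleration]) -> List[str]:
    """Names of devices whose taints are all covered by the given tolerations."""
    return [
        d.name
        for d in devices
        if all(any(tolerates(t, taint) for t in tolerations) for taint in d.taints)
    ]


# Hypothetical inventory: gpu-1 is tainted because it is under maintenance.
devices = [
    Device("gpu-0"),
    Device("gpu-1", [Taint("example.com/maintenance", "NoSchedule", "true")]),
]

# A workload with no tolerations versus one that tolerates the maintenance taint.
plain = allocatable(devices, [])
tolerant = allocatable(
    devices, [Toleration(key="example.com/maintenance", operator="Exists")]
)
```

In this model an untainted device is always eligible, while a tainted device becomes eligible only when every one of its taints is matched by some toleration, which is the behavior the summary describes for new pods.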
Use Cases in OpenShift:
Use Case | Description |
---|---|
Device Maintenance | Prevent new pods from being scheduled on devices undergoing maintenance by tainting them, ensuring stability and avoiding disruptions. |
Degraded Hardware | Mark devices that are experiencing issues as tainted to prevent their use until they are verified to be healthy. |
Workload Isolation | Use taints to reserve specific devices for particular workloads or tenants, enforcing isolation and compliance requirements. |
Testing and Validation | Taint devices designated for testing to ensure that only test workloads are scheduled on them, avoiding interference with production workloads. |