Feature
Resolution: Unresolved
Product / Portfolio Work
Background:
In OpenShift, the Dynamic Resource Allocation (DRA) framework, inherited from upstream Kubernetes, manages the allocation of specialized hardware resources such as GPUs and NICs. DRA has traditionally assumed that devices are locally attached and immediately usable on the node where a pod is scheduled. However, some devices, such as network-attached or fabric-attached resources, must first be attached to a node before a pod can use them.
Enhancement Summary:
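KEP-5007 introduces device binding conditions to the DRA framework. This enhancement lets the scheduler recognize devices that must be attached to a node before a pod can use them. After the scheduler selects a node and allocates such a device, it waits to bind the pod to that node until the device's binding conditions report that the attachment has completed; if a binding failure condition is reported instead, the allocation is released and the pod is retried. As a result, pods start only on nodes where their required devices are actually attached, which prevents failures caused by devices that are not yet available and improves resource utilization.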
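The gating decision described in the summary can be sketched as follows. This is a minimal Python model under stated assumptions, not the Kubernetes API: the `Device` and `PreBindResult` types, the condition names `Attached` and `AttachFailed`, and the 600-second timeout are hypothetical illustrations. In KEP-5007 the condition names are declared per device and their status is reported back by the driver or an attachment controller; consult the KEP for the exact field names and semantics.

```python
from dataclasses import dataclass, field
from enum import Enum


class PreBindResult(Enum):
    BIND = "bind"              # all binding conditions satisfied: bind pod to node
    WAIT = "wait"              # attachment still in progress: keep waiting
    RESCHEDULE = "reschedule"  # failure reported or timed out: release and retry


@dataclass
class Device:
    # Hypothetical model of a device entry; field names are illustrative only.
    name: str
    binding_conditions: list[str] = field(default_factory=list)
    binding_failure_conditions: list[str] = field(default_factory=list)


def pre_bind_decision(
    device: Device,
    observed: dict[str, bool],
    elapsed_s: float,
    timeout_s: float = 600.0,  # assumed timeout, not a documented default
) -> PreBindResult:
    """Decide what the scheduler does before binding a pod that uses `device`.

    `observed` maps a condition name to its reported status for the
    allocated device (e.g. as set by a hypothetical attachment controller).
    """
    # Any failure condition reported True aborts this scheduling attempt.
    if any(observed.get(c, False) for c in device.binding_failure_conditions):
        return PreBindResult.RESCHEDULE
    # Every required condition is True: the device is attached, so bind.
    # A device with no binding conditions (a local device) binds immediately.
    if all(observed.get(c, False) for c in device.binding_conditions):
        return PreBindResult.BIND
    # Still attaching: give up only once the assumed timeout has elapsed.
    if elapsed_s >= timeout_s:
        return PreBindResult.RESCHEDULE
    return PreBindResult.WAIT


if __name__ == "__main__":
    gpu = Device("fabric-gpu-0", ["Attached"], ["AttachFailed"])
    print(pre_bind_decision(gpu, {}, elapsed_s=5.0))
    print(pre_bind_decision(gpu, {"Attached": True}, elapsed_s=30.0))
```

Because `all()` over an empty list is `True`, a locally attached device that declares no binding conditions passes straight through to binding, which matches DRA's traditional behavior; only devices that declare conditions cause the scheduler to wait.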
Use Cases in OpenShift
Use Case | Description |
---|---|
High-Performance Computing (HPC) | Ensure that HPC workloads requiring specialized network-attached devices are scheduled only on nodes where these devices are pre-attached. |
AI/ML Workloads | Facilitate the scheduling of AI/ML workloads that depend on remote accelerators by ensuring device availability prior to pod scheduling. |
Storage Solutions | Support storage-intensive applications by pre-attaching necessary storage devices to nodes before scheduling pods that depend on them. |
Network Function Virtualization (NFV) | Enable NFV applications to utilize specific network interfaces by ensuring these interfaces are bound to nodes ahead of pod deployment. |