- Epic
- Resolution: Unresolved
- Critical
- TP Partitionable Devices Downstream
- To Do
- Product / Portfolio Work
- 63% To Do, 38% In Progress, 0% Done
Epic Goal
- Enable the Partitionable Devices feature (KEP-4815) when the cluster uses the TechPreviewNoUpgrade (TPNU) feature set
Why is this important?
- Partitionable Devices allows device drivers to advertise multiple overlapping logical devices ("partitions") of a single physical device
- Enables dynamic partitioning of GPUs (e.g., NVIDIA MIG) based on workload requirements
- Improves device utilization by allowing flexible allocation across multiple workloads
- Required for advanced GPU sharing scenarios in AI/ML workloads
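The idea of overlapping partitions can be sketched in a few lines of Python. This is a simplified illustration only, not the Kubernetes API or the NVIDIA driver: it assumes each partition draws from a shared pool of counters on one physical device, so allocating one partition can make an overlapping one unavailable. The class `PhysicalDevice`, the counter names `memory_gb` and `compute_slices`, and the partition names and sizes are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PhysicalDevice:
    """Hypothetical model of one physical device with shared counters."""
    name: str
    counters: dict                      # remaining capacity per counter
    allocated: list = field(default_factory=list)

    def can_allocate(self, needs):
        # A partition fits only if every counter it consumes is still available.
        return all(self.counters.get(k, 0) >= v for k, v in needs.items())

    def allocate(self, partition_name, needs):
        if not self.can_allocate(needs):
            return False
        for k, v in needs.items():
            self.counters[k] -= v       # consume the shared capacity
        self.allocated.append(partition_name)
        return True


# Overlapping logical devices advertised for a single GPU (made-up sizes).
PARTITIONS = {
    "whole-gpu": {"memory_gb": 40, "compute_slices": 7},
    "half-gpu": {"memory_gb": 20, "compute_slices": 3},
    "small": {"memory_gb": 5, "compute_slices": 1},
}

gpu = PhysicalDevice("gpu-0", {"memory_gb": 40, "compute_slices": 7})

# The half-GPU partition fits, the whole GPU no longer does, a small one still does.
results = [gpu.allocate(n, PARTITIONS[n]) for n in ("half-gpu", "whole-gpu", "small")]
print(results, gpu.counters)
```

In this model the "whole-gpu" request is rejected once "half-gpu" is taken, because both draw on the same counters; that is the sense in which the advertised partitions overlap.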
Scenarios
- User enables the TechPreviewNoUpgrade feature set on the cluster
- DRAPartitionableDevices feature gate becomes active
- Device drivers (e.g., NVIDIA DRA driver) can advertise partitionable devices
- Scheduler allocates device partitions dynamically based on pod requests
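A quick way to confirm the second scenario step is to inspect the cluster FeatureGate object. The sketch below assumes the object has been exported as JSON (for example with `oc get featuregate cluster -o json`) and that enabled gates are listed under `status.featureGates[].enabled[].name`; that layout should be checked against the openshift/api version in use. The helper name `gate_enabled` and the sample object are hypothetical and hand-written, not real cluster output.

```python
GATE = "DRAPartitionableDevices"


def gate_enabled(feature_gate, gate=GATE):
    """Return True if `gate` appears in any 'enabled' list of the FeatureGate status.

    `feature_gate` is the FeatureGate object parsed into a dict
    (assumed layout: status.featureGates[].enabled[].name).
    """
    for entry in feature_gate.get("status", {}).get("featureGates", []):
        if any(g.get("name") == gate for g in entry.get("enabled", [])):
            return True
    return False


# Hand-written sample, not real cluster output; the version string is a placeholder.
sample = {
    "spec": {"featureSet": "TechPreviewNoUpgrade"},
    "status": {
        "featureGates": [
            {
                "version": "x.y.z",
                "enabled": [{"name": "DRAPartitionableDevices"}],
                "disabled": [],
            }
        ]
    },
}

print(gate_enabled(sample))
```

A check like this could back the CI acceptance criterion, since it fails cleanly when the gate is missing or the status is empty.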
Acceptance Criteria
- CI - MUST be running successfully with tests automated
- Release Technical Enablement - Provide necessary release enablement details and documents
- Feature gate DRAPartitionableDevices added to openshift/api and enabled in TPNU
- NVIDIA DRA driver validated with partitionable devices functionality
Dependencies (internal and external)
- Kubernetes DRAPartitionableDevices feature gate (KEP-4815) - Alpha in k8s 1.33
- openshift/api PR: https://github.com/openshift/api/pull/2694
- NVIDIA DRA driver with partitionable devices support
Previous Work (Optional):
- OCPNODE-3989: Partitionable Devices KEP to beta (upstream tracking)
- OCPNODE-3676: KEP 4815: Partitionable Devices (initial investigation)
- Kubernetes Enhancement: https://github.com/kubernetes/enhancements/issues/4815
Open questions:
- Upgrade path for nvidia-dra-driver via helm (OCPNODE-4048)
Done Checklist
- CI - CI is running, tests are automated and merged.
- Release Enablement - <link to Feature Enablement Presentation>
- DEV - Feature gate added to openshift/api: https://github.com/openshift/api/pull/2694
- DEV - Upstream code and tests merged: https://github.com/kubernetes/enhancements/issues/4815
- DEV - Downstream build attached to advisory: <link to errata>
- QE - Test plans in Polarion: <link or reference to Polarion>
- QE - Automated tests merged: <link or reference to automated tests>
- DOC - Downstream documentation merged: <link to meaningful PR>