OCPNODE-4069 (OpenShift Node)

TP Partitionable Devices Downstream


    • Type: Epic
    • Resolution: Unresolved
    • Priority: Critical
    • TP Partitionable Devices Downstream
    • Status: To Do
    • Product / Portfolio Work
    • OCPSTRAT-2114 DRA: Add support for partitionable devices : Tech P 4.22
    • 63% To Do, 38% In Progress, 0% Done

      Epic Goal

      • Enable the Partitionable Devices feature (KEP-4815) when the cluster has the TechPreviewNoUpgrade (TPNU) feature set enabled

      Why is this important?

      • Partitionable Devices allows device drivers to advertise multiple overlapping logical devices ("partitions") of a single physical device
      • Enables dynamic partitioning of GPUs (e.g., NVIDIA MIG) based on workload requirements
      • Improves device utilization by allowing flexible allocation across multiple workloads
      • Required for advanced GPU sharing scenarios in AI/ML workloads

      Scenarios

      1. User enables TechPreviewNoUpgrade feature set on cluster
      2. DRAPartitionableDevices feature gate becomes active
      3. Device drivers (e.g., NVIDIA DRA driver) can advertise partitionable devices
      4. Scheduler allocates device partitions dynamically based on pod requests

      Acceptance Criteria

      • CI - MUST be running successfully with tests automated
      • Release Technical Enablement - Provide necessary release enablement details and documents
      • Feature gate DRAPartitionableDevices added to openshift/api and enabled in TPNU
      • NVIDIA DRA driver validated with partitionable devices functionality

      Dependencies (internal and external)

      1. Kubernetes DRAPartitionableDevices feature gate (KEP-4815) - Alpha in k8s 1.33
      2. openshift/api PR: https://github.com/openshift/api/pull/2694
      3. NVIDIA DRA driver with partitionable devices support

      Previous Work (Optional):

      1. OCPNODE-3989: Partitionable Devices KEP to beta (upstream tracking)
      2. OCPNODE-3676: KEP 4815: Partitionable Devices (initial investigation)
      3. Kubernetes Enhancement: https://github.com/kubernetes/enhancements/issues/4815

      Open questions:

      1. Upgrade path for nvidia-dra-driver via Helm (OCPNODE-4048)

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement - <link to Feature Enablement Presentation>
      • DEV - Feature gate added to openshift/api: https://github.com/openshift/api/pull/2694
      • DEV - Upstream code and tests merged: https://github.com/kubernetes/enhancements/issues/4815
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

              harpatil@redhat.com Harshal Patil
              pehunt@redhat.com Peter Hunt