OpenStack Strategy / RHOSSTRAT-1202

NVIDIA Time-Sliced GPU Support via Cyborg


    • Type: Feature
    • Resolution: Unresolved
    • Priority: Undefined
    • Component/s: Compute, openstack-cyborg
    • rhos-workloads-evolution

      Feature Overview
      This feature enables support for NVIDIA time-sliced GPUs (vGPUs) by integrating
      Cyborg with Nova’s mediated device (mdev) framework. The primary goal is to
      transition vGPU management from Nova-native logic to a Cyborg-managed model.
      This provides a consistent lifecycle for hardware accelerators and allows for
      more granular resource scheduling via Placement. By supporting mdevs in Cyborg,
      operators can leverage NVIDIA’s time-slicing capabilities for multi-tenant
      workloads while maintaining a unified API for all accelerator types.
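      To make "more granular resource scheduling via Placement" concrete, the
      sketch below turns one mdev type on one parent GPU into a Placement
      inventory record plus a custom trait. It assumes, hypothetically, that
      Cyborg reports time-sliced vGPUs under the standard VGPU resource class and
      encodes the mdev type as a CUSTOM_<TYPE> trait; mdev_type_to_trait and
      build_vgpu_inventory are illustrative names, not Cyborg functions.

```python
import re

# Placement custom traits must start with CUSTOM_ and use only A-Z, 0-9 and _.
_INVALID_TRAIT_CHARS = re.compile(r"[^A-Z0-9_]")


def mdev_type_to_trait(mdev_type: str) -> str:
    """Hypothetical convention: map an mdev type id to a custom trait name."""
    return "CUSTOM_" + _INVALID_TRAIT_CHARS.sub("_", mdev_type.upper())


def build_vgpu_inventory(available_instances: int) -> dict:
    """Build a Placement-style inventory for one mdev-capable parent device.

    Assumes no mdevs exist yet, so available_instances equals capacity.
    """
    if available_instances < 1:
        # Placement rejects inventories with total < 1, so report nothing.
        return {}
    return {
        "VGPU": {
            "total": available_instances,
            "reserved": 0,
            "min_unit": 1,
            "max_unit": available_instances,
            "step_size": 1,
            "allocation_ratio": 1.0,
        }
    }
```

      For example, mdev_type_to_trait("nvidia-222") returns CUSTOM_NVIDIA_222.
      Keeping the type in a trait rather than a per-type resource class is only
      one possible design; the approved spec decides the real mapping.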

      Goals

      • Finalize Nova Generic MDEV Testing: Complete the validation of the generic
        mdev implementation within Nova to ensure a stable foundation for
        cross-component handoffs.
      • Replicate MDEV Management in Cyborg: Port and adapt the mdev discovery,
        creation, and inventory management logic from Nova into Cyborg’s
        driver framework.
      • Cross-Component Integration: Enable Nova to request and attach vGPUs
        specifically managed by Cyborg, ensuring proper handling of mdev
        types and UUIDs.
      • Parity and Extension: Achieve functional parity with Nova's legacy vGPU
        support while adding Cyborg's advanced device profiling and
        metadata capabilities.
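      The mdev discovery logic to be ported into Cyborg ultimately reads the
      Linux kernel's sysfs interface, where each mdev-capable parent device
      lists its supported types under
      /sys/class/mdev_bus/<parent>/mdev_supported_types/<type>/. The sketch
      below is a hypothetical discover_mdev_types helper, not Nova's or Cyborg's
      actual code; the root parameter exists only so it can be pointed at a
      test directory.

```python
from pathlib import Path


def _read_attr(path: Path, default: str = "") -> str:
    """Read a sysfs attribute, returning a default if it is missing."""
    try:
        return path.read_text().strip()
    except OSError:
        return default


def discover_mdev_types(root: str = "/sys/class/mdev_bus") -> list:
    """List (parent, mdev type) pairs with their name and free capacity."""
    results = []
    root_path = Path(root)
    if not root_path.is_dir():
        # No mdev_bus at all: the host driver does not expose mdevs.
        return results
    for parent in sorted(root_path.iterdir()):
        types_dir = parent / "mdev_supported_types"
        if not types_dir.is_dir():
            continue
        for type_dir in sorted(types_dir.iterdir()):
            results.append({
                "parent": parent.name,  # e.g. a PCI address
                "type": type_dir.name,  # e.g. an NVIDIA mdev type id
                "name": _read_attr(type_dir / "name"),  # optional attribute
                "available_instances": int(
                    _read_attr(type_dir / "available_instances", "0")),
            })
    return results
```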

      Requirements

      Requirement                | MVP? | Notes
      ---------------------------|------|------
      Nova mdev Test Completion  | Yes  | Finish and merge pending generic mdev tests in Nova.
      Cyborg mdev Driver Support | Yes  | Implement mdev creation/deletion in Cyborg NVIDIA drivers.
      Resource Provider Mapping  | Yes  | Report vGPU mdev types as traits/resources to Placement via Cyborg.
      Nova-Cyborg Attachment     | Yes  | Update Nova's virt drivers to attach mdevs using Cyborg handles.
      MDEV Persistence           | No   | Ensure mdevs are recreated or managed correctly across host reboots.
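      At the kernel level, mdev creation and deletion are two sysfs writes.
      Writing a UUID to a type's create file instantiates an mdev, and writing 1
      to /sys/bus/mdev/devices/<uuid>/remove destroys it. The sketch below shows
      only those writes, using hypothetical helper names (create_mdev,
      remove_mdev). A real Cyborg driver would also need root privileges, error
      handling, and bookkeeping of which UUIDs it owns.

```python
import uuid
from pathlib import Path
from typing import Optional

MDEV_BUS = Path("/sys/class/mdev_bus")
MDEV_DEVICES = Path("/sys/bus/mdev/devices")


def create_mdev(parent: str, mdev_type: str,
                mdev_uuid: Optional[str] = None,
                bus_root: Path = MDEV_BUS) -> str:
    """Create an mdev of the given type on a parent device; return its UUID."""
    mdev_uuid = mdev_uuid or str(uuid.uuid4())
    create_file = bus_root / parent / "mdev_supported_types" / mdev_type / "create"
    # The kernel creates the mdev when its UUID is written to this file.
    with open(create_file, "w") as f:
        f.write(mdev_uuid)
    return mdev_uuid


def remove_mdev(mdev_uuid: str, devices_root: Path = MDEV_DEVICES) -> None:
    """Destroy an existing mdev, e.g. when the instance is terminated."""
    with open(devices_root / mdev_uuid / "remove", "w") as f:
        f.write("1")
```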

      Done - Acceptance Criteria

      • A virtual machine successfully boots in Nova using a vGPU resource
        allocated and managed by Cyborg.
      • The "NVIDIA time-sliced" mdev types are correctly discovered by Cyborg
        and visible in the Cyborg API.
      • Nova’s libvirt driver correctly generates the domain XML for Cyborg-managed
        mdev devices, including the correct UUID and parent address.
      • The integration test suite confirms that mdev resources are correctly
        cleaned up in both Cyborg and Nova upon instance termination.
      • The Nova generic mdev test suite results are successfully replicated
        within the Cyborg CI environment.
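      For the domain XML criterion, libvirt describes an assigned mdev as a
      hostdev element of type mdev whose source address names the mdev by UUID.
      The sketch below builds that element with the standard library. The
      helper name mdev_hostdev_xml is hypothetical, the model='vfio-pci' and
      managed='no' attributes are an assumption about how the mdev is attached,
      and the sketch does not show how the parent address is tracked.

```python
import xml.etree.ElementTree as ET


def mdev_hostdev_xml(mdev_uuid: str) -> str:
    """Return a libvirt <hostdev> snippet that attaches an existing mdev."""
    hostdev = ET.Element("hostdev", {
        "mode": "subsystem",
        "type": "mdev",
        "model": "vfio-pci",
        "managed": "no",  # the mdev is created beforehand (here, by Cyborg)
    })
    source = ET.SubElement(hostdev, "source")
    # libvirt identifies the mediated device by its UUID.
    ET.SubElement(source, "address", {"uuid": mdev_uuid})
    return ET.tostring(hostdev, encoding="unicode")
```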

      Use Cases - i.e. User Experience & Workflow:

      • Time-Sliced GPU Allocation: A user requests a "small" GPU profile via a
        Cyborg device profile; the system selects a physical NVIDIA GPU and
        creates a specific time-sliced vGPU (mdev) for that instance.
      • Shared Hardware Management: An administrator manages a pool of NVIDIA
        GPUs via Cyborg, adjusting the supported mdev types without
        reconfiguring Nova.
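      The "small" GPU profile in the first use case could be expressed as a
      Cyborg device profile whose group requests one VGPU resource and requires
      an mdev-type trait, with the Nova flavor pointing at it through the
      accel:device_profile extra spec. The profile name vgpu-small, the trait
      CUSTOM_NVIDIA_222, and the assumption that Cyborg reports VGPU inventory
      are all hypothetical; the helpers only build request data and do not call
      any API.

```python
def small_vgpu_profile(name: str, mdev_trait: str) -> dict:
    """Build a hypothetical Cyborg device profile requesting one vGPU."""
    return {
        "name": name,
        "groups": [
            {
                "resources:VGPU": "1",  # one time-sliced vGPU (mdev)
                f"trait:{mdev_trait}": "required",  # pin the mdev type
            },
        ],
    }


def flavor_extra_specs(profile_name: str) -> dict:
    """Flavor extra spec that tells Nova to request the Cyborg profile."""
    return {"accel:device_profile": profile_name}
```

      With this split, an administrator can change which mdev type backs the
      profile by editing the profile's trait, without touching the Nova flavor.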

      Out of Scope

      • Support for GPU virtualization that is not based on mdevs (e.g., direct
        PCI passthrough, which existing paths already handle).
      • Multi-Instance GPU (MIG) dynamic partitioning (this feature focuses
        strictly on time-sliced mdevs).

      Documentation Considerations

      Questions to Answer

      • How will Cyborg handle naming conflicts if both Nova and Cyborg
        attempt to manage mdevs on the same host during a transition period?
      • Does the replication of Nova's mdev logic require new dependencies
        in Cyborg's requirements.txt?

      Background and Strategic Fit
      This feature aligns with the approved 2026.1 "cyborg-vgpu-support" spec.
      It is a critical step in decoupling hardware-specific logic from Nova,
      moving toward the "Cyborg-as-the-source-of-truth" model for all
      pluggable accelerators.

      Customer Considerations
      Customers utilizing NVIDIA vGPU software will need to ensure their host
      drivers are compatible with the mdev bus before switching management to
      Cyborg.
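      One quick pre-switch check is whether the host driver registers any
      parent devices with the kernel's mdev framework at all. The hypothetical
      helper below lists entries under /sys/class/mdev_bus that expose
      mdev_supported_types; an empty result suggests the installed driver is not
      mdev-capable. It is a sketch, not a supported validation tool.

```python
from pathlib import Path


def mdev_capable_parents(root: str = "/sys/class/mdev_bus") -> list:
    """Return parent devices that advertise mdev types, sorted by name."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(
        p.name for p in base.iterdir()
        if (p / "mdev_supported_types").is_dir()
    )
```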

      Team Sign Off

      Reviewed By | Team Name | Accepted | Notes
      ------------|-----------|----------|------

      People: Martin Magr (mmagr@redhat.com), Sean Mooney (smooney@redhat.com),
      Sudhakar Molli, Edu Alcaniz