  1. OpenShift Virtualization
  2. CNV-73340

[RFE] Vendor-Agnostic GPU Configuration for OpenShift Virt (Citrix Integration)


    • Type: Feature Request
    • Resolution: Unresolved

      1. Problem Statement

      Citrix Virtual Apps & Desktops (VDI) requires a consistent and predictable method for configuring GPU resources when creating VMs from a Master VM template.
      Today, GPU configuration in OpenShift Virtualization is vendor-specific (NVIDIA vs. AMD vs. Intel), resource-specific (passthrough vs. mediated/vGPU), and expressed through multiple YAML fields, making it extremely challenging for Citrix to support all hardware types.

      Citrix needs a single, vendor-neutral GPU abstraction, similar to Kubernetes StorageClass, so their product can provision GPUs without learning each vendor’s details.

      2. Why This Is Needed

      Citrix must support on-prem, cloud, multi-vendor, and multi-GPU-type deployments.

      The current OpenShift Virtualization GPU configuration requires them to parse and override:

      • spec.domain.devices.gpus
      • pciHostDevices
      • mediatedDevices
      • resourceName (e.g., nvidia.com/mig-2g.10gb, amd.com/gpu)
      • NodeSelectors (vendor-specific)

      This is not maintainable across customer environments. Enterprises expect GPU provisioning to work like storage classes: simple and vendor-agnostic.
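      To make the burden concrete, the sketch below shows the kind of override an integrator has to perform today when cloning a Master VM onto different hardware. It assumes a VMI-shaped manifest held as a plain Python dict and assumes each GPU entry carries a name and a deviceName key holding the vendor resource name; the helper override_gpu is hypothetical and is not part of any Citrix or OpenShift API.

```python
import copy


def override_gpu(vmi_manifest, name, device_name):
    """Return a copy of a VMI-shaped manifest with its GPU list replaced.

    Hypothetical helper: the caller must already know the vendor-specific
    resource name (device_name) for the target hardware.
    """
    patched = copy.deepcopy(vmi_manifest)
    # Walk (and create if missing) spec.domain.devices.
    devices = (
        patched.setdefault("spec", {})
        .setdefault("domain", {})
        .setdefault("devices", {})
    )
    # Assumed shape of a GPU entry: a name plus the vendor resource name.
    devices["gpus"] = [{"name": name, "deviceName": device_name}]
    return patched


# Master VM built on NVIDIA MIG hardware (resource name taken from the list above).
master = {
    "spec": {
        "domain": {
            "devices": {
                "gpus": [{"name": "gpu0", "deviceName": "nvidia.com/mig-2g.10gb"}]
            }
        }
    }
}

# Cloning onto AMD hardware requires knowing a different vendor-specific name.
amd_clone = override_gpu(master, "gpu0", "amd.com/gpu")
```

      Every new vendor or GPU mode adds another resource name that the caller has to know in advance, which is the coupling this RFE asks to remove.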

      3. Proposed Solution (High-Level)

      Introduce a GPUClass abstraction that allows users (or Citrix) to specify GPU requirements using a single simple field:

      gpuClass: "gpu.vdi.medium"

      4. Functional Requirements

      • A new resource type, GPUClass, must allow defining GPU profiles decoupled from hardware/vendor specifics.
      • OpenShift Virtualization must map GPUClass → actual hardware configuration (passthrough, vGPU/mdev, MIG slice, resourceName, selectors).
      • Citrix Master VM GPU config must be preserved or overridden via Machine Profile using GPUClass.
      • The GPUClass abstraction must work across NVIDIA/AMD/Intel.
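
      The mapping requirement above can be sketched as follows. This is only an illustration under stated assumptions, not a proposed API: the Backend and GPUClass types, their fields, the resolve function, and the example.com/gpu-vendor selector key are all hypothetical. The only values taken from this RFE are the class name gpu.vdi.medium and the two resource names.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """One vendor-specific way to satisfy a GPUClass (hypothetical shape)."""
    vendor: str          # e.g. "nvidia", "amd", "intel"
    mode: str            # e.g. "passthrough", "vgpu", "mig"
    resource_name: str   # vendor resource name exposed on the node
    node_selector: dict = field(default_factory=dict)


@dataclass
class GPUClass:
    """Vendor-neutral profile: a name plus backends in preference order."""
    name: str
    backends: list


def resolve(gpu_class, node_resources):
    """Map a GPUClass to concrete settings for a node.

    node_resources is the set of resource names the node advertises.
    The first backend whose resource is present wins.
    """
    for backend in gpu_class.backends:
        if backend.resource_name in node_resources:
            return {
                "gpus": [{"name": "gpu0", "deviceName": backend.resource_name}],
                "nodeSelector": dict(backend.node_selector),
            }
    raise LookupError(f"no backend of {gpu_class.name} matches this node")


medium = GPUClass(
    name="gpu.vdi.medium",
    backends=[
        Backend("nvidia", "mig", "nvidia.com/mig-2g.10gb",
                {"example.com/gpu-vendor": "nvidia"}),  # hypothetical label
        Backend("amd", "passthrough", "amd.com/gpu",
                {"example.com/gpu-vendor": "amd"}),     # hypothetical label
    ],
)

# On a node that only advertises the AMD resource, the AMD backend is chosen.
amd_config = resolve(medium, {"amd.com/gpu"})
```

      With a structure like this, the consumer only ever names the class; the vendor, mode, resource name, and selectors stay on the platform side.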

      5. Acceptance Criteria

      • AC1: A user can define one or more GPUClass resources.
      • AC2: A VM using gpuClass: X is scheduled and configured correctly across hardware vendors.
      • AC3: Citrix Machine Profile can override master VM GPUClass cleanly.
      • AC4: System correctly handles fallback when hardware does not match GPUClass.
      • AC5: Documentation includes examples for NVIDIA, AMD, Intel, and cloud GPUs.
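
      AC4 does not define what fallback means, so the sketch below shows just one possible reading: each class may name a fallback class, and the system walks that chain until it finds a class the available hardware can satisfy. The pick_class function, the resources and fallback keys, and the class gpu.vdi.small are hypothetical and used only for illustration.

```python
def pick_class(requested, classes, available):
    """Return the first class in the fallback chain that the hardware satisfies.

    classes maps a class name to {"resources": [...], "fallback": name or None}.
    available is the set of resource names present in the cluster.
    Returns None when the chain is exhausted, hits an unknown class, or loops.
    """
    seen = set()
    current = requested
    while current is not None and current not in seen:
        seen.add(current)  # guards against cyclic fallback definitions
        spec = classes.get(current)
        if spec is None:
            return None
        if any(resource in available for resource in spec["resources"]):
            return current
        current = spec.get("fallback")
    return None


classes = {
    "gpu.vdi.medium": {
        "resources": ["nvidia.com/mig-2g.10gb"],
        "fallback": "gpu.vdi.small",  # hypothetical second class
    },
    "gpu.vdi.small": {
        "resources": ["amd.com/gpu"],
        "fallback": None,
    },
}

# Only AMD hardware is present, so the request falls back to the smaller class.
chosen = pick_class("gpu.vdi.medium", classes, {"amd.com/gpu"})
```

      Whether an unsatisfiable request should fall back silently, as here, or fail with an explicit error is a policy decision this RFE leaves open.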

      6. Non-goals

      • This RFE does NOT require redesigning device plugins.
      • This RFE does NOT replace vendor drivers or MIG tooling.
      • This RFE does NOT define performance guarantees.
      • This RFE does NOT introduce a new scheduler; it adds only an abstraction layer.

              kbidarka@redhat.com Kedar Bidarkar
              rh-ee-smolli Sudhakar Molli
              Kedar Bidarkar Kedar Bidarkar