Feature Request
Resolution: Unresolved
1. Problem Statement
Citrix Virtual Apps & Desktops (VDI) requires a consistent and predictable method for configuring GPU resources when creating VMs from a Master VM template.
Today, GPU configuration in OpenShift Virtualization is vendor-specific (NVIDIA vs. AMD vs. Intel), resource-specific (passthrough vs. mediated/vGPU), and expressed through multiple YAML fields, making it extremely challenging for Citrix to support all hardware types.
Citrix needs a single, vendor-neutral GPU abstraction, similar to Kubernetes StorageClass, so their product can provision GPUs without learning each vendor’s details.
2. Why This Is Needed
Citrix must support on-prem, cloud, multi-vendor, and multi-GPU-type deployments.
The current OpenShift Virtualization GPU configuration requires them to parse and override:
- spec.domain.devices.gpus
- pciHostDevices
- mediatedDevices
- resourceName (e.g., nvidia.com/mig-2g.10gb, amd.com/gpu)
- NodeSelectors (vendor-specific)
This is not maintainable across customer environments. Enterprises expect GPU provisioning to work like storage classes: simple and vendor-agnostic.
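For illustration, the fragment below sketches what a vendor-specific GPU assignment can look like on a VM today. It is a partial manifest, not a complete VirtualMachine, and the VM name, device name, and node label are assumptions chosen for the example; actual values depend on the cluster, the GPU vendor, and the device plugin in use.

```yaml
# Illustrative fragment only: names, device names, and labels are assumptions.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vdi-master-nvidia              # hypothetical VM name
spec:
  template:
    spec:
      nodeSelector:
        nvidia.com/gpu.present: "true" # vendor-specific node label (assumed)
      domain:
        devices:
          gpus:
            - name: gpu1
              # Vendor- and profile-specific resource name (assumed value).
              # An AMD node would need a different value, e.g. amd.com/gpu.
              deviceName: nvidia.com/GRID_T4-1Q
```

Every value marked above would have to change for a different vendor or GPU mode, which is the maintenance burden this request describes.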
3. Proposed Solution (High-Level)
Introduce a GPUClass abstraction that allows users (or Citrix) to specify GPU requirements using a single simple field:
gpuClass: "gpu.vdi.medium"
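As a rough sketch of how this could look on a VM, the fragment below places the field next to the existing GPU device entry. The gpuClass field does not exist in OpenShift Virtualization today, and its placement under the gpus list is an assumption made only for this example.

```yaml
# Hypothetical: gpuClass is the field proposed by this RFE, not an existing API.
apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: vdi-desktop-01                 # hypothetical VM name
spec:
  template:
    spec:
      domain:
        devices:
          gpus:
            - name: gpu1
              gpuClass: "gpu.vdi.medium"  # assumed placement; replaces vendor-specific fields
```

In this sketch the VM carries no resource name or vendor node selector; those details would be resolved from the referenced class.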
4. Functional Requirements
- A new resource type, GPUClass, must allow defining GPU profiles decoupled from hardware/vendor specifics.
- OpenShift Virtualization must map GPUClass → actual hardware configuration (passthrough, vGPU/mdev, MIG slice, resourceName, selectors).
- The GPU configuration of the Citrix Master VM must be either preserved or overridden by a Machine Profile that references a GPUClass.
- The GPUClass abstraction must work across NVIDIA, AMD, and Intel hardware.
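One possible shape for such a resource is sketched below. Everything in it is hypothetical: the API group and version, the field names (mappings, vendor, mode, resourceName, nodeSelector), and the Intel resource name and node labels are assumptions for illustration, not a proposed final schema. Only the resource names nvidia.com/mig-2g.10gb and amd.com/gpu are taken from the examples earlier in this request.

```yaml
# Hypothetical GPUClass sketch: API group, kind, and all field names are assumptions.
apiVersion: gpu.example.com/v1alpha1   # placeholder API group
kind: GPUClass
metadata:
  name: gpu.vdi.medium
spec:
  # Each mapping describes how this class is satisfied on one kind of hardware.
  mappings:
    - vendor: nvidia
      mode: mig                        # MIG slice
      resourceName: nvidia.com/mig-2g.10gb
      nodeSelector:
        nvidia.com/gpu.present: "true" # assumed node label
    - vendor: amd
      mode: passthrough                # whole-device passthrough
      resourceName: amd.com/gpu
    - vendor: intel
      mode: vgpu                       # mediated device (mdev)
      resourceName: intel.com/example-vgpu   # placeholder resource name
```

Under this sketch, a VM requesting gpu.vdi.medium would be matched to whichever mapping the target node can satisfy, and the platform would inject the corresponding resource name and selectors.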
5. Acceptance Criteria
- AC1: A user can define one or more GPUClass resources.
- AC2: A VM using gpuClass: X is scheduled and configured correctly across hardware vendors.
- AC3: Citrix Machine Profile can override master VM GPUClass cleanly.
- AC4: The system correctly handles fallback when the available hardware does not match the requested GPUClass.
- AC5: Documentation includes examples for NVIDIA, AMD, Intel, and cloud GPUs.
6. Non-goals
- This RFE does NOT require redesigning device plugins.
- This RFE does NOT replace vendor drivers or MIG tooling.
- This RFE does NOT define performance guarantees.
- This RFE does NOT introduce a new scheduler, only an abstraction layer.