Feature
Resolution: Unresolved
Feature Overview
The feature will allow administrators to define priority levels for virtual machine (VM) workloads, governing how CPU and memory resources are allocated and reclaimed when system contention occurs. This ensures that mission-critical or high-priority VMs receive preferential access to resources over lower-priority workloads during periods of resource scarcity on an OpenShift node.
Goals
- Enable administrators to assign different priority tiers (e.g., High, Medium, Low) to virtual machines.
- Ensure that higher-priority VMs are protected from resource contention caused by lower-priority VMs.
- Provide mechanisms for higher-priority VMs to preferentially access available CPU cycles and memory over lower-priority VMs during contention.
- Allow for flexible configuration of resource reservation and limits based on priority level.
- Maintain cluster stability and performance for critical infrastructure components.
Requirements
| Requirement | Notes | MVP? |
|---|---|---|
| API Extension for Priority Definition | A method to define and assign a priority level (e.g., an integer value or named tier) to a VirtualMachine object. | Yes |
| CPU Scheduling Integration | Ensure higher-priority VMs receive preferential scheduling time/shares from the underlying CPU scheduler (e.g., via Kubernetes Quality of Service/Guaranteed classes or cgroups). | Yes |
| Memory Reclamation Policy | Implement logic so that when memory contention occurs, lower-priority VM memory is reclaimed/throttled before higher-priority VM memory. | Yes |
| Priority Validation | Implement admission control to validate that priority settings are within defined ranges or known tiers. | Yes |
| Monitoring and Reporting | Provide metrics and events indicating when resource prioritization logic is actively impacting VM resource allocation or throttling. | Yes |
| Documentation | Comprehensive documentation on how to configure and observe resource prioritization. | Yes |
| Dynamic Priority Changes | Ability to modify a VM's priority level without requiring a VM restart (if technically feasible). | No |
| Node-Level Policy Override | Allow cluster administrators to set node-specific resource prioritization policies that might override cluster defaults. | No |
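As a rough illustration of the first and fourth requirements, assuming the priority were exposed as a named tier on the VirtualMachine spec (the field name `workloadPriority`, the tier names, and the integer values below are hypothetical, not a committed API), the admission-time validation could look like:

```python
# Hypothetical named tiers and their integer priority values.
# Tier names and numbers are illustrative only.
PRIORITY_TIERS = {"low": 10, "medium": 50, "high": 100}

def validate_priority(vm_spec: dict) -> int:
    """Admission-style check: reject unknown tiers, return the
    integer priority for a known one, and fall back to a default
    ("medium" here) when no tier is set."""
    tier = vm_spec.get("workloadPriority")
    if tier is None:
        return PRIORITY_TIERS["medium"]
    if tier not in PRIORITY_TIERS:
        raise ValueError(
            f"unknown priority tier {tier!r}; "
            f"expected one of {sorted(PRIORITY_TIERS)}"
        )
    return PRIORITY_TIERS[tier]
```

Rejecting unknown tiers at admission time keeps misconfigured VMs from silently landing in an unintended priority band.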
Use Cases
- Mission-Critical Workloads: A user runs a critical database VM and several development/test VMs on the same cluster. The database VM must maintain performance even under load. By setting the database VM to "High" priority and the development VMs to "Low" priority, the database is protected from resource contention.
- Tenant Separation in Multi-Tenant Clusters: In a cluster shared by multiple organizational tenants, different SLAs mandate different levels of resource guarantee. High-paying or mission-aligned tenants receive "High" priority, ensuring their applications remain performant regardless of noisy neighbors from lower-tier tenants.
- Graceful Degradation: During unexpected system overload (e.g., a host failure causing VMs to migrate to a less-resourced node), the resource prioritization system ensures that the most important VMs are the last to experience significant performance degradation.
Out of Scope
- Storage I/O prioritization (focus is solely on CPU and Memory).
- Network bandwidth prioritization.
- Advanced admission control based on aggregate resource usage per priority level across the cluster (focus is per-node enforcement).
- Automatic priority adjustment based on workload behavior or cluster conditions (dynamic optimization).
Background and strategic fit
Resource contention management is a fundamental requirement for any enterprise virtualization platform, ensuring predictable performance for critical applications. OpenShift Virtualization, built on Kubernetes, currently relies on the standard Kubernetes Quality of Service (QoS) classes (Guaranteed, Burstable, BestEffort) for resource guarantees. However, QoS alone cannot differentiate between VMs within the same or similar class; a mechanism for that, or a simpler abstracted prioritization model, is needed to meet enterprise expectations established by platforms such as VMware. This feature is strategically important to drive adoption in environments hosting mixed workload tiers and to position OpenShift Virtualization as a viable alternative for consolidated virtualization workloads.
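The three standard QoS classes mentioned above are derived from a pod's resource requests and limits; a simplified sketch of that derivation (ignoring init containers and extended resources) is:

```python
def qos_class(containers: list[dict]) -> str:
    """Simplified derivation of a pod's Kubernetes QoS class from its
    containers' cpu/memory requests and limits:
      - Guaranteed: every container sets cpu and memory, requests == limits
      - BestEffort: no container sets any cpu/memory request or limit
      - Burstable:  everything in between
    """
    resources = ("cpu", "memory")
    all_guaranteed = True
    any_set = False
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        for r in resources:
            if r in requests or r in limits:
                any_set = True
            if requests.get(r) is None or requests.get(r) != limits.get(r):
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

This illustrates the gap the feature addresses: two VMs with identical requests and limits land in the same QoS class and are indistinguishable to the kubelet under contention.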
Assumptions
- The underlying operating system and Kubernetes resource management (cgroups, CPU scheduling) provide the necessary hooks to implement priority-based resource management effectively.
- The Kubernetes scheduler and KubeVirt components can be extended to integrate and enforce the defined VM priorities.
- Users will accurately assess and assign appropriate priority levels to their VMs.
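Building on the cgroups assumption above, one plausible enforcement hook under cgroups v2 (the tier-to-weight mapping below is purely illustrative) is to translate priority tiers into `cpu.weight` values and to select memory-reclamation victims in ascending priority order:

```python
# Illustrative mapping from priority tier to a cgroup v2 cpu.weight
# value (the valid kernel range is 1-10000; the default is 100).
CPU_WEIGHTS = {"low": 50, "medium": 100, "high": 1000}

def cpu_weight(tier: str) -> int:
    """Return the cpu.weight to write for a VM of the given tier."""
    return CPU_WEIGHTS[tier]

def reclamation_order(vms: list[tuple[str, str]]) -> list[str]:
    """Order VMs so that lower-priority ones are reclaimed or
    throttled first. `vms` is a list of (vm_name, tier) pairs."""
    rank = {"low": 0, "medium": 1, "high": 2}
    return [name for name, tier in sorted(vms, key=lambda v: rank[v[1]])]
```

With proportional `cpu.weight` values, a high-tier VM receives roughly ten times the CPU share of a medium-tier VM only while the node is actually contended; idle capacity remains available to everyone.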
Customer Considerations
Customers managing consolidated environments expect performance guarantees. This feature directly addresses the "noisy neighbor" problem and allows customers to enforce internal Service Level Agreements (SLAs) more effectively. It simplifies the process of ensuring that Tier 0/1 applications are always served before Tier 2/3 applications, offering a clear path to production deployment for mission-critical workloads on OpenShift Virtualization.
User Experience Considerations
- Simplicity: The priority definition mechanism should be easy to understand (e.g., using named tiers or a simple 1-10 scale) rather than requiring deep knowledge of cgroup settings.
- Visibility: Clear indicators (e.g., in virtctl output or OpenShift Console) should show a VM's assigned priority and whether resource throttling is currently active due to prioritization rules.
- Default Behavior: Sensible default priority settings should be applied if none are explicitly specified, ensuring backward compatibility and a non-disruptive experience for existing workloads.
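If a simple 1-10 scale were chosen over named tiers, the mapping back to coarse tiers for display (in the console or `virtctl` output) could be as small as the sketch below; the bucket boundaries are purely illustrative:

```python
def scale_to_tier(priority: int) -> str:
    """Map an integer priority on a 1-10 scale to a coarse named
    tier for display. Bucket boundaries are illustrative."""
    if not 1 <= priority <= 10:
        raise ValueError("priority must be between 1 and 10")
    if priority <= 3:
        return "Low"
    if priority <= 7:
        return "Medium"
    return "High"
```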