Feature
Resolution: Unresolved
Feature Overview
The feature will allow administrators to define priority levels for virtual machine (VM) workloads, governing how CPU and memory resources are allocated and reclaimed when system contention occurs. This ensures that mission-critical or high-priority VMs receive preferential access to resources over lower-priority workloads during periods of resource scarcity on an OpenShift node.
Goals
- Enable administrators to assign different priority tiers (e.g., High, Medium, Low) to virtual machines.
- Ensure that higher-priority VMs are protected from resource contention caused by lower-priority VMs.
- Provide mechanisms for higher-priority VMs to preferentially access available CPU cycles and memory over lower-priority VMs during contention.
- Allow for flexible configuration of resource reservation and limits based on priority level.
- Maintain cluster stability and performance for critical infrastructure components.
Requirements
| Requirement | Notes | MVP? |
|---|---|---|
| API Extension for Priority Definition | A method to define and assign a priority level (e.g., an integer value or named tier) to a VirtualMachine object. | Yes |
| CPU Scheduling Integration | Ensure higher-priority VMs receive preferential scheduling time/shares from the underlying CPU scheduler (e.g., via Kubernetes Quality of Service/Guaranteed classes or cgroups). | Yes |
| Memory Reclamation Policy | Implement logic so that when memory contention occurs, lower-priority VM memory is reclaimed/throttled before higher-priority VM memory. | Yes |
| Priority Validation | Implement admission control to validate that priority settings are within defined ranges or known tiers. | Yes |
| Monitoring and Reporting | Provide metrics and events indicating when resource prioritization logic is actively impacting VM resource allocation or throttling. | Yes |
| Documentation | Comprehensive documentation on how to configure and observe resource prioritization. | Yes |
| Dynamic Priority Changes | Ability to modify a VM's priority level without requiring a VM restart (if technically feasible). | No |
| Node-Level Policy Override | Allow cluster administrators to set node-specific resource prioritization policies that might override cluster defaults. | No |
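As a rough illustration of the first and fourth requirements, assuming the priority were exposed as a named tier on the VirtualMachine spec (the field name `workloadPriority`, the tier names, and the integer values below are hypothetical, not a committed API), the admission-time validation could look like:

```python
# Hypothetical named tiers and their integer priority values.
# Tier names and numbers are illustrative only.
PRIORITY_TIERS = {"low": 10, "medium": 50, "high": 100}

def validate_priority(vm_spec: dict) -> int:
    """Admission-style check: reject unknown tiers, return the
    integer priority for a known one, and fall back to a default
    ("medium" here) when no tier is set."""
    tier = vm_spec.get("workloadPriority")
    if tier is None:
        return PRIORITY_TIERS["medium"]
    if tier not in PRIORITY_TIERS:
        raise ValueError(
            f"unknown priority tier {tier!r}; "
            f"expected one of {sorted(PRIORITY_TIERS)}"
        )
    return PRIORITY_TIERS[tier]
```

Rejecting unknown tiers at admission time keeps misconfigured VMs from silently landing in an unintended priority band.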
Use Cases
- Mission-Critical Workloads: A user runs a critical database VM and several development/test VMs on the same cluster. The database VM must maintain performance even under load. By setting the database VM to "High" priority and the development VMs to "Low" priority, the database is protected from resource contention.
- Tenant Separation in Multi-Tenant Clusters: In a cluster shared by multiple organizational tenants, different SLAs mandate different levels of resource guarantee. High-paying or mission-aligned tenants receive "High" priority, ensuring their applications remain performant regardless of noisy neighbors from lower-tier tenants.
- Graceful Degradation: During unexpected system overload (e.g., a host failure causing VMs to migrate to a less-resourced node), the resource prioritization system ensures that the most important VMs are the last to experience significant performance degradation.
Out of Scope
- Storage I/O prioritization (focus is solely on CPU and Memory).
- Network bandwidth prioritization.
- Advanced admission control based on aggregate resource usage per priority level across the cluster (focus is per-node enforcement).
- Automatic priority adjustment based on workload behavior or cluster conditions (dynamic optimization).
Background and strategic fit
Resource contention management is a fundamental requirement for any enterprise virtualization platform, ensuring predictable performance for critical applications. OpenShift Virtualization, built on Kubernetes, currently relies on the standard Kubernetes Quality of Service (QoS) classes (Guaranteed, Burstable, BestEffort) for resource guarantees. However, QoS alone cannot differentiate between VMs within the same or similar class; a mechanism for that, or a simpler abstracted prioritization model, is needed to meet enterprise expectations established by platforms such as VMware. This feature is strategically important to drive adoption in environments hosting mixed workload tiers and to position OpenShift Virtualization as a viable alternative for consolidated virtualization workloads.
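The three standard QoS classes mentioned above are derived from a pod's resource requests and limits; a simplified sketch of that derivation (ignoring init containers and extended resources) is:

```python
def qos_class(containers: list[dict]) -> str:
    """Simplified derivation of a pod's Kubernetes QoS class from its
    containers' cpu/memory requests and limits:
      - Guaranteed: every container sets cpu and memory, requests == limits
      - BestEffort: no container sets any cpu/memory request or limit
      - Burstable:  everything in between
    """
    resources = ("cpu", "memory")
    all_guaranteed = True
    any_set = False
    for c in containers:
        requests = c.get("requests", {})
        limits = c.get("limits", {})
        for r in resources:
            if r in requests or r in limits:
                any_set = True
            if requests.get(r) is None or requests.get(r) != limits.get(r):
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"
```

This illustrates the gap the feature addresses: two VMs with identical requests and limits land in the same QoS class and are indistinguishable to the kubelet under contention.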
Assumptions
- The underlying operating system and Kubernetes resource management (cgroups, CPU scheduling) provide the necessary hooks to implement priority-based resource management effectively.
- The Kubernetes scheduler and KubeVirt components can be extended to integrate and enforce the defined VM priorities.
- Users will accurately assess and assign appropriate priority levels to their VMs.
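Building on the cgroups assumption above, one plausible enforcement hook under cgroups v2 (the tier-to-weight mapping below is purely illustrative) is to translate priority tiers into `cpu.weight` values and to select memory-reclamation victims in ascending priority order:

```python
# Illustrative mapping from priority tier to a cgroup v2 cpu.weight
# value (the valid kernel range is 1-10000; the default is 100).
CPU_WEIGHTS = {"low": 50, "medium": 100, "high": 1000}

def cpu_weight(tier: str) -> int:
    """Return the cpu.weight to write for a VM of the given tier."""
    return CPU_WEIGHTS[tier]

def reclamation_order(vms: list[tuple[str, str]]) -> list[str]:
    """Order VMs so that lower-priority ones are reclaimed or
    throttled first. `vms` is a list of (vm_name, tier) pairs."""
    rank = {"low": 0, "medium": 1, "high": 2}
    return [name for name, tier in sorted(vms, key=lambda v: rank[v[1]])]
```

With proportional `cpu.weight` values, a high-tier VM receives roughly ten times the CPU share of a medium-tier VM only while the node is actually contended; idle capacity remains available to everyone.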
Customer Considerations
Customers managing consolidated environments expect performance guarantees. This feature directly addresses the "noisy neighbor" problem and allows customers to enforce internal Service Level Agreements (SLAs) more effectively. It simplifies the process of ensuring that Tier 0/1 applications are always served before Tier 2/3 applications, offering a clear path to production deployment for mission-critical workloads on OpenShift Virtualization.
User Experience Considerations
- Simplicity: The priority definition mechanism should be easy to understand (e.g., using named tiers or a simple 1-10 scale) rather than requiring deep knowledge of cgroup settings.
- Visibility: Clear indicators (e.g., in virtctl output or OpenShift Console) should show a VM's assigned priority and whether resource throttling is currently active due to prioritization rules.
- Default Behavior: Sensible default priority settings should be applied if none are explicitly specified, ensuring backward compatibility and a non-disruptive experience for existing workloads.
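If a simple 1-10 scale were chosen over named tiers, the mapping back to coarse tiers for display (in the console or `virtctl` output) could be as small as the sketch below; the bucket boundaries are purely illustrative:

```python
def scale_to_tier(priority: int) -> str:
    """Map an integer priority on a 1-10 scale to a coarse named
    tier for display. Bucket boundaries are illustrative."""
    if not 1 <= priority <= 10:
        raise ValueError("priority must be between 1 and 10")
    if priority <= 3:
        return "Low"
    if priority <= 7:
        return "Medium"
    return "High"
```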