Type: Feature Request
Resolution: Done
Priority: Undefined
Fix Version/s: None
Affects Version/s: openshift-4.19
Component/s: Hosted Control Planes
Labels:
- CNI
- cee.next_proposed
- hcp
- non-live-migration
- virt

Target Version:
None
Activity Type:
Product / Portfolio Work
Status Summary:
None
Blocked:
False
Blocked Reason:
None
Products:
None
Hierarchy Progress Bar:
None

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

PX Review Complete:
None
PX Impact Score:
PX Impact Range:
None
PX Priority Data:
None
PX Technical Impact:
None
PX Technical Impact Notes:
None
PX Scheduling Request:
None

1. Proposed title of this feature request

Implement support for coordinated draining of guest Kubernetes nodes when KubeVirt VMs are evicted or shut down, in order to avoid abrupt workload disruption in HostedClusters using Hypershift.

2. What is the nature and description of the request?

Currently, in a Hypershift deployment with KubeVirt as the infrastructure provider, the lifecycle of the hosting VMs (guest nodes) is decoupled from the lifecycle of the guest Kubernetes node objects.

Specifically:

When evictionStrategy is set to LiveMigrate, KubeVirt attempts to live migrate the VM.

When evictionStrategy is set to None, KubeVirt shuts down the VM and restarts it elsewhere.

In either case, there is no mechanism to cordon and drain the node in the guest cluster prior to the VM being stopped or moved.
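The gap described above can be made concrete with a small sketch. It only models the ordering of steps as stated in this request; the function name `eviction_steps` and the `drain_guest_node` flag are hypothetical and do not correspond to any existing KubeVirt or Hypershift API.

```python
def eviction_steps(eviction_strategy: str, drain_guest_node: bool = False) -> list[str]:
    """Return the ordered steps taken when a hosting VM must leave its host.

    drain_guest_node=False models today's behaviour as described in this
    request; drain_guest_node=True models the requested behaviour
    (hypothetical flag, for illustration only).
    """
    if eviction_strategy == "LiveMigrate":
        vm_step = "live-migrate VM"
    elif eviction_strategy == "None":
        vm_step = "shut down VM and restart it elsewhere"
    else:
        raise ValueError(f"strategy not covered by this sketch: {eviction_strategy!r}")

    steps = []
    if drain_guest_node:
        # Requested: cordon and drain the guest node before the VM is touched.
        steps += ["cordon guest node", "drain guest node"]
    steps.append(vm_step)
    return steps


if __name__ == "__main__":
    print(eviction_steps("None"))
    print(eviction_steps("None", drain_guest_node=True))
```

With the flag off, the only step is the VM action, so guest pods are terminated without warning. With the flag on, the guest node is cordoned and drained first.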

This is particularly problematic when using CNIs that do not support live migration (e.g., Calico, Cilium), as VMs must be shut down to evacuate nodes. Because guest workloads are not drained, this leads to:

Abrupt termination of pods running on the affected node.

Potential data loss and application disruption.

Unpredictable behavior of workloads sensitive to graceful termination.

Use Case / Impact:

Clusters using alternative CNIs (Calico, Cilium) where live migration is unsupported.

Scenarios where VMs need to be restarted for host maintenance, scaling, or updates.

Operational workflows requiring predictable workload eviction behavior.

Without guest node draining support, administrators are forced to:

Manually cordon/drain guest nodes prior to maintenance.

Accept workload disruption.

This gap limits the operational reliability and adoption of KubeVirt as an infrastructure provider for Hypershift.
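As an illustration of the manual workaround, the sketch below selects which pods an administrator would need to evict from a guest node before maintenance. It is a simplification under stated assumptions: pods are plain dictionaries shaped like Kubernetes Pod objects, DaemonSet-managed pods and static (mirror) pods are skipped in the way `kubectl drain --ignore-daemonsets` skips them, and no API calls are made. The helper name `pods_to_evict` is hypothetical.

```python
# Annotation that Kubernetes sets on mirror pods created for static pods.
MIRROR_ANNOTATION = "kubernetes.io/config.mirror"


def pods_to_evict(pods: list[dict], node_name: str) -> list[str]:
    """Return "namespace/name" for each pod on node_name that a drain would evict."""
    selected = []
    for pod in pods:
        if pod.get("spec", {}).get("nodeName") != node_name:
            continue  # pod runs on a different node
        meta = pod.get("metadata", {})
        owners = meta.get("ownerReferences") or []
        if any(owner.get("kind") == "DaemonSet" for owner in owners):
            continue  # DaemonSet pods would be recreated on the same node
        if MIRROR_ANNOTATION in (meta.get("annotations") or {}):
            continue  # static pods cannot be evicted through the API
        selected.append(f"{meta.get('namespace')}/{meta.get('name')}")
    return selected


if __name__ == "__main__":
    sample = [
        {"metadata": {"namespace": "shop", "name": "web-1"},
         "spec": {"nodeName": "guest-node-a"}},
        {"metadata": {"namespace": "kube-system", "name": "cni-agent",
                      "ownerReferences": [{"kind": "DaemonSet"}]},
         "spec": {"nodeName": "guest-node-a"}},
    ]
    print(pods_to_evict(sample, "guest-node-a"))
```

In practice the administrator would first mark the guest node unschedulable and then evict the selected pods, which is what this request asks to have coordinated automatically.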

3. Why does the customer need this? (List the business requirements here)

The customer wants to use Calico or Cilium as the CNI for an HCP cluster. This RFE would provide the following benefits:

Ensures graceful workload eviction during VM maintenance or failure recovery.

Enables safe use of CNIs that do not support live migration.

Improves operational consistency and predictability.

4. List any affected packages or components.

Hypershift

Virtualization

Assignee: Ramon Acedo

Reporter: Chinmay Deshpande

Need Info From: None

Votes: 0

Watchers: 1

Created: 2025/07/03 9:04 AM

Updated: 2025/09/13 7:36 PM

Resolved: 2025/07/04 2:31 PM

Target start: None

Target end: None
