Feature Request
Resolution: Done
openshift-4.19
Product / Portfolio Work
1. Proposed title of this feature request
Implement support for coordinated draining of guest Kubernetes nodes when the KubeVirt VMs backing them are evicted or shut down, to avoid abrupt workload disruption in Hypershift HostedClusters.
2. What is the nature and description of the request?
Currently, in a Hypershift deployment with KubeVirt as the infrastructure provider, the lifecycle of the KubeVirt VMs that host the guest nodes is decoupled from the lifecycle of the corresponding Node objects in the guest cluster.
Specifically:
- When evictionStrategy is set to LiveMigrate, KubeVirt attempts to live migrate the VM.
- When evictionStrategy is set to None, KubeVirt shuts down the VM and restarts it elsewhere.
- In either case, there is no mechanism to cordon and drain the corresponding node in the guest cluster before the VM is stopped or moved.
This is particularly problematic when using CNIs that do not support live migration (e.g., Calico, Cilium), as VMs must be shut down to evacuate nodes. Because guest workloads are not drained, this leads to:
- Abrupt termination of pods running on the affected node.
- Potential data loss and application disruption.
- Unpredictable behavior of workloads sensitive to graceful termination.
Use Case / Impact:
- Clusters using alternative CNIs (Calico, Cilium) where live migration is unsupported.
- Scenarios where VMs need to be restarted for host maintenance, scaling, or updates.
- Operational workflows requiring predictable workload eviction behavior.
Without guest node draining support, administrators are forced to:
- Manually cordon/drain guest nodes prior to maintenance.
- Accept workload disruption.
This gap limits the operational reliability and adoption of KubeVirt as an infrastructure provider for Hypershift.
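The gap described in this section can be summarized as a small decision sketch. It models only the two evictionStrategy values named in this request. The `plan_eviction` function, its parameters, and the step strings are hypothetical and do not correspond to any Hypershift or KubeVirt API. It also simplifies by treating every case where live migration is not possible as a shutdown and restart, as this request describes for Calico and Cilium.

```python
from enum import Enum


class EvictionStrategy(Enum):
    """The two KubeVirt evictionStrategy values discussed in this RFE."""
    LIVE_MIGRATE = "LiveMigrate"
    NONE = "None"


def plan_eviction(strategy: EvictionStrategy,
                  cni_supports_live_migration: bool,
                  coordinated_drain: bool,
                  guest_node: str) -> list[str]:
    """Return the ordered steps taken when the VM backing guest_node is evicted.

    Hypothetical helper for illustration only (not a Hypershift/KubeVirt API).
    coordinated_drain=False models today's behavior; True models the request.
    """
    if strategy is EvictionStrategy.LIVE_MIGRATE and cni_supports_live_migration:
        # The VM keeps running on the target host, so guest pods survive.
        return ["live-migrate VM"]

    # Simplification: every other case is treated as a shutdown and restart.
    steps: list[str] = []
    if coordinated_drain:
        # Requested behavior: evacuate guest workloads gracefully first.
        steps += [f"cordon guest node {guest_node}",
                  f"drain guest node {guest_node}"]
    steps += ["shut down VM", "restart VM elsewhere"]
    return steps


# Today, with Calico/Cilium: guest pods are terminated together with the VM.
print(plan_eviction(EvictionStrategy.NONE, False, False, "worker-a"))
# With this RFE: the guest node is cordoned and drained before the shutdown.
print(plan_eviction(EvictionStrategy.NONE, False, True, "worker-a"))
```

The key property of the requested behavior is ordering: the cordon and drain of the guest node must complete before the VM is shut down, so guest pods get their normal graceful termination.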
3. Why does the customer need this? (List the business requirements here)
The customer wants to use Calico or Cilium as the CNI for their HCP clusters. This RFE:
- Ensures graceful workload eviction during VM maintenance or failure recovery.
- Enables safe use of CNIs that do not support live migration.
- Improves operational consistency and predictability.
4. List any affected packages or components.
- Hypershift
- Virtualization