OpenShift Virtualization / CNV-40762

CPU Hotplug can't be canceled when migration fails repeatedly in a loop.


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • CNV v4.17.0
    • CNV Virtualization

      Description of problem:

      If the migration fails for any reason (for example, when the target pod is unschedulable and stuck in Pending state), a new migration is initiated automatically. This process repeats indefinitely in a loop.
      
      There is no way to cancel the CPU hotplug process. When trying to revert the CPU values on the VM, the following error is returned:
      
       * spec.template.spec.domain.cpu.sockets: cannot update CPU sockets while another CPU change is in progress
      
      
      The only workaround found so far is to restart the VM, which defeats the purpose of hotplug (the whole point of hotplug is to avoid restarting the VM).
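
      For illustration, a minimal sketch of the failed revert attempt; the VM name and the original socket count are placeholders:

        $ oc patch vm <vm-name> --type merge \
            -p '{"spec":{"template":{"spec":{"domain":{"cpu":{"sockets":1}}}}}}'

      The patch is rejected by the VM admission webhook with the "cannot update CPU sockets while another CPU change is in progress" error shown above.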
      
      
      

       

      Version-Release number of selected component (if applicable):

      4.15

      How reproducible:

      100%

      Steps to Reproduce:

      1. Create and run a VM with a node selector.
      2. Increase the CPU socket count on the VM.

      A VMIM object is created automatically, but the new (target) pod stays in Pending because there is no node on which it can be scheduled.
      After the target pod has been Pending for 5 minutes, the VMIM is marked as failed and a new VMIM is created with the same result: the target pod is again stuck in Pending (see the sketch below).
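
      A minimal sketch of the reproduction (VM name, node label and socket counts are placeholders; the nodeSelector matches only one node, so the migration target pod has nowhere else to schedule):

        # VM template pinned to a single node (excerpt)
        spec:
          template:
            spec:
              nodeSelector:
                kubernetes.io/hostname: worker-0
              domain:
                cpu:
                  sockets: 1
                  cores: 1
                  threads: 1

        # trigger CPU hotplug by increasing the socket count
        $ oc patch vm <vm-name> --type merge \
            -p '{"spec":{"template":{"spec":{"domain":{"cpu":{"sockets":2}}}}}}'

        # observe the migration loop: each VMIM fails after ~5 minutes with the
        # target virt-launcher pod stuck in Pending, and a new VMIM is created
        $ oc get vmim -w
        $ oc get pods | grep virt-launcher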

      Actual results:

      There is no way to cancel the hotplug process or revert the change.

      Expected results:

      The hotplug process can be canceled manually, or possibly automatically (after some timeout, or when the migration fails).
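
      For context, the in-flight migration object itself can be deleted, but that does not cancel the pending CPU change; based on the behavior described above, the controller presumably just creates a new VMIM (the VMIM name is a placeholder):

        $ oc delete vmim <vmim-name>
        $ oc get vmim        # a new migration object appears shortly afterwards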
      
      
      

      Additional info:

      Migration can be blocked for multiple reasons, such as a nodeSelector or a lack of resources on other nodes.

      Once ApplicationAwareQuota is implemented, the problem may get worse, because the target pod can also be blocked by hitting the quota.

      This can potentially block cluster upgrades, because the VM cannot be evicted from the node.
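
      A hedged sketch of how this could surface during an upgrade, assuming the VM uses evictionStrategy: LiveMigrate (node name is a placeholder): draining the node evicts the virt-launcher pod, eviction triggers a live migration, and the migration loops as described above, so the drain never completes.

        $ oc adm drain worker-0 --ignore-daemonsets --delete-emptydir-data
        # hangs: the virt-launcher pod cannot be evicted because the live
        # migration target pod never gets out of Pending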

       

            sgott@redhat.com Stuart Gott
            dshchedr@redhat.com Denys Shchedrivyi
            Kedar Bidarkar