- Bug
- Resolution: Unresolved
- rhel-9.8
- Moderate
- rhel-virt-core-libvirt-1
- Libvirt Bugs already in Sprint
What were you trying to do that didn't work?
Start a VM locally after it was stopped (destroyed) right after switching to postcopy.
What is the impact of this issue to you?
Low. The test case fails, but the VM can be started eventually.
Please provide the package NVR for which the bug is seen:
libvirt-11.10.0-3.el9
How reproducible is this bug?:
100%
Steps to reproduce
- Set up shared storage live migration
- Start migration
virsh migrate --live --p2p --verbose --domain avocado-vt-vm1 --desturi qemu+tcp://10.0.160.70/system --bandwidth 10 --postcopy-bandwidth 10 --postcopy
- Switch to postcopy, wait briefly, and destroy the VM
virsh migrate-postcopy avocado-vt-vm1; sleep 0.5; virsh destroy avocado-vt-vm1; virsh start avocado-vt-vm1
- Try to start the VM
virsh start avocado-vt-vm1
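The steps above could be wrapped in a small script, roughly as sketched here. The function names and the delay parameter are made up for illustration; the virsh commands and options are the ones listed in the steps. How long to wait between starting the migration and switching to postcopy is not stated in this report, so that is left to the caller.

```shell
#!/usr/bin/env bash
# Sketch only: function names and the delay parameter are illustrative.

# Step 2: start the live migration in the background (same options as above).
start_migration() {
    local dom="$1" desturi="$2"
    virsh migrate --live --p2p --verbose --domain "$dom" \
        --desturi "$desturi" --bandwidth 10 --postcopy-bandwidth 10 \
        --postcopy &
}

# Steps 3 and 4: switch to postcopy, wait, destroy the source VM,
# then try to start it again. Returns the exit status of "virsh start".
switch_destroy_start() {
    local dom="$1" delay="${2:-0.5}"
    virsh migrate-postcopy "$dom"
    sleep "$delay"
    virsh destroy "$dom"
    virsh start "$dom"
}
```

For example, call `start_migration avocado-vt-vm1 qemu+tcp://10.0.160.70/system`, then `switch_destroy_start avocado-vt-vm1 0.5`; a non-zero return from the second call corresponds to the failure described under "Actual results".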
Expected results
The VM can be started
Actual results
The VM can't be started immediately, although virsh list confirms it's shut off. It becomes 'runnable' after a while without further intervention.
error: Failed to start domain 'avocado-vt-vm1'
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)

# virsh list --all
 Id   Name             State
---------------------------------
 -    avocado-vt-vm1   shut off

# virhs start avocado-vt-vm1
-bash: virhs: command not found
# virsh start avocado-vt-vm1
error: Failed to start domain 'avocado-vt-vm1'
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePerform3Params)
# virsh start avocado-vt-vm1
Domain 'avocado-vt-vm1' started
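Since the VM does become startable after a while, a test that has to tolerate this behaviour could retry the start, along the lines of the sketch below. The function name, the attempt count, and the pause are hypothetical and not part of the existing test case. Note that, judging by the error message, each failed attempt already blocks until the lock wait times out.

```shell
#!/usr/bin/env bash
# Sketch only: start_with_retry and its defaults are illustrative.

# Retry "virsh start" until it succeeds or the attempts run out.
start_with_retry() {
    local dom="$1" attempts="${2:-10}" pause="${3:-5}"
    local i=1
    while [ "$i" -le "$attempts" ]; do
        if virsh start "$dom"; then
            echo "started after $i attempt(s)"
            return 0
        fi
        # Wait before trying again; the state change lock may still be held.
        sleep "$pause"
        i=$((i + 1))
    done
    echo "still failing after $attempts attempt(s)" >&2
    return 1
}
```

In the session above, `start_with_retry avocado-vt-vm1` would have succeeded on a later attempt, once the lock held by the migration job was released.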
Additional information
- Caught by test case migration.async_ops.destroy_vm_during_finishphase.destroy_src_vm.with_postcopy.p2p on both x86_64 and s390x
- As a result of destroying the VM, the migration stops. This can happen in different ways:
  - 'domain X not running' in stderr
  - no specific error message in stderr; migrate returns 1 and stderr is just a cut-off list of "Migration x %" messages
  - job 'migration in' failed in post-copy phase
- The test case currently only considers the last of the above error exits
- The wait time of 0.5 seconds is really important: neither skipping the wait nor waiting for 1 second reproduced the issue for me
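To make the test case aware of all three error exits listed above, the migrate result could be classified roughly as in this sketch. The function name and the returned labels are invented for illustration, and the substring matches assume the real stderr contains "not running" and "failed in post-copy phase" as quoted in this report.

```shell
#!/usr/bin/env bash
# Sketch only: classify_migrate_failure and its labels are illustrative.
# $1 = exit status of "virsh migrate", $2 = its captured stderr.
classify_migrate_failure() {
    local rc="$1" stderr_text="$2"
    case "$stderr_text" in
        *"not running"*)
            echo "domain-not-running" ;;
        *"failed in post-copy phase"*)
            echo "postcopy-job-failed" ;;
        *)
            # No recognizable message: only the exit status tells us it failed.
            if [ "$rc" -ne 0 ]; then
                echo "unspecified-failure"
            else
                echo "success"
            fi ;;
    esac
}
```

A caller would capture stderr and the exit status of the migrate command and pass both in, for example `classify_migrate_failure "$rc" "$err"`.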