-
Bug
-
Resolution: Unresolved
-
Undefined
-
rhel-9.6
-
libvirt-10.10.0-4.el9
-
No
-
Important
-
rhel-sst-virtualization
-
ssg_virtualization
-
3
-
Dev ack
-
False
-
-
None
-
None
-
Pass
-
None
-
-
x86_64
-
None
What were you trying to do that didn't work?
Libvirt report post copy migration resume failed but actually success in some cases
What is the impact of this issue to you?
Libvirt report an error when resume a post copy migration in some cases
Please provide the package NVR for which the bug is seen:
libvirt-10.10.0-3.el9.x86_64
qemu-kvm-9.1.0-5.el9.x86_64
How reproducible is this bug?:
100% in automation environment
Steps to reproduce
1. prepare a migration environment and an active vm # virsh start vm1 Domain 'vm1' started 2. perform postcopy migration and abort migration Terminal1: # virsh migrate --live --p2p --verbose --domain vm1 --desturi qemu+tcp://target_host/system --postcopy --bandwidth 5 --tls Terminal2: virsh # migrate-postcopy vm1 virsh # domjobabort vm1 --postcopy 3. resume postcopy migration # virsh migrate --live --p2p --verbose --domain vm1 --desturi qemu+tcp://target_host/system --postcopy --bandwidth 5 --tls --postcopy-resume error: operation failed: job 'migration in' failed in post-copy phase 4. After a few minutes, the guest was running on target host and shutdown on the source guest # virsh list --all Id Name State ---------------------- 2 vm1 running 5. check target host virtqemud log and can find that post-copy resume actually success
Expected results
Should not report error if recover successfully.
Actual results
Recover postcopy returned with error '"job 'migration in' failed in post-copy phase' but the migration was recovered successfully in fact.
Additional info
This issue not happened on every test environment. I compared the different between the "good" and "bad" environment and found that looks like in the "bad" environment qemu sent postcopy-recover event later than the "good" environment.