-
Bug
-
Resolution: Unresolved
-
Undefined
-
None
-
rhel-9.4.z
-
No
-
None
-
rhel-virt-core-libvirt-1
-
None
-
False
-
False
-
-
None
-
None
-
None
-
None
-
Unspecified
-
Unspecified
-
Unspecified
-
-
All
-
None
What were you trying to do that didn't work?
Customer is testing migrations by creating and deleting them in a loop. It appears that cancelling a migration at a very specific point in time - the switchover phase at the end of precopy, where the VM is briefly suspended to copy the remaining pages - the system will lose track of the migration: pretending to cancel it (the VMIM is deleted and the logs indicate it cancelled), but the libvirt job is left "running" indefinitely, blocking further migrations.
What is the impact of this issue to you?
Please provide the package NVR for which the bug is seen:
How reproducible is this bug?:
Takes a while but reproduces, may need to tune the sleep to allow for destination pod preparation
$ cat migrate.sh
#!/bin/bash
while true; do
virtctl migrate rhel-94
sleep 7
virtctl migrate-cancel rhel-94
done
Steps to reproduce
Create and delete migrations in a loop
Expected results
* Clean up and also cancel the libvirt job
Actual results:
- Migration is left running on libvirt side
- Cannot migrate again
Additional info:
1. Migration initiated {"component":"virt-launcher","kind":"","level":"info","msg":"Initiating live migration.","name":"fio-vm-multi-disk","namespace":"kubevirt-0-1749496987","pos":"live-migration-source.go:1003","timestamp":"2025-06-09T19:51:37.945455Z","uid":"332aa27b-b3f3-4f5a-93bd-0f12fef37995"} {"component":"virt-launcher","kind":"","level":"info","msg":"generated migration parameters: [....] 2. Immediatelly cancel: {"component":"virt-launcher","kind":"","level":"info","msg":"Live migration as been aborted","name":"fio-vm-multi-disk","namespace":"kubevirt-0-1749496987","pos":"server.go:136","timestamp":"2025-06-09T19:51:40.071392Z","uid":"332aa27b-b3f3-4f5a-93bd-0f12fef37995"} {"component":"virt-launcher","kind":"","level":"info","msg":"Live migration as been aborted","name":"fio-vm-multi-disk","namespace":"kubevirt-0-1749496987","pos":"server.go:136","timestamp":"2025-06-09T19:51:40.077091Z","uid":"332aa27b-b3f3-4f5a-93bd-0f12fef37995"} 3. Event comes that we are into switchover phase {"component":"virt-launcher","level":"info","msg":"DomainLifecycle event Domain event=\"suspended\" detail=\"migrated\" with event id 3 reason 1 received","pos":"client.go:481","timestamp":"2025-06-09T19:51:40.170293Z"} 4. Migration is aborted after entering switchover {"component":"virt-launcher","kind":"","level":"info","msg":"Live migration abort succeeded","name":"fio-vm-multi-disk","namespace":"kubevirt-0-1749496987","pos":"live-migration-source.go:718","timestamp":"2025-06-09T19:51:40.173127Z","uid":"332aa27b-b3f3-4f5a-93bd-0f12fef37995"} 5. Resume the VM again on source {"component":"virt-launcher","level":"info","msg":"DomainLifecycle event Domain event=\"resumed\" detail=\"unpaused\" with event id 4 reason 0 received","pos":"client.go:481","timestamp":"2025-06-09T19:51:40.173268Z"} {"component":"virt-launcher","level":"info","msg":"Domain name event: kubevirt-0-1749496987_fio-vm-multi-disk","pos":"client.go:435","timestamp":"2025-06-09T19:51:40.176666Z"} {"component":"virt-launcher","level":"info","msg":"Domain name event: kubevirt-0-1749496987_fio-vm-multi-disk","pos":"client.go:435","timestamp":"2025-06-09T19:51:40.181104Z"} 6. Why are we getting migration statistics if it was cancelled? {"component":"virt-launcher","kind":"","level":"info","msg":"Migration info for 2854a6d7-c569-431e-bb45-cb41f66caa3d: TimeElapsed:4003ms DataProcessed:1MiB DataRemaining:0MiB DataTotal:1MiB MemoryProcessed:0MiB MemoryRemaining:0MiB MemoryTotal:0MiB MemoryBandwidth:0Mbps DirtyRate:0Mbps Iteration:0 PostcopyRequests:0 ConstantPages:0 NormalPages:0 NormalData:0MiB ExpectedDowntime:0ms DiskMbps:0","name":"fio-vm-multi-disk","namespace":"kubevirt-0-1749496987","pos":"live-migration-source.go:685","timestamp":"2025-06-09T19:51:41.962866Z","uid":"332aa27b-b3f3-4f5a-93bd-0f12fef37995"} ... {"component":"virt-launcher","kind":"","level":"info","msg":"Migration info for 2854a6d7-c569-431e-bb45-cb41f66caa3d: TimeElapsed:747689ms DataProcessed:1MiB DataRemaining:0MiB DataTotal:1MiB MemoryProcessed:0MiB MemoryRemaining:0MiB MemoryTotal:0MiB MemoryBandwidth:0Mbps DirtyRate:0Mbps Iteration:0 PostcopyRequests:0 ConstantPages:0 NormalPages:0 NormalData:0MiB ExpectedDowntime:0ms DiskMbps:0","name":"fio-vm-multi-disk","namespace":"kubevirt-0-1749496987","pos":"live-migration-source.go:685","timestamp":"2025-06-09T20:04:05.649309Z","uid":"332aa27b-b3f3-4f5a-93bd-0f12fef37995"} 7. Times out on virt-launcher side (but wasn't it cancelled) {"component":"virt-launcher","kind":"","level":"error","msg":"Live migration abort detected with reason: Live migration is not completed after 750 seconds and has been aborted","name":"fio-vm-multi-disk","namespace":"kubevirt-0-1749496987","pos":"live-migration-source.go:649","timestamp":"2025-06-09T20:04:09.270034Z","uid":"332aa27b-b3f3-4f5a-93bd-0f12fef37995"} ... 2 days later ... 8. Try again, to find out the libvirt job was not cancelled on the "abort" above. {"component":"virt-launcher","kind":"","level":"error","msg":"Live migration failed.","name":"fio-vm-multi-disk","namespace":"kubevirt-0-1749496987","pos":"live-migration-source.go:1016","reason":"error encountered during MigrateToURI3 libvirt api call: virError(Code=55, Domain=10, Message='Requested operation is not valid: another migration job is already running for domain 'kubevirt-0-1749496987_fio-vm-multi-disk'')","timestamp":"2025-06-11T12:12:43.508933Z","uid":"332aa27b-b3f3-4f5a-93bd-0f12fef37995"}
- is depended on by
-
CNV-63721 Lost track of migration when cancelled during pre-copy "switchover" phase
-
- New
-