Bug
Resolution: Unresolved
Normal
rhel-9.4
libvirt-10.8.0-1.el9
None
Moderate
ZStream
rhel-sst-virtualization
ssg_virtualization
11
5
QE ack, Dev ack
False
None
Red Hat OpenShift Virtualization
None
Approved Blocker
Pass
Automated
10.7.0
None
What were you trying to do that didn't work?
Recovering the paused post-copy migration returned the error "job 'migration in' failed in post-copy phase", although the migration was in fact recovered successfully.
Please provide the package NVR for which the bug is seen:
libvirt-10.0.0-1.el9.x86_64
qemu-kvm-8.2.0-2.el9.x86_64
How reproducible:
100%
Steps to reproduce
1. Monitor events on both the source and target hosts:
   # virsh event --loop --all
2. Start the migration:
   # virsh migrate vm2 qemu+ssh://X.X.X.com/system --persistent --verbose --live --p2p --bandwidth 5 --postcopy
3. Switch to post-copy before the migration completes:
   # virsh migrate-postcopy vm2
4. Abort the post-copy migration before it completes:
   # virsh domjobabort vm2 --postcopy
   error: Requested operation is not valid: domain is not running
5. Recover the post-copy migration:
   # virsh migrate vm2 qemu+ssh://X.X.X.com/system --persistent --verbose --live --p2p --bandwidth 5 --postcopy --postcopy-resume
   error: operation failed: job 'migration in' failed in post-copy phase
6. Try to recover the post-copy migration again:
   # virsh migrate vm2 qemu+ssh://X.X.X.com/system --persistent --verbose --live --p2p --bandwidth 5 --postcopy --postcopy-resume
   error: Requested operation is not valid: QEMU reports migration is still running
7. After a few minutes, the guest was running on the target host and had shut down on the source host.
8. Check the events on both the source and target hosts:
   Source host:
   # virsh event --all --loop
   event 'migration-iteration' for domain 'vm2': iteration: '1'
   event 'lifecycle' for domain 'vm2': Suspended Migrated
   event 'lifecycle' for domain 'vm2': Suspended Post-copy
   event 'migration-iteration' for domain 'vm2': iteration: '2'
   event 'lifecycle' for domain 'vm2': Suspended Post-copy Error
   *event 'lifecycle' for domain 'vm2': Suspended Post-copy*
   *event 'lifecycle' for domain 'vm2': Suspended Post-copy Error*
   event 'lifecycle' for domain 'vm2': Stopped Migrated
   event 'job-completed' for domain 'vm2':
       operation: 5
       time_elapsed: 21752
       downtime: 184
       setup_time: 160
       data_total: 2160803840
       data_processed: 45301213
       data_remaining: 2112503808
       memory_total: 2160803840
       memory_processed: 45301213
       memory_remaining: 2112503808
       memory_bps: 5257170
       memory_constant: 3014
       memory_normal: 8832
       memory_normal_bytes: 36175872
       memory_dirty_rate: 7
       memory_iteration: 2
       memory_postcopy_requests: 0
       memory_page_size: 4096
       disk_total: 0
       disk_processed: 0
       disk_remaining: 0
   Target host:
   # virsh event --all --loop
   event 'agent-lifecycle' for domain 'vm2': state: 'disconnected' reason: 'domain started'
   event 'lifecycle' for domain 'vm2': Started Migrated
   event 'agent-lifecycle' for domain 'vm2': state: 'disconnected' reason: 'channel event'
   event 'lifecycle' for domain 'vm2': Defined Updated
   event 'lifecycle' for domain 'vm2': Resumed Post-copy
   event 'agent-lifecycle' for domain 'vm2': state: 'connected' reason: 'channel event'
   event 'lifecycle' for domain 'vm2': Resumed Post-copy Error
   event 'lifecycle' for domain 'vm2': Resumed Post-copy
   event 'lifecycle' for domain 'vm2': Resumed Migrated
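The event sequences in step 8 are easier to compare once the lifecycle transitions are pulled out of the `virsh event` output. Below is a minimal sketch, assuming the output was captured as text with one event per line in the `event 'lifecycle' for domain 'NAME': STATE REASON` form shown above. The helpers `lifecycle_transitions` and `postcopy_error_count` are hypothetical names for illustration, not part of libvirt or virsh.

```python
import re
from typing import List

# Matches lifecycle lines printed by `virsh event`, for example:
#   event 'lifecycle' for domain 'vm2': Suspended Post-copy Error
LIFECYCLE_RE = re.compile(r"event 'lifecycle' for domain '([^']+)': (.+)")


def lifecycle_transitions(output: str, domain: str) -> List[str]:
    """Return the lifecycle 'STATE REASON' strings for one domain, in order.

    Hypothetical helper: assumes one event per line, as in step 8.
    """
    transitions = []
    for line in output.splitlines():
        match = LIFECYCLE_RE.search(line)
        if match and match.group(1) == domain:
            # Drop the Jira bold markers (*...*) used to highlight events.
            transitions.append(match.group(2).strip(" *"))
    return transitions


def postcopy_error_count(transitions: List[str]) -> int:
    """Count how many times the domain entered a 'Post-copy Error' state."""
    return sum(1 for t in transitions if t.endswith("Post-copy Error"))


source_log = """\
event 'lifecycle' for domain 'vm2': Suspended Migrated
event 'lifecycle' for domain 'vm2': Suspended Post-copy
event 'lifecycle' for domain 'vm2': Suspended Post-copy Error
*event 'lifecycle' for domain 'vm2': Suspended Post-copy*
*event 'lifecycle' for domain 'vm2': Suspended Post-copy Error*
event 'lifecycle' for domain 'vm2': Stopped Migrated
"""
states = lifecycle_transitions(source_log, "vm2")
print(states[-1])                    # Stopped Migrated
print(postcopy_error_count(states))  # 2
```

Applied to the source-host lifecycle events from step 8, the sketch finds two entries into `Suspended Post-copy Error` before the final `Stopped Migrated`, matching the highlighted lines.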
Expected results
No error should be reported when the recovery succeeds.
Actual results
Recovering the paused post-copy migration returned the error "job 'migration in' failed in post-copy phase", although the migration was in fact recovered successfully.
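The expected and actual results differ only in whether the exit status of the resume command agrees with how the migration actually ended. A hypothetical check that makes the distinction explicit, assuming the caller records the virsh exit status and the last lifecycle event seen on the target host, and assuming `Resumed Migrated` (the final target-host event in step 8) marks a completed migration. `ResumeAttempt` and `classify` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class ResumeAttempt:
    """Hypothetical record of one `virsh migrate ... --postcopy-resume` run."""
    exit_status: int         # virsh exit status (0 on success)
    final_target_event: str  # last lifecycle event seen on the target host


def classify(attempt: ResumeAttempt) -> str:
    """Compare what virsh reported with how the migration actually ended."""
    # Assumption: 'Resumed Migrated' on the target marks a completed migration,
    # as in the target-host events from step 8.
    completed = attempt.final_target_event == "Resumed Migrated"
    reported_ok = attempt.exit_status == 0
    if completed:
        # This bug: virsh reports an error although the migration completed.
        return "ok" if reported_ok else "spurious error"
    return "undetected failure" if reported_ok else "failed"


print(classify(ResumeAttempt(exit_status=1, final_target_event="Resumed Migrated")))
# spurious error
```

Under these assumptions, the behaviour reported here falls into the "spurious error" case, while the expected result is "ok".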
Additional info:
1. This is a regression: it cannot be reproduced with libvirt-9.10.0-1.el9.x86_64.
2. Errors in the libvirtd log:
   # cat /var/log/libvirt/libvirtd.log | grep -i error
   2024-01-19 07:56:08.323+0000: 87228: error : virNetClientProgramDispatchError:170 : operation failed: job 'migration in' failed in post-copy phase
   2024-01-19 07:56:08.335+0000: 87228: debug : virNetServerProgramSendError:147 : prog=536903814 ver=1 proc=305 type=1 serial=11 msg=0x55eac5a72f60 rerr=0x7f487ddfa9a0
   2024-01-19 07:56:14.740+0000: 89328: debug : qemuMonitorJSONIOProcessLine:191 : Line [{"return": {"status": "postcopy-paused", "setup-time": 171, "error-desc": "Postcopy migration is paused by the user", "downtime": 192, "total-time": 10816, "ram": {"total": 2160803840, "postcopy-requests": 260, "dirty-sync-count": 2, "multifd-bytes": 0, "pages-per-second": 28666, "downtime-bytes": 0, "page-size": 4096, "remaining": 101707776, "postcopy-bytes": 97200224, "mbps": 940.56274285714289, "transferred": 114720833, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 16828994, "duplicate": 475968, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 109531136, "normal": 26741}}, "id": "libvirt-463"}]
   2024-01-19 07:56:14.740+0000: 89328: info : qemuMonitorJSONIOProcessLine:210 : QEMU_MONITOR_RECV_REPLY: mon=0x7f486804ed00 reply={"return": {"status": "postcopy-paused", "setup-time": 171, "error-desc": "Postcopy migration is paused by the user", "downtime": 192, "total-time": 10816, "ram": {"total": 2160803840, "postcopy-requests": 260, "dirty-sync-count": 2, "multifd-bytes": 0, "pages-per-second": 28666, "downtime-bytes": 0, "page-size": 4096, "remaining": 101707776, "postcopy-bytes": 97200224, "mbps": 940.56274285714289, "transferred": 114720833, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 16828994, "duplicate": 475968, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 109531136, "normal": 26741}}, "id": "libvirt-463"}
   2024-01-19 07:56:14.766+0000: 87226: error : virNetClientProgramDispatchError:170 : operation failed: job 'migration in' failed in post-copy phase
   2024-01-19 07:56:14.778+0000: 87226: debug : virNetServerProgramSendError:147 : prog=536903814 ver=1 proc=305 type=1 serial=11 msg=0x55eac5a6d390 rerr=0x7f487edfc9a0
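The monitor reply logged at 07:56:14.740 reports `"status": "postcopy-paused"` with `"error-desc": "Postcopy migration is paused by the user"`, shortly before the second `job 'migration in' failed in post-copy phase` error at 07:56:14.766. A minimal sketch for pulling that status out of such log lines, assuming the `qemuMonitorJSONIOProcessLine ... Line [...]` format shown above and that the reply is a migration-info reply carrying a `status` field. `migration_status` is a hypothetical helper, not a libvirt API.

```python
import json
import re
from typing import Optional, Tuple

# Matches the payload of qemuMonitorJSONIOProcessLine debug entries such as:
#   ... qemuMonitorJSONIOProcessLine:191 : Line [{"return": {...}, "id": "libvirt-463"}]
QMP_LINE_RE = re.compile(r"qemuMonitorJSONIOProcessLine:\d+ : Line \[(.*)\]\s*$")


def migration_status(log_line: str) -> Optional[Tuple[str, str]]:
    """Return (status, error-desc) from a logged migration-info reply, or None."""
    match = QMP_LINE_RE.search(log_line)
    if match is None:
        return None
    reply = json.loads(match.group(1))
    result = reply.get("return")
    if not isinstance(result, dict) or "status" not in result:
        return None  # some other QMP reply or event
    return result["status"], result.get("error-desc", "")


line = ('2024-01-19 07:56:14.740+0000: 89328: debug : qemuMonitorJSONIOProcessLine:191 : '
        'Line [{"return": {"status": "postcopy-paused", "error-desc": '
        '"Postcopy migration is paused by the user", "ram": {"total": 2160803840}}, '
        '"id": "libvirt-463"}]')
print(migration_status(line))
# ('postcopy-paused', 'Postcopy migration is paused by the user')
```

The `QEMU_MONITOR_RECV_REPLY` lines carry the same payload in a different wrapper and are deliberately not matched, so each reply is counted once.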
is blocked by:
- RHEL-50574 Rebase libvirt in RHEL-9.6.0 (In Progress)
- RHEL-38485 Failure to resume paused post-copy migration is undetectable (Release Pending)
links to:
- RHBA-2024:140248 libvirt update