-
Bug
-
Resolution: Done-Errata
-
Normal
-
rhel-9.4
-
libvirt-10.8.0-1.el9
-
None
-
Moderate
-
ZStream
-
rhel-virt-core-libvirt-1
-
ssg_virtualization
-
11
-
5
-
QE ack, Dev ack
-
False
-
False
-
-
None
-
Red Hat OpenShift Virtualization
-
None
-
Approved Blocker
-
Pass
-
Automated
-
-
10.7.0
-
None
What were you trying to do that didn't work?
Recover postcopy returned with error '"job 'migration in' failed in post-copy phase' but the migration was recovered successfully in fact.
Please provide the package NVR for which bug is seen:
libvirt-10.0.0-1.el9.x86_64
qemu-kvm-8.2.0-2.el9.x86_64
How reproducible:
100%
Steps to reproduce
1. Monitor events on both source and target host:
# virsh event --loop --all
2.Do migration:
# virsh migrate vm2 qemu+ssh://X.X.X.com/system --persistent --verbose --live --p2p --bandwidth 5 --postcopy
3. Switch to postcopy before migration completes:
# virsh migrate-postcopy vm2
4.Abort postcopy migraiton before migration completes:
# virsh domjobabort vm2 --postcopy
error: Requested operation is not valid: domain is not running
5.Recover postcopy migraiton:
# virsh migrate vm2 qemu+ssh://X.X.X.com/system --persistent --verbose --live --p2p --bandwidth 5 --postcopy --postcopy-resume
error: operation failed: job 'migration in' failed in post-copy phase
6.Try Recover postcopy again:
# virsh migrate vm2 qemu+ssh://X.X.X.com/system --persistent --verbose --live --p2p --bandwidth 5 --postcopy --postcopy-resume
error: Requested operation is not valid: QEMU reports migration is still running
7.After a few minutes, the guest was running on target host and shutdown on the source guest
8. Check the events on both source and target host:
Source host:
# virsh event --all --loop
event 'migration-iteration' for domain 'vm2': iteration: '1'
event 'lifecycle' for domain 'vm2': Suspended Migrated
event 'lifecycle' for domain 'vm2': Suspended Post-copy
event 'migration-iteration' for domain 'vm2': iteration: '2'
event 'lifecycle' for domain 'vm2': Suspended Post-copy Error
*event 'lifecycle' for domain 'vm2': Suspended Post-copy*
*event 'lifecycle' for domain 'vm2': Suspended Post-copy Error*
event 'lifecycle' for domain 'vm2': Stopped Migrated
event 'job-completed' for domain 'vm2':
operation: 5
time_elapsed: 21752
downtime: 184
setup_time: 160
data_total: 2160803840
data_processed: 45301213
data_remaining: 2112503808
memory_total: 2160803840
memory_processed: 45301213
memory_remaining: 2112503808
memory_bps: 5257170
memory_constant: 3014
memory_normal: 8832
memory_normal_bytes: 36175872
memory_dirty_rate: 7
memory_iteration: 2
memory_postcopy_requests: 0
memory_page_size: 4096
disk_total: 0
disk_processed: 0
disk_remaining: 0
target host:
# virsh event --all --loop
event 'agent-lifecycle' for domain 'vm2': state: 'disconnected' reason: 'domain started'
event 'lifecycle' for domain 'vm2': Started Migrated
event 'agent-lifecycle' for domain 'vm2': state: 'disconnected' reason: 'channel event'
event 'lifecycle' for domain 'vm2': Defined Updated
event 'lifecycle' for domain 'vm2': Resumed Post-copy
event 'agent-lifecycle' for domain 'vm2': state: 'connected' reason: 'channel event'
event 'lifecycle' for domain 'vm2': Resumed Post-copy Error
event 'lifecycle' for domain 'vm2': Resumed Post-copy
event 'lifecycle' for domain 'vm2': Resumed Migrated
Expected results
Should not report error if recover successfully.
Actual results
Recover postcopy returned with error '"job 'migration in' failed in post-copy phase' but the migration was recovered successfully in fact.
Additional info:
1.It's a regression issue and can not reproduce with libvirt-9.10.0-1.el9.x86_64.
2.Error info in libvirtd log:
# cat /var/log/libvirt/libvirtd.log | grep -i error
2024-01-19 07:56:08.323+0000: 87228: error : virNetClientProgramDispatchError:170 : operation failed: job 'migration in' failed in post-copy phase
2024-01-19 07:56:08.335+0000: 87228: debug : virNetServerProgramSendError:147 : prog=536903814 ver=1 proc=305 type=1 serial=11 msg=0x55eac5a72f60 rerr=0x7f487ddfa9a0
2024-01-19 07:56:14.740+0000: 89328: debug : qemuMonitorJSONIOProcessLine:191 : Line [{"return": {"status": "postcopy-paused", "setup-time": 171, "error-desc": "Postcopy migration is paused by the user", "downtime": 192, "total-time": 10816, "ram": {"total": 2160803840, "postcopy-requests": 260, "dirty-sync-count": 2, "multifd-bytes": 0, "pages-per-second": 28666, "downtime-bytes": 0, "page-size": 4096, "remaining": 101707776, "postcopy-bytes": 97200224, "mbps": 940.56274285714289, "transferred": 114720833, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 16828994, "duplicate": 475968, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 109531136, "normal": 26741}}, "id": "libvirt-463"}]
2024-01-19 07:56:14.740+0000: 89328: info : qemuMonitorJSONIOProcessLine:210 : QEMU_MONITOR_RECV_REPLY: mon=0x7f486804ed00 reply=\{"return": {"status": "postcopy-paused", "setup-time": 171, "error-desc": "Postcopy migration is paused by the user", "downtime": 192, "total-time": 10816, "ram": {"total": 2160803840, "postcopy-requests": 260, "dirty-sync-count": 2, "multifd-bytes": 0, "pages-per-second": 28666, "downtime-bytes": 0, "page-size": 4096, "remaining": 101707776, "postcopy-bytes": 97200224, "mbps": 940.56274285714289, "transferred": 114720833, "dirty-sync-missed-zero-copy": 0, "precopy-bytes": 16828994, "duplicate": 475968, "dirty-pages-rate": 0, "skipped": 0, "normal-bytes": 109531136, "normal": 26741}}, "id": "libvirt-463"}
2024-01-19 07:56:14.766+0000: 87226: error : virNetClientProgramDispatchError:170 : operation failed: job 'migration in' failed in post-copy phase
2024-01-19 07:56:14.778+0000: 87226: debug : virNetServerProgramSendError:147 : prog=536903814 ver=1 proc=305 type=1 serial=11 msg=0x55eac5a6d390 rerr=0x7f487edfc9a0
- is blocked by
-
RHEL-38485 Failure to resume paused post-copy migration is undetectable
-
- Closed
-
-
RHEL-50574 Rebase libvirt in RHEL-9.6.0
-
- Closed
-
- links to
-
RHBA-2024:140248
libvirt update