Bug
Resolution: Done-Errata
Major
CLOSED
High
Description of problem:
When live migrating a VM with several SR-IOV network interfaces, some of the NICs fail to attach on the target host with this error:
Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePrepare3Params)
It looks like virt-launcher tries to hot-plug the host devices too early, while the state change lock is still held by remoteDispatchDomainMigratePrepare3Params and domain modification is not allowed.
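For triage, the job holding the lock can be pulled out of the error text. The snippet below is only a sketch: the pattern is derived from the single message quoted in this report, and `lock_holder` is a hypothetical helper name, not part of libvirt or KubeVirt. Other libvirt versions may format the message differently.

```python
import re
from typing import Optional

# Pattern based on the one message quoted in this report (an assumption);
# the wording may differ across libvirt versions.
LOCK_RE = re.compile(
    r"cannot acquire state change lock \(held by monitor=(?P<holder>[\w.-]+)\)"
)


def lock_holder(message: str) -> Optional[str]:
    """Return the name of the job holding the state change lock, or None."""
    match = LOCK_RE.search(message)
    return match.group("holder") if match else None


msg = (
    "Timed out during operation: cannot acquire state change lock "
    "(held by monitor=remoteDispatchDomainMigratePrepare3Params)"
)
print(lock_holder(msg))
```

If the holder is remoteDispatchDomainMigratePrepare3Params, the attach attempt overlapped with the migration prepare phase, which matches the suspicion above.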
Version-Release number of selected component (if applicable):
OpenShift 4.10.23
kubevirt-hyperconverged-operator.v4.10.3
sriov-network-operator.4.10.0-202207192148
How reproducible:
Always, in the customer environment.
Steps to Reproduce:
1. Have a VMI with several SR-IOV NICs:
~~~
interfaces:
- bridge: {}
  macAddress: aa:bb:cc:dd:ee:00
  model: virtio
  name: nic-1
- macAddress: aa:bb:cc:dd:ee:01
  model: virtio
  name: nic-2
  pciAddress: "0000:20:00.0"
  sriov: {}
- macAddress: aa:bb:cc:dd:ee:02
  model: virtio
  name: nic-3
  pciAddress: "0000:21:00.0"
  sriov: {}
- macAddress: aa:bb:cc:dd:ee:03
  model: virtio
  name: nic-4
  pciAddress: "0000:22:00.0"
  sriov: {}
- macAddress: aa:bb:cc:dd:ee:04
  model: virtio
  name: nic-5
  pciAddress: "0000:23:00.0"
  sriov: {}
- macAddress: aa:bb:cc:dd:ee:05
  model: virtio
  name: nic-6
  pciAddress: "0000:24:00.0"
  sriov: {}
- macAddress: aa:bb:cc:dd:ee:06
  model: virtio
  name: nic-7
  pciAddress: "0000:25:00.0"
  sriov: {}
~~~
2. Live migrate the VM
3. After the migration, verify that the VM has all the NICs connected and check the virt-launcher pod log for hot-plug errors
Actual results:
Some hot-plug operations fail:
~~~
{"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-3 (\u0026
)","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:08.284556Z"}
{"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-4 (\u0026
)","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:08.581104Z"}
{"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-5 (\u0026
)","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:08.827619Z"}
{"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-6 (\u0026
)","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:09.102456Z"}
{"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-7 (\u0026
)","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:09.404308Z"}
{"component":"virt-launcher","kind":"","level":"error","msg":"failed to hot-plug host-devices","name":"vm-01","namespace":"test-ns","pos":"live-migration-target.go:42","reason":"failed to attach host-device \u003chostdev type=\"pci\" managed=\"no\"\u003e\u003csource\u003e\u003caddress type=\"pci\" domain=\"0x0000\" bus=\"0x60\" slot=\"0x0e\" function=\"0x4\"\u003e\u003c/address\u003e\u003c/source\u003e\u003caddress type=\"pci\" domain=\"0x0000\" bus=\"0x20\" slot=\"0x00\" function=\"0x0\"\u003e\u003c/address\u003e\u003calias name=\"ua-sriov-nic-2\"\u003e\u003c/alias\u003e\u003c/hostdev\u003e, err: virError(Code=68, Domain=10, Message='Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePrepare3Params)')\n","timestamp":"2022-09-06T13:12:09.404356Z","uid":"74afee88-afa9-494f-8a9a-fe004033bfd0"}
~~~
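To check a longer virt-launcher log for the same pattern, a small script along these lines may help. It is a sketch that assumes one JSON object per log line and the field names seen in the excerpt above ("level", "msg", "reason"); `summarize_hotplug` is a hypothetical helper, and the sample lines are shortened stand-ins for real log entries.

```python
import json
import re

SUCCESS_PREFIX = "Successfully hot-plug host-device: "
# After json.loads, the \u003c / \u003e escapes in "reason" become < and >.
ALIAS_RE = re.compile(r'<alias name="([^"]+)">')


def summarize_hotplug(lines):
    """Split virt-launcher JSON log lines into attached and failed devices."""
    attached, failed = [], []
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON or truncated lines
        if not isinstance(entry, dict):
            continue
        msg = entry.get("msg", "")
        if msg.startswith(SUCCESS_PREFIX):
            # The device name is the first token after the prefix.
            parts = msg[len(SUCCESS_PREFIX):].split()
            if parts:
                attached.append(parts[0])
        elif entry.get("level") == "error" and "hot-plug" in msg:
            # Failed devices are identified by the alias in the hostdev XML.
            aliases = ALIAS_RE.findall(entry.get("reason", ""))
            failed.extend(aliases or ["<unknown>"])
    return attached, failed


# Shortened, illustrative stand-ins for the log lines in this report.
sample = [
    json.dumps({"level": "info",
                "msg": "Successfully hot-plug host-device: sriov-nic-3 (&)"}),
    json.dumps({"level": "error",
                "msg": "failed to hot-plug host-devices",
                "reason": 'failed to attach host-device <hostdev>'
                          '<alias name="ua-sriov-nic-2"></alias></hostdev>'}),
]
attached, failed = summarize_hotplug(sample)
print("attached:", attached)
print("failed:", failed)
```

Note that failed devices are reported by their libvirt alias (for example ua-sriov-nic-2), while successful ones use the name from the info message, so the two lists are not directly comparable without stripping the alias prefix.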
Expected results:
All NICs are attached successfully on the target host.
Additional info:
There are some recent changes in how the SR-IOV devices are attached:
https://github.com/kubevirt/kubevirt/pull/6581
Can they be backported to 4.10?