  1. OpenShift Virtualization
  2. CNV-21197

[2126106] Failed to attach SR-IOV network interfaces when live migrating a VM

    • High
    • None

      Description of problem:
      When live migrating a VM with several SR-IOV network interfaces, some of the NICs fail to attach on the target host with this error:

      Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePrepare3Params)

      It looks like virt-launcher tries to hot-plug the host devices too early, while the state change lock is still held by remoteDispatchDomainMigratePrepare3Params and domain modification is not allowed.
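
To illustrate the timing problem described above, here is a minimal sketch of retrying an attach only when it fails with the state change lock timeout. It is not how virt-launcher behaves and not the change made in the pull request referenced under "Additional info"; `AttachError`, `attach_with_retry`, and the `attach` callable are hypothetical names introduced for this example only.

```python
import time

# Substring of the libvirt error quoted in this report.
LOCK_TIMEOUT = "cannot acquire state change lock"


class AttachError(Exception):
    """Hypothetical stand-in for the error raised when a host-device attach fails."""


def attach_with_retry(attach, device, attempts=5, delay=0.5, sleep=time.sleep):
    """Call attach(device); retry only when the failure is the lock timeout.

    `attach` is a hypothetical callable supplied by the caller. Any other
    error, or a lock timeout on the last attempt, is re-raised unchanged.
    """
    for attempt in range(1, attempts + 1):
        try:
            return attach(device)
        except AttachError as err:
            if LOCK_TIMEOUT not in str(err) or attempt == attempts:
                raise
            # Linear backoff: wait a little longer after each failed attempt.
            sleep(delay * attempt)
```

The point of the sketch is only that the failure is transient: once the migration prepare job releases the lock, the same attach call can succeed.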

      Version-Release number of selected component (if applicable):
      OpenShift 4.10.23
      kubevirt-hyperconverged-operator.v4.10.3
      sriov-network-operator.4.10.0-202207192148

      How reproducible:
      Always, in the customer environment.

      Steps to Reproduce:
      1. Have a VMI with several SR-IOV NICs:

      ~~~
      interfaces:
      - bridge: {}
        macAddress: aa:bb:cc:dd:ee:00
        model: virtio
        name: nic-1
      - macAddress: aa:bb:cc:dd:ee:01
        model: virtio
        name: nic-2
        pciAddress: "0000:20:00.0"
        sriov: {}
      - macAddress: aa:bb:cc:dd:ee:02
        model: virtio
        name: nic-3
        pciAddress: "0000:21:00.0"
        sriov: {}
      - macAddress: aa:bb:cc:dd:ee:03
        model: virtio
        name: nic-4
        pciAddress: "0000:22:00.0"
        sriov: {}
      - macAddress: aa:bb:cc:dd:ee:04
        model: virtio
        name: nic-5
        pciAddress: "0000:23:00.0"
        sriov: {}
      - macAddress: aa:bb:cc:dd:ee:05
        model: virtio
        name: nic-6
        pciAddress: "0000:24:00.0"
        sriov: {}
      - macAddress: aa:bb:cc:dd:ee:06
        model: virtio
        name: nic-7
        pciAddress: "0000:25:00.0"
        sriov: {}
      ~~~

      2. Live migrate the VM
      3. After the migration, verify that the VM has all of its NICs connected and check the virt-launcher pod log.
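
The check in step 3 can be sketched as a small script that scans the virt-launcher log for successful hot-plug messages and reports which SR-IOV NICs are missing. This is an illustration under assumptions: the function names are hypothetical, and the `sriov-<interface name>` device naming is inferred from the log excerpt in this report rather than from KubeVirt documentation.

```python
import json
import re

# Message format taken from the virt-launcher log lines in this report.
HOTPLUG_OK = re.compile(r"Successfully hot-plug host-device: (\S+)")


def hotplugged_devices(log_lines):
    """Return the set of device names reported as successfully hot-plugged."""
    found = set()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not a JSON log entry
        if not isinstance(entry, dict):
            continue
        match = HOTPLUG_OK.search(entry.get("msg", ""))
        if match:
            found.add(match.group(1))
    return found


def missing_nics(sriov_interface_names, log_lines):
    """List expected SR-IOV devices with no successful hot-plug message."""
    # Assumption: the device name is "sriov-" plus the interface name.
    expected = {f"sriov-{name}" for name in sriov_interface_names}
    return sorted(expected - hotplugged_devices(log_lines))
```

For the VMI above, calling `missing_nics(["nic-2", "nic-3", "nic-4", "nic-5", "nic-6", "nic-7"], lines)` with the target pod's log lines would return the devices that never logged a successful hot-plug.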

      Actual results:
      Some hot-plug operations fail:

      ~~~
      {"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-3 (\u0026{pci 0x0000 0x60 0x12 0x1 })","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:08.284556Z"}
      {"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-4 (\u0026{pci 0x0000 0x60 0x09 0x1 })","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:08.581104Z"}
      {"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-5 (\u0026{pci 0x0000 0x60 0x19 0x5 })","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:08.827619Z"}
      {"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-6 (\u0026{pci 0x0000 0x60 0x0e 0x2 })","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:09.102456Z"}
      {"component":"virt-launcher","level":"info","msg":"Successfully hot-plug host-device: sriov-nic-7 (\u0026{pci 0x0000 0x60 0x11 0x2 })","pos":"hotplug.go:166","timestamp":"2022-09-06T13:12:09.404308Z"}

      {"component":"virt-launcher","kind":"","level":"error","msg":"failed to hot-plug host-devices","name":"vm-01","namespace":"test-ns","pos":"live-migration-target.go:42","reason":"failed to attach host-device \u003chostdev type=\"pci\" managed=\"no\"\u003e\u003csource\u003e\u003caddress type=\"pci\" domain=\"0x0000\" bus=\"0x60\" slot=\"0x0e\" function=\"0x4\"\u003e\u003c/address\u003e\u003c/source\u003e\u003caddress type=\"pci\" domain=\"0x0000\" bus=\"0x20\" slot=\"0x00\" function=\"0x0\"\u003e\u003c/address\u003e\u003calias name=\"ua-sriov-nic-2\"\u003e\u003c/alias\u003e\u003c/hostdev\u003e, err: virError(Code=68, Domain=10, Message='Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainMigratePrepare3Params)')\n","timestamp":"2022-09-06T13:12:09.404356Z","uid":"74afee88-afa9-494f-8a9a-fe004033bfd0"}

      ~~~
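
To pull the failing device and the libvirt message out of an error entry like the one above, a short parser can help when triaging long logs. This is a sketch under assumptions: `failed_attach` is a hypothetical helper, and the field names (`level`, `reason`) and the `alias name="..."` and `Message='...'` patterns are taken only from the log excerpt in this report.

```python
import json
import re

ALIAS = re.compile(r'<alias name="([^"]+)"')
MESSAGE = re.compile(r"Message='([^']*)'")


def failed_attach(line):
    """Return (alias, libvirt_message) for an error log line, else None.

    Either element is None when the corresponding pattern is not present
    in the entry's "reason" field.
    """
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return None
    if not isinstance(entry, dict) or entry.get("level") != "error":
        return None
    reason = entry.get("reason", "")
    alias = ALIAS.search(reason)
    message = MESSAGE.search(reason)
    return (
        alias.group(1) if alias else None,
        message.group(1) if message else None,
    )
```

Applied to the error entry above, this would yield the alias `ua-sriov-nic-2` together with the "cannot acquire state change lock" message, which identifies the NIC that was not attached.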

      Expected results:
      All NICs are attached successfully.

      Additional info:
      There are some recent changes in how the SR-IOV devices are attached:

      https://github.com/kubevirt/kubevirt/pull/6581

      Can they be backported to 4.10?

              phoracek@redhat.com Petr Horacek
              rhn-support-jortialc Juan Orti
              Nir Rozen Nir Rozen