CNV-61718

OpenShift Virtualization: Migration failure on CPU hotplug with network multiqueues

    • Quality / Stability / Reliability
    • 0.42
    • False
    • False
    • rhel-virt-tools
    • None

      What were you trying to do that didn't work?

      Currently, on OpenShift Virtualization, the "CPU hotplug when network multiqueue is enabled" scenario is blocked for users because the resulting live migration fails with the following error:

      ```
      'Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=38, Domain=0, Message=''Unable to create multiple fds
      for tap device tap0 (maybe existing device was created without multi_queue flag): Invalid argument'')'
      ```

      What is the impact of this issue to you?

      Please provide the package NVR for which the bug is seen:

      How reproducible is this bug?:

      Steps to reproduce

      Note: the limitation should be removed from KubeVirt's code by reverting https://github.com/kubevirt/kubevirt/pull/12180

      A VM is started with one socket, one core, one thread, one vNIC, and networkMultiqueue: true.

      KubeVirt creates the tap device with the following flags: IFF_TUN_EXCL | IFF_ONE_QUEUE [1]

       
      The interface definition is:
      ```
          <interface type='ethernet'>
            <mac address='9a:21:fe:67:62:d9'/>
            <target dev='tap0' managed='no'/>
            <model type='virtio-non-transitional'/>
            <driver name='vhost'/>
            <mtu size='1480'/>
            <alias name='ua-default'/>
            <rom enabled='no'/>
            <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
          </interface>
      ```
      The number of sockets in the VM spec is then increased to two.
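The effect of this step on the vCPU count can be worked through with a short sketch. It assumes the vCPU count is sockets × cores × threads, and it further assumes (the report does not state this) that the number of vNIC queues requested follows the vCPU count when multiqueue is enabled. `Topology` and `requested_queues` are hypothetical names used only for illustration; they are not KubeVirt APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Topology:
    """Guest CPU topology as given in the VM spec (illustrative model)."""
    sockets: int
    cores: int
    threads: int

    @property
    def vcpus(self) -> int:
        # Total vCPUs exposed to the guest.
        return self.sockets * self.cores * self.threads


def requested_queues(topology: Topology, multiqueue: bool) -> int:
    """Assumption: one queue per vCPU when multiqueue is on, else one queue."""
    return topology.vcpus if multiqueue else 1


if __name__ == "__main__":
    before = Topology(sockets=1, cores=1, threads=1)
    after = Topology(sockets=2, cores=1, threads=1)
    for label, topo in (("before hotplug", before), ("after hotplug", after)):
        print(label, "vcpus:", topo.vcpus,
              "queues:", requested_queues(topo, multiqueue=True))
```

Under these assumptions the VM goes from one requested queue to two, which is the point at which single-queue and multi-queue tap handling start to diverge.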

      KubeVirt automatically starts a migration.

      The tap device in the target pod is created with the following flags: TUNTAP_MULTI_QUEUE | TUNTAP_NO_PI [2][3]
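The two flag sets named in the steps above can be compared with a small model. This is a sketch only: the constants are the `IFF_*` bit values from the Linux `linux/if_tun.h` header (the `TUNTAP_*` names in the netlink library map to the same bits), while `tap_flags_for_queues`, `describe`, and `multiqueue_mismatch` are hypothetical helpers, not KubeVirt or netlink code. It assumes single-queue flags are used when exactly one queue is requested and multi-queue flags otherwise.

```python
# TUN/TAP flag bits as defined in linux/if_tun.h.
IFF_MULTI_QUEUE = 0x0100
IFF_NO_PI = 0x1000
IFF_ONE_QUEUE = 0x2000
IFF_TUN_EXCL = 0x8000

# Ordered so that describe() prints names in the order used in this report.
FLAG_NAMES = {
    "IFF_TUN_EXCL": IFF_TUN_EXCL,
    "IFF_ONE_QUEUE": IFF_ONE_QUEUE,
    "IFF_MULTI_QUEUE": IFF_MULTI_QUEUE,
    "IFF_NO_PI": IFF_NO_PI,
}


def tap_flags_for_queues(queues: int) -> int:
    """Hypothetical: pick creation flags from the requested queue count."""
    if queues < 1:
        raise ValueError("a tap device needs at least one queue")
    if queues == 1:
        return IFF_TUN_EXCL | IFF_ONE_QUEUE  # source pod in this report
    return IFF_MULTI_QUEUE | IFF_NO_PI       # target pod in this report


def describe(flags: int) -> str:
    """Render a flag word as a ' | '-joined list of known flag names."""
    return " | ".join(name for name, bit in FLAG_NAMES.items() if flags & bit)


def multiqueue_mismatch(a: int, b: int) -> bool:
    """True when exactly one of the two flag words has IFF_MULTI_QUEUE set."""
    return bool(a & IFF_MULTI_QUEUE) != bool(b & IFF_MULTI_QUEUE)


if __name__ == "__main__":
    source = tap_flags_for_queues(1)
    target = tap_flags_for_queues(2)
    print("source:", describe(source))
    print("target:", describe(target))
    print("multi-queue mismatch:", multiqueue_mismatch(source, target))
```

In this model the source and target tap devices disagree on `IFF_MULTI_QUEUE`, which is the property the libvirt error message points at.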

      The domain migrates from source to target without changing the interface definition.
      virtqemud on the target pod fails the migration with the following error:

      'Live migration failed error encountered during MigrateToURI3 libvirt api call: virError(Code=38, Domain=0, Message=''Unable to create multiple fds
      for tap device tap0 (maybe existing device was created without multi_queue flag): Invalid argument'')'

       

      [1] https://github.com/kubevirt/kubevirt/blob/0fcaf6fba79375363c5f6dfa36554cf2fd375978/cmd/virt-chroot/tap-device-maker.go#L29
      [2] https://github.com/vishvananda/netlink/blob/65a253d3751cff6da8274035e4548974a874f477/link_linux.go#L1425
      [3] https://github.com/vishvananda/netlink/blob/65a253d3751cff6da8274035e4548974a874f477/link_linux.go#L34

      Expected results

      The migration should succeed.

      Actual results

      The migration fails.

        1. domain.txt
          10 kB
          Orel Misan
        2. failed_mig_source
          120 kB
          Orel Misan
        3. failed_mig_target
          8 kB
          Orel Misan
        4. tun.c
          1 kB
          Han Han
        5. vm_spec.yaml
          1.0 kB
          Orel Misan
        6. vm_spec2.yaml
          1.0 kB
          Orel Misan

              phoracek@redhat.com Petr Horacek
              omisan@redhat.com Orel Misan
              Yossi Segev Yossi Segev