RHEL-7336

ping does not always work during live migration of a VM with a failover VF

    • None
    • Moderate
    • rhel-sst-virtualization-networking
    • ssg_virtualization
    • 3
    • False
    • None
    • None
    • None
    • None
    • None
    • Known Issue
    • .Host network cannot ping VMs with VFs during live migration

      When live migrating a virtual machine (VM) with a configured virtual function (VF), such as a VM that uses virtual SR-IOV software, the network of the VM is not visible to other devices and the VM cannot be reached by commands such as `ping`. After the migration is finished, however, the problem no longer occurs.
    • Done
    • None

      Description of problem:
      During live migration of a VM with a VF, I stop getting ping replies from the guest immediately after the VF is hot-unplugged, even though the VM has not yet reached the migration downtime at that point.

      Version-Release number of selected component (if applicable):
      Host:
      4.18.0-147.3.1.el8_1.x86_64
      qemu-kvm-4.1.0-20.module+el8.1.1+5309+6d656f05.x86_64
      Guest:
      4.18.0-147.3.1.el8_1.x86_64

      How reproducible:
      10/10

      Steps to Reproduce:
      1. On the source host, create an 82599ES VF and set the MAC address of the VF:
      ip link set enp6s0f0 vf 0 mac 22:2b:62:bb:a9:82
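Step 1 shows only the MAC assignment, not the VF creation itself. A minimal sketch of both actions follows, assuming the VF is created through the standard `sriov_numvfs` sysfs attribute of the PF. The `run` helper and the `DRY_RUN` variable are hypothetical additions, not part of the original report, so that the commands can be previewed without root or SR-IOV hardware.

```shell
#!/bin/sh
# Sketch: create one VF on the PF and pin its MAC address.
# PF name and MAC are taken from the report. DRY_RUN and run() are
# hypothetical helpers added for safe previewing (assumption).
PF="${PF:-enp6s0f0}"
VF_MAC="${VF_MAC:-22:2b:62:bb:a9:82}"
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, execute it otherwise.
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

create_vf() {
    # Writing N to sriov_numvfs asks the PF driver to create N VFs.
    run sh -c "echo 1 > /sys/class/net/$PF/device/sriov_numvfs"
    # Give VF 0 the same MAC as the virtio-net failover device.
    run ip link set "$PF" vf 0 mac "$VF_MAC"
}

create_vf
```

Running the script with `DRY_RUN=0` as root would apply the changes. The MAC must match the `mac=` value of the virtio-net device in step 2, since the failover pairing in the guest relies on both devices sharing one MAC address.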

      2. Start the source guest with the 82599ES VF and failover enabled:
      /usr/libexec/qemu-kvm -name rhel811 -M q35 -enable-kvm \
      -monitor stdio \
      -nodefaults \
      -m 4G \
      -boot menu=on \
      -cpu Haswell-noTSX-IBRS \
      -device pcie-root-port,id=root.1,chassis=1,addr=0x2.0,multifunction=on \
      -device pcie-root-port,id=root.2,chassis=2,addr=0x2.1 \
      -device pcie-root-port,id=root.3,chassis=3,addr=0x2.2 \
      -device pcie-root-port,id=root.4,chassis=4,addr=0x2.3 \
      -device pcie-root-port,id=root.5,chassis=5,addr=0x2.4 \
      -device pcie-root-port,id=root.6,chassis=6,addr=0x2.5 \
      -device pcie-root-port,id=root.7,chassis=7,addr=0x2.6 \
      -device pcie-root-port,id=root.8,chassis=8,addr=0x2.7 \
      -smp 2,sockets=1,cores=2,threads=2,maxcpus=4 \
      -qmp tcp:0:6666,server,nowait \
      -blockdev node-name=back_image,driver=file,cache.direct=on,cache.no-flush=off,filename=/nfsmount/migra_test/rhel811_q35.qcow2,aio=threads \
      -blockdev node-name=drive-virtio-disk0,driver=qcow2,cache.direct=on,cache.no-flush=off,file=back_image \
      -device virtio-blk-pci,drive=drive-virtio-disk0,id=disk0,bus=root.1 \
      -device VGA,id=video1,bus=root.2 \
      -vnc :0 \
      -netdev tap,id=hostnet0,vhost=on \
      -device virtio-net-pci,netdev=hostnet0,id=net0,mac=22:2b:62:bb:a9:82,bus=root.3,failover=on \
      -device vfio-pci,host=0000:06:10.0,id=hostdev0,bus=root.4,failover_pair_id=net0

      3. Check the network information in the source guest:

      # ifconfig
      enp3s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
              inet 10.73.33.236 netmask 255.255.254.0 broadcast 10.73.33.255
              ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet)
              RX packets 28683 bytes 1961744 (1.8 MiB)
              RX errors 0 dropped 0 overruns 0 frame 0
              TX packets 93 bytes 13770 (13.4 KiB)
              TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

      enp3s0nsby: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
              ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet)
              RX packets 28345 bytes 1924974 (1.8 MiB)
              RX errors 0 dropped 0 overruns 0 frame 0
              TX packets 0 bytes 0 (0.0 B)
              TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

      enp4s0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1500
              ether 22:2b:62:bb:a9:82 txqueuelen 1000 (Ethernet)
              RX packets 339 bytes 36836 (35.9 KiB)
              RX errors 0 dropped 0 overruns 0 frame 0
              TX packets 95 bytes 14406 (14.0 KiB)
              TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

      lo: flags=73<UP,LOOPBACK,RUNNING> mtu 65536
              inet 127.0.0.1 netmask 255.0.0.0
              inet6 ::1 prefixlen 128 scopeid 0x10<host>
              loop txqueuelen 1000 (Local Loopback)
              RX packets 0 bytes 0 (0.0 B)
              RX errors 0 dropped 0 overruns 0 frame 0
              TX packets 0 bytes 0 (0.0 B)
              TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0

      # ip link show
      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
          link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
          link/ether 22:2b:62:bb:a9:82 brd ff:ff:ff:ff:ff:ff
      3: enp3s0nsby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master enp3s0 state UP mode DEFAULT group default qlen 1000
          link/ether 22:2b:62:bb:a9:82 brd ff:ff:ff:ff:ff:ff
      4: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master enp3s0 state UP mode DEFAULT group default qlen 1000
          link/ether 22:2b:62:bb:a9:82 brd ff:ff:ff:ff:ff:ff
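In the `ip link show` output above, the standby (enp3s0nsby) and primary (enp4s0) devices both report `master enp3s0`. As a rough way to check that pairing before and during migration, the following sketch lists every interface enslaved to a given failover master. The function name `list_failover_slaves` is hypothetical, and the sample input is copied from this report.

```shell
#!/bin/sh
# Sketch: print the names of all interfaces whose master is $1.
# Reads `ip link show` output on stdin.
list_failover_slaves() {
    awk -v master="$1" '
        {
            # Look for the token pair "master <name>" on each line.
            for (i = 1; i < NF; i++)
                if ($i == "master" && $(i + 1) == master) {
                    name = $2           # e.g. "enp4s0:"
                    sub(/:$/, "", name) # drop the trailing colon
                    print name
                }
        }'
}

# Demo with the output captured in step 3 of this report.
list_failover_slaves enp3s0 <<'EOF'
2: enp3s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether 22:2b:62:bb:a9:82 brd ff:ff:ff:ff:ff:ff
3: enp3s0nsby: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel master enp3s0 state UP mode DEFAULT group default qlen 1000
    link/ether 22:2b:62:bb:a9:82 brd ff:ff:ff:ff:ff:ff
4: enp4s0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master enp3s0 state UP mode DEFAULT group default qlen 1000
    link/ether 22:2b:62:bb:a9:82 brd ff:ff:ff:ff:ff:ff
EOF
```

In a live guest this would be used as `ip link show | list_failover_slaves enp3s0`. After the VF is hot-unplugged, only the standby device would be expected in the list.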

      4. On the target host, create a NetXtreme BCM57810 VF and set the MAC address of the VF:
      ip link set enp131s0f0 vf 0 mac 22:2b:62:bb:a9:82

      5. Start the target guest in listening mode so that it waits for the incoming migration from the source guest:
      ...
      -incoming tcp:0:5800 \

      6. Keep pinging the VM during the migration:

      # ping 10.73.33.236

      7. Migrate the guest from the source host to the target host:
      (qemu) migrate -d tcp:10.73.73.73:5800
      The guest migrates successfully.

      8. Check the ping output:

      # ping 10.73.33.236
      64 bytes from 10.73.33.236: icmp_seq=59 ttl=61 time=3.07 ms
      64 bytes from 10.73.33.236: icmp_seq=60 ttl=61 time=4.35 ms
      64 bytes from 10.73.33.236: icmp_seq=61 ttl=61 time=2.10 ms
      64 bytes from 10.73.33.236: icmp_seq=62 ttl=61 time=4.53 ms [1]
      64 bytes from 10.73.33.236: icmp_seq=88 ttl=61 time=7.39 ms [2]
      64 bytes from 10.73.33.236: icmp_seq=89 ttl=61 time=4.35 ms
      64 bytes from 10.73.33.236: icmp_seq=90 ttl=61 time=5.82 ms
      64 bytes from 10.73.33.236: icmp_seq=91 ttl=61 time=4.39 ms

      [1]
      Once "virtio_net virtio1 enp3s0: failover primary slave:enp4s0 unregistered" is logged in the source guest's dmesg, ping stops working until the migration is completed.
      [2]
      Once the migration is completed, ping works again.
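The outage in the ping log shows up as a jump in `icmp_seq` (from 62 to 88 in the run above). To quantify it without reading the log by eye, a small sketch like the following can report each gap. The function name `find_ping_gaps` is hypothetical, and sequence-number wraparound is not handled.

```shell
#!/bin/sh
# Sketch: report gaps in icmp_seq numbers from ping output on stdin.
find_ping_gaps() {
    awk '
        match($0, /icmp_seq=[0-9]+/) {
            # "icmp_seq=" is 9 characters long; the digits follow it.
            seq = substr($0, RSTART + 9, RLENGTH - 9) + 0
            if (seen && seq > prev + 1)
                printf "lost icmp_seq %d-%d (%d replies)\n",
                       prev + 1, seq - 1, seq - prev - 1
            prev = seq
            seen = 1
        }'
}

# Demo with the replies captured in step 8 of this report.
find_ping_gaps <<'EOF'
64 bytes from 10.73.33.236: icmp_seq=61 ttl=61 time=2.10 ms
64 bytes from 10.73.33.236: icmp_seq=62 ttl=61 time=4.53 ms
64 bytes from 10.73.33.236: icmp_seq=88 ttl=61 time=7.39 ms
64 bytes from 10.73.33.236: icmp_seq=89 ttl=61 time=4.35 ms
EOF
# prints: lost icmp_seq 63-87 (25 replies)
```

With the default one-second ping interval, 25 missing replies correspond to roughly 25 seconds without connectivity between the VF unplug and the end of the migration.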

      Actual results:
      Once "virtio_net virtio1 enp3s0: failover primary slave:enp4s0 unregistered" is logged in the source guest's dmesg, ping stops working until the migration is completed.
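To correlate the start of the outage with the VF unplug, the guest kernel log can be filtered for the message quoted above. The sketch below extracts the name of the unregistered primary device from such lines. The function name `unplugged_primaries` is hypothetical, and the pattern assumes the exact message format shown in this report.

```shell
#!/bin/sh
# Sketch: print the primary device name from each
# "failover primary slave:<dev> unregistered" kernel log line on stdin.
unplugged_primaries() {
    sed -n 's/.*failover primary slave:\([^ ]*\) unregistered.*/\1/p'
}

# Demo with the message quoted in this report.
echo 'virtio_net virtio1 enp3s0: failover primary slave:enp4s0 unregistered' |
    unplugged_primaries
# prints: enp4s0
```

In the guest, `dmesg | unplugged_primaries` would list the affected devices, and running `ping` with timestamps alongside it would help line up the two logs.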

      Expected results:
      ping should keep working throughout the migration, because the hypervisor fails over to the virtio netdev datapath when the VF is unplugged.

      Additional info:
      (1)
      # lspci | grep -i 82599
      06:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
      06:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)

      (2)
      This problem can also be reproduced with a NetXtreme II BCM57810.

      (3)
      This problem can also be reproduced in RHEL82-AV.
      The test environment is as follows:
      host:
      qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc.x86_64
      4.18.0-167.el8.x86_64
      guest:
      4.18.0-167.el8.x86_64

              lvivier@redhat.com Laurent Vivier
              yanghliu@redhat.com YangHang Liu
              Laurent Vivier
              Yanhui Ma
              Jiří Herrmann