OpenShift Virtualization / CNV-28040

[2186372] Packet drops during the initial phase of VM live migration


      Description of problem:

      When a virtual machine is live migrated, packet drops are observed on the inbound traffic to the VM immediately after the target virt-launcher starts: the packets are routed to the destination node while the migration is still running.

      Test: Started a ping to the VM from an external machine during the migration, and collected tcpdump on the source worker node, the destination worker node, and the client machine.

      IP address of the VM: 10.74.130.192
      MAC: 02:6a:85:00:00:21

      ~~~

      # ping -i 0.5 10.74.130.192

      64 bytes from 10.74.130.192: icmp_seq=11 ttl=64 time=0.375 ms
      64 bytes from 10.74.130.192: icmp_seq=12 ttl=64 time=0.624 ms
      64 bytes from 10.74.130.192: icmp_seq=13 ttl=64 time=0.299 ms
      64 bytes from 10.74.130.192: icmp_seq=14 ttl=64 time=63.5 ms

      <-- drops -->

      64 bytes from 10.74.130.192: icmp_seq=83 ttl=64 time=415 ms
      64 bytes from 10.74.130.192: icmp_seq=84 ttl=64 time=11.9 ms
      ~~~
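The size of the gap can be pulled straight out of saved ping output. A minimal sketch (the file name `ping.txt` is a placeholder, not from the original report):

```shell
# Report every gap in icmp_seq, i.e. the dropped packets, from ping output
# saved to a file.
detect_gaps() {
    awk -F'icmp_seq=' '/icmp_seq=/ {
        split($2, a, " "); seq = a[1] + 0
        if (prev != "" && seq != prev + 1)
            printf "seq %d -> %d: %d packets lost\n", prev, seq, seq - prev - 1
        prev = seq
    }' "$1"
}

# Example with the two sequence numbers bracketing the gap in this report:
printf '%s\n' \
  '64 bytes from 10.74.130.192: icmp_seq=14 ttl=64 time=63.5 ms' \
  '64 bytes from 10.74.130.192: icmp_seq=83 ttl=64 time=415 ms' > ping.txt
detect_gaps ping.txt
```

For the capture above this prints a single 68-packet gap (seq 15-82), matching the tshark output below.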

      The lost packets as seen in the client packet capture:

      ~~~

      # TZ=UTC tshark -nr client.pcap -t ad icmp
        ....
        ....
        49 2023-04-12 04:26:11.881169 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=13/3328, ttl=64 (request in 48)
        50 2023-04-12 04:26:12.380186 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=14/3584, ttl=64
        51 2023-04-12 04:26:12.443677 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=14/3584, ttl=64 (request in 50)
        54 2023-04-12 04:26:12.880854 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=15/3840, ttl=64
        55 2023-04-12 04:26:13.380357 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=16/4096, ttl=64
        58 2023-04-12 04:26:13.881358 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=17/4352, ttl=64
        59 2023-04-12 04:26:14.380506 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=18/4608, ttl=64
        61 2023-04-12 04:26:14.880871 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=19/4864, ttl=64
        62 2023-04-12 04:26:15.380386 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=20/5120, ttl=64
        63 2023-04-12 04:26:15.880623 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=21/5376, ttl=64
        .......
        .......
        .......
        127 2023-04-12 04:26:46.402744 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=82/20992, ttl=64
        129 2023-04-12 04:26:47.316301 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=83/21248, ttl=64
        130 2023-04-12 04:26:47.318223 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=83/21248, ttl=64 (request in 129)
        131 2023-04-12 04:26:47.402238 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=84/21504, ttl=64
        132 2023-04-12 04:26:47.414150 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=84/21504, ttl=64 (request in 131)
        ~~~
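The connectivity gap can be quantified from the two reply timestamps bracketing the drop. A small helper sketch:

```shell
# Compute the gap in seconds between two tshark wall-clock timestamps:
# the last reply before the drop and the first reply after it.
downtime() {
    awk -v a="$1" -v b="$2" '
    function secs(t,  p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
    BEGIN { printf "%.1f\n", secs(b) - secs(a) }'
}

# Timestamps taken from the client capture above (replies to seq 14 and 83):
downtime 04:26:12.443677 04:26:47.318223
```

This gives roughly 35 seconds of lost inbound connectivity during the initial phase alone.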

      These packets actually reached the destination node although the migration was still running:

      ~~~
      The destination node, seq 15 - 82 reached here:

      # TZ=UTC tshark -nr worker1_dst.pcap -t ad icmp
        3 2023-04-12 04:26:12.878671 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=15/3840, ttl=64
        4 2023-04-12 04:26:13.378150 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=16/4096, ttl=64
        7 2023-04-12 04:26:13.879223 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=17/4352, ttl=64
        8 2023-04-12 04:26:14.378311 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=18/4608, ttl=64
        10 2023-04-12 04:26:14.878612 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=19/4864, ttl=64
        11 2023-04-12 04:26:15.378144 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=20/5120, ttl=64
        12 2023-04-12 04:26:15.878365 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=21/5376, ttl=64
        13 2023-04-12 04:26:16.378102 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=22/5632, ttl=64
        ....
        ....
        ....

      Source Node, seq 15 - 82 missing

      48 2023-04-12 04:26:11.884612 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=13/3328, ttl=64
      49 2023-04-12 04:26:11.884731 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=13/3328, ttl=64 (request in 48)
      50 2023-04-12 04:26:12.389793 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=14/3584, ttl=64
      51 2023-04-12 04:26:12.447208 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=14/3584, ttl=64 (request in 50)

      < seq 15 - 82 missing >

      57 2023-04-12 04:26:47.320115 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=83/21248, ttl=64
      58 2023-04-12 04:26:47.320579 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=83/21248, ttl=64 (request in 57)
      59 2023-04-12 04:26:47.413168 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=84/21504, ttl=64
      60 2023-04-12 04:26:47.416380 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=84/21504, ttl=64 (request in 59)
      61 2023-04-12 04:26:47.907042 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=85/21760, ttl=64
      62 2023-04-12 04:26:47.907207 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=85/21760, ttl=64 (request in 61)
      63 2023-04-12 04:26:48.407203 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=86/22016, ttl=64

      ~~~

      The domain was still being migrated during this time, so the VM on the destination was in Paused state:

      ~~~
      oc logs virt-launcher-rhel8-d58yi5fym85626yq-h76wk | grep "kubevirt domain status"

      {"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Paused(3):StartingUp(11)","pos":"client.go:289","timestamp":"2023-04-12T04:26:15.582630Z"}
      {"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Paused(3):Migration(2)","pos":"client.go:289","timestamp":"2023-04-12T04:26:16.244198Z"}
      {"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Paused(3):Migration(2)","pos":"client.go:289","timestamp":"2023-04-12T04:28:30.799153Z"}
      {"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Paused(3):Migration(2)","pos":"client.go:289","timestamp":"2023-04-12T04:28:30.832917Z"}

      <-- migration completed -->

      {"component":"virt-launcher","level":"info","msg":"kubevirt domain status: Running(1):Unknown(2)","pos":"client.go:289","timestamp":"2023-04-12T04:28:30.883757Z"}

      ~~~
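The status timeline above can be extracted with plain shell tools; a sketch using grep/sed (to avoid a jq dependency), which copes with several JSON objects sharing one physical line:

```shell
# Print "<timestamp> <status msg>" for every domain-status entry in a
# virt-launcher log read from stdin.
extract_status() {
    grep -o '"msg":"kubevirt domain status[^"]*","pos":"[^"]*","timestamp":"[^"]*"' \
      | sed 's/"msg":"\([^"]*\)","pos":"[^"]*","timestamp":"\([^"]*\)"/\2 \1/'
}

# Usage: oc logs virt-launcher-rhel8-d58yi5fym85626yq-h76wk | extract_status
```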

      It looks like the client gets confused during the migration and routes traffic to the destination node while the migration is still in progress, because of the IPv6 multicast packets below originating from 02:6a:85:00:00:21 (the MAC address of the VM interface).

      ~~~
      48 2023-04-12 04:26:11.880892 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=13/3328, ttl=64
      49 2023-04-12 04:26:11.881169 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=13/3328, ttl=64 (request in 48)
      50 2023-04-12 04:26:12.380186 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=14/3584, ttl=64
      51 2023-04-12 04:26:12.443677 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=14/3584, ttl=64 (request in 50)
      52 2023-04-12 04:26:12.470232 :: → ff02::16 90 Multicast Listener Report Message v2 <<<
      53 2023-04-12 04:26:12.782278 :: → ff02::1:ff00:21 86 Neighbor Solicitation for fe80::6a:85ff:fe00:21 <<<
      54 2023-04-12 04:26:12.880854 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=15/3840, ttl=64 <<< ping routed to dest node
      55 2023-04-12 04:26:13.380357 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=16/4096, ttl=64
      56 2023-04-12 04:26:13.798396 fe80::6a:85ff:fe00:21 → ff02::16 90 Multicast Listener Report Message v2
      57 2023-04-12 04:26:13.798452 fe80::6a:85ff:fe00:21 → ff02::2 70 Router Solicitation from 02:6a:85:00:00:21
      58 2023-04-12 04:26:13.881358 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=17/4352, ttl=64
      59 2023-04-12 04:26:14.380506 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=18/4608, ttl=64
      60 2023-04-12 04:26:14.390271 :: → ff02::1:ff00:21 86 Neighbor Solicitation for 2620:52:0:4a80:6a:85ff:fe00:21
      61 2023-04-12 04:26:14.880871 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=19/4864, ttl=64
      62 2023-04-12 04:26:15.380386 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=20/5120, ttl=64
      63 2023-04-12 04:26:15.880623 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=21/5376, ttl=64

      Packet 52:

      52 2023-04-12 09:56:12.470232 :: → ff02::16 90 Multicast Listener Report Message v2
      Frame 52: 90 bytes on wire (720 bits), 90 bytes captured (720 bits)
      Ethernet II, Src: 02:6a:85:00:00:21 (02:6a:85:00:00:21), Dst: IPv6mcast_16 (33:33:00:00:00:16) <<<
      Internet Protocol Version 6, Src: ::, Dst: ff02::16
      Internet Control Message Protocol v6

      Packet 53:

      53 2023-04-12 09:56:12.782278 :: → ff02::1:ff00:21 86 Neighbor Solicitation for fe80::6a:85ff:fe00:21
      Frame 53: 86 bytes on wire (688 bits), 86 bytes captured (688 bits)
      Ethernet II, Src: 02:6a:85:00:00:21 (02:6a:85:00:00:21), Dst: IPv6mcast_ff:00:00:21 (33:33:ff:00:00:21)
      Internet Protocol Version 6, Src: ::, Dst: ff02::1:ff00:21
      Internet Control Message Protocol v6
      ~~~

      These IPv6 multicast packets originate from the destination virt-launcher pod, apparently when the kernel performs IPv6 neighbor discovery on the pod interface. The virt-launcher pod holds the VM's MAC on net1 before it creates the bridge and hands the MAC over to the VM.

      ~~~
      net1 carries 02:6a:85:00:00:21 before the bridge is created:

      1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
      link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
      inet 127.0.0.1/8 scope host lo
      valid_lft forever preferred_lft forever
      inet6 ::1/128 scope host
      valid_lft forever preferred_lft forever
      3: eth0@if198: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default
      link/ether 0a:58:0a:83:00:9a brd ff:ff:ff:ff:ff:ff link-netnsid 0
      inet 10.131.0.154/23 brd 10.131.1.255 scope global eth0
      valid_lft forever preferred_lft forever
      inet6 fe80::d023:79ff:fe49:79d/64 scope link
      valid_lft forever preferred_lft forever
      4: net1@if199: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default <<<<
      link/ether 02:6a:85:00:00:21 brd ff:ff:ff:ff:ff:ff link-netnsid 0 <<<<
      inet6 fe80::6a:85ff:fe00:21/64 scope link tentative
      valid_lft forever preferred_lft forever
      ~~~
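Since the stray MLD and Neighbor Solicitation frames come from the kernel's IPv6 autoconfiguration on net1, one conceivable mitigation (a sketch only, not necessarily the fix adopted for this bug) would be to suppress IPv6 on that interface inside the destination pod's network namespace before it carries the VM's MAC:

```shell
# Hypothetical mitigation: stop the kernel from emitting MLD / Neighbor
# Solicitation frames with the VM's MAC while the migration is running.
sysctl -w net.ipv6.conf.net1.disable_ipv6=1
# Or, less drastically, keep IPv6 but silence DAD and router solicitations:
sysctl -w net.ipv6.conf.net1.accept_dad=0
sysctl -w net.ipv6.conf.net1.router_solicitations=0
```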

      The packets keep being routed to the destination node until the client performs ARP resolution again:

      ~~~
      TZ=UTC tshark -nr client.pcap -t ad

      125 2023-04-12 04:26:45.611473 18:66:da:9f:b3:b9 → 02:6a:85:00:00:21 42 Who has 10.74.130.192? Tell 10.74.128.144
      126 2023-04-12 04:26:45.903198 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=81/20736, ttl=64
      127 2023-04-12 04:26:46.402744 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=82/20992, ttl=64
      128 2023-04-12 04:26:47.316291 02:6a:85:00:00:21 → 18:66:da:9f:b3:b9 60 10.74.130.192 is at 02:6a:85:00:00:21
      129 2023-04-12 04:26:47.316301 10.74.128.144 → 10.74.130.192 98 Echo (ping) request id=0xa732, seq=83/21248, ttl=64
      130 2023-04-12 04:26:47.318223 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=83/21248, ttl=64 (request in 129)
      ~~~

      Once the migration completes, we can see the RARP from the destination node as expected, and the client then routes the packets to the destination node:

      ~~~
      TZ=UTC tshark -nr client.pcap -t ad

      77 2023-04-12 04:28:30.835510 02:6a:85:00:00:21 → ff:ff:ff:ff:ff:ff 60 Who is 02:6a:85:00:00:21? Tell 02:6a:85:00:00:21
      78 2023-04-12 04:28:30.835539 02:6a:85:00:00:21 → ff:ff:ff:ff:ff:ff 60 Who is 02:6a:85:00:00:21? Tell 02:6a:85:00:00:21
      79 2023-04-12 04:28:30.835553 02:6a:85:00:00:21 → ff:ff:ff:ff:ff:ff 60 Who is 02:6a:85:00:00:21? Tell 02:6a:85:00:00:21
      80 2023-04-12 04:28:30.848542 02:6a:85:00:00:21 → 18:66:da:9f:b3:b9 42 Who has 10.74.128.144? Tell 10.74.130.192
      81 2023-04-12 04:28:30.848743 18:66:da:9f:b3:b9 → 02:6a:85:00:00:21 60 10.74.128.144 is at 18:66:da:9f:b3:b9
      83 2023-04-12 04:28:30.851173 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=21/5376, ttl=64 (request in 12)
      84 2023-04-12 04:28:30.851352 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=22/5632, ttl=64 (request in 13)
      85 2023-04-12 04:28:30.851454 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=23/5888, ttl=64 (request in 14)
      86 2023-04-12 04:28:30.851541 10.74.130.192 → 10.74.128.144 98 Echo (ping) reply id=0xa732, seq=24/6144, ttl=64 (request in 15)
      ~~~

      Version-Release number of selected component (if applicable):

      OpenShift Virtualization 4.12.0

      How reproducible:

      100%

      Steps to Reproduce:

      1. Start a ping to the VM during the VM live migration. Use the ping -i option to shorten the interval between packets.
      2. Observe packet drops for a few seconds when the destination virt-launcher starts.

      Actual results:

      Packet drops occur during the initial phase of VM live migration. These drops are in addition to the drops during the final stage of migration, when the source QEMU is paused to move the last remaining memory pages to the destination QEMU. As a result, the user experiences more network downtime than on other platforms such as RHV.

      Expected results:

      No additional packet drops during the initial phase of live migration; inbound traffic should continue to reach the VM on the source node until the migration completes.
      Additional info:

        1. vma.yaml (2 kB, attached by Yossi Segev)
        2. vmb.yaml (2 kB, attached by Yossi Segev)

              Yossi Segev (ysegev@redhat.com)
              Nijin Ashok (rhn-support-nashok)
              Nir Rozen