Bug
Resolution: Done
Normal
Quality / Stability / Reliability
CLOSED
Important
Created attachment 1795082 [details]
migration_vmb_new.yaml
Description of problem:
A migrated VM takes a long time (between 10 and 60 seconds) to regain connectivity.
When pinging over a secondary (Multus) interface from the migrated VM to another VM in the same cluster, packets are lost (with 'Destination Host Unreachable') during this period.
Version-Release number of selected component (if applicable):
CNV v.4.8.0
OCP v.4.8.0-fc.5
Kubernetes Version: v1.21.0-rc.0+88a3e8c
How reproducible:
Not always; I could not find a pattern that determines when it occurs.
Steps to Reproduce:
1. Create a dedicated namespace for the resources that will be created in the next steps. Name it "anat-test-migration" to match the namespace used in the attached files.
2. Create the bridge (use the attached 'migration_nncp_1.yaml' and 'migration_nncp_2.yaml' files; make sure to change the node selector to match your cluster nodes).
3. Create the NetworkAttachmentDefinition (use the attached 'migration_nad_new.yaml' file).
4. Create vma and vmb (use the attached 'migration_vma_new.yaml' and 'migration_vmb_new.yaml' files).
5. Start both VMs:
$ virtctl start vma
$ virtctl start vmb
6. Expose services to allow SSH connections to both VMs (use the attached 'migration_ssh_service_for_vma.yaml' and 'migration_ssh_service_for_vmb.yaml' files).
7. Migrate vmb (use the attached 'migration_virtualmachineinstancemigration.yaml' file).
8. Connect to vmb as soon as the migration finishes. To find the exact moment, watch for the VMI being assigned a new IP address using the command:
$ oc get vmi -w
9. Ping from vmb to vma over the secondary (bridge) interface:
- SSH into vmb (the IP is that of the node on which vmb is running; '-p' is the port of vmb's service, which can be found with 'oc get service'):
$ ssh fedora@192.168.2.83 -p 30401
- ping vma:
$ ping 10.200.0.1
Note: to reproduce, steps 8 and 9 should be performed as close to the end of the migration as possible.
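The steps above can be sketched as a shell script. This is only a sketch: the YAML filenames are the attachments from this report, the namespace is the one named in step 1, and with DRY_RUN=1 (the default) the commands are printed rather than run against a cluster:

```shell
#!/bin/sh
# Reproduction sketch. With DRY_RUN=1 (the default) commands are only
# printed; set DRY_RUN=0 to actually run them against a cluster.
NS=anat-test-migration

run() {
  echo "+ $*"
  if [ "${DRY_RUN:-1}" = "0" ]; then "$@"; fi
}

repro() {
  run oc create namespace "$NS"                                # step 1
  run oc apply -f migration_nncp_1.yaml                        # step 2: bridge
  run oc apply -f migration_nncp_2.yaml
  run oc apply -n "$NS" -f migration_nad_new.yaml              # step 3: NAD
  run oc apply -n "$NS" -f migration_vma_new.yaml              # step 4: VMs
  run oc apply -n "$NS" -f migration_vmb_new.yaml
  run virtctl start vma -n "$NS"                               # step 5
  run virtctl start vmb -n "$NS"
  run oc apply -n "$NS" -f migration_ssh_service_for_vma.yaml  # step 6
  run oc apply -n "$NS" -f migration_ssh_service_for_vmb.yaml
  run oc apply -n "$NS" -f migration_virtualmachineinstancemigration.yaml  # step 7
}

repro
```

Steps 8 and 9 are interactive (SSH into the guest) and are left out of the script.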
Actual results:
When the bug occurs:
[fedora@vmb ~]$ ping 10.200.0.1
PING 10.200.0.1 (10.200.0.1) 56(84) bytes of data.
From 10.200.0.22 icmp_seq=10 Destination Host Unreachable
From 10.200.0.22 icmp_seq=11 Destination Host Unreachable
From 10.200.0.22 icmp_seq=12 Destination Host Unreachable
From 10.200.0.22 icmp_seq=13 Destination Host Unreachable
From 10.200.0.22 icmp_seq=14 Destination Host Unreachable
From 10.200.0.22 icmp_seq=15 Destination Host Unreachable
From 10.200.0.22 icmp_seq=16 Destination Host Unreachable
From 10.200.0.22 icmp_seq=17 Destination Host Unreachable
From 10.200.0.22 icmp_seq=18 Destination Host Unreachable
From 10.200.0.22 icmp_seq=19 Destination Host Unreachable
From 10.200.0.22 icmp_seq=20 Destination Host Unreachable
From 10.200.0.22 icmp_seq=21 Destination Host Unreachable
From 10.200.0.22 icmp_seq=22 Destination Host Unreachable
From 10.200.0.22 icmp_seq=23 Destination Host Unreachable
From 10.200.0.22 icmp_seq=24 Destination Host Unreachable
From 10.200.0.22 icmp_seq=25 Destination Host Unreachable
From 10.200.0.22 icmp_seq=26 Destination Host Unreachable
From 10.200.0.22 icmp_seq=27 Destination Host Unreachable
From 10.200.0.22 icmp_seq=28 Destination Host Unreachable
From 10.200.0.22 icmp_seq=29 Destination Host Unreachable
From 10.200.0.22 icmp_seq=30 Destination Host Unreachable
From 10.200.0.22 icmp_seq=31 Destination Host Unreachable
From 10.200.0.22 icmp_seq=32 Destination Host Unreachable
From 10.200.0.22 icmp_seq=33 Destination Host Unreachable
64 bytes from 10.200.0.1: icmp_seq=35 ttl=64 time=3.93 ms
64 bytes from 10.200.0.1: icmp_seq=34 ttl=64 time=1028 ms
64 bytes from 10.200.0.1: icmp_seq=36 ttl=64 time=1.36 ms
64 bytes from 10.200.0.1: icmp_seq=37 ttl=64 time=0.962 ms
64 bytes from 10.200.0.1: icmp_seq=38 ttl=64 time=1.30 ms
^C
--- 10.200.0.1 ping statistics ---
38 packets transmitted, 5 received, +24 errors, 86.8421% packet loss, time 37808ms
rtt min/avg/max/mdev = 0.962/207.169/1028.296/410.564 ms, pipe 4
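Since ping sends one request per second by default, the number of 'Destination Host Unreachable' lines approximates the outage in seconds. A small awk sketch that summarizes a saved ping log, run here against a three-line sample modeled on the output above (sample data, not the actual capture):

```shell
# Build a tiny sample log (fabricated for illustration, same format as above).
cat > /tmp/ping_sample.log <<'EOF'
From 10.200.0.22 icmp_seq=10 Destination Host Unreachable
From 10.200.0.22 icmp_seq=11 Destination Host Unreachable
64 bytes from 10.200.0.1: icmp_seq=12 ttl=64 time=3.93 ms
EOF

# Count lost vs. answered probes; at ping's default 1 packet/s the lost
# count approximates the outage duration in seconds.
awk '/Destination Host Unreachable/ {lost++}
     /bytes from/                   {ok++}
     END {printf "lost=%d ok=%d outage~%ds\n", lost, ok, lost}' /tmp/ping_sample.log
```

For the sample above this prints `lost=2 ok=1 outage~2s`.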
Expected results:
No packet loss.
Additional info:
A tcpdump capture of the secondary interface of the migrated VM (vmb) is attached; steps used to produce it:
1. ssh to vmb:
$ ssh fedora@192.168.2.83 -p 30401
2. run tcpdump:
$ sudo tcpdump -i eth1 -xx >~/tcpdump_log.log
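'Destination Host Unreachable' reported from the VM's own address usually means ARP resolution for the peer failed, so the ARP traffic is the interesting part of such a capture. One way to check the saved capture text for unanswered ARP requests, shown here on a fabricated sample in tcpdump's default text format (not the real attached log):

```shell
# Fabricated sample lines in tcpdump's default text format (not the real log).
cat > /tmp/tcpdump_sample.log <<'EOF'
12:00:01.000000 ARP, Request who-has 10.200.0.1 tell 10.200.0.22, length 28
12:00:02.000000 ARP, Request who-has 10.200.0.1 tell 10.200.0.22, length 28
12:00:02.500000 ARP, Reply 10.200.0.1 is-at aa:bb:cc:dd:ee:ff, length 28
EOF

# A stretch of requests with no matching replies can indicate the window
# in which the peer was unreachable after the migration.
grep -c 'ARP, Request' /tmp/tcpdump_sample.log   # requests
grep -c 'ARP, Reply'   /tmp/tcpdump_sample.log   # replies
```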