OpenShift Core Networking / CORENET-6849

Traffic is broken after live migrating a VM

    • Type: Bug
    • Resolution: Not a Bug
    • Priority: Critical
    • Sprint: CORENET Sprint 284

      After live migrating a VM to a different node, traffic to/from it becomes permanently broken.

      Reproduce the issue with the following manifests:

      ---
      apiVersion: k8s.ovn.org/v1
      kind: VTEP
      metadata:
        name: evpn-vtep
      spec:
        mode: Unmanaged
        cidrs:
          - 192.168.122.0/24 # must adjust this to match the node subnet
      ---
      apiVersion: v1
      kind: Namespace
      metadata:
        name: evpn-demo
        labels:
          network: evpn-demo
          k8s.ovn.org/primary-user-defined-network: evpn-l2
      ---
      apiVersion: k8s.ovn.org/v1
      kind: ClusterUserDefinedNetwork
      metadata:
        name: evpn-l2
        labels:
          evpn: "true"
      spec:
        namespaceSelector:
          matchLabels:
            network: evpn-demo
        network:
          topology: Layer2
          transport: EVPN
          layer2:
            role: Primary
            subnets:
              - 10.200.0.0/16
            ipam:
              lifecycle: Persistent # VMs need this
          evpn:
            vtep: evpn-vtep
            macVRF:
              vni: 20100
              routeTarget: "65000:20100"
            ipVRF:
              vni: 20101
              routeTarget: "65000:20101"
      ---
      apiVersion: k8s.ovn.org/v1
      kind: RouteAdvertisements
      metadata:
        name: evpn-routes
      spec:
        nodeSelector: {}
        frrConfigurationSelector: {}
        networkSelectors:
          - networkSelectionType: ClusterUserDefinedNetworks
            clusterUserDefinedNetworkSelector:
              networkSelector:
                matchLabels:
                  evpn: "true"
        targetVRF: auto
        advertisements:
          - PodNetwork
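
      Assuming the manifests above are saved to a single file (the name evpn-setup.yaml is illustrative), they can be applied in one step:

      ```shell
      # Apply the VTEP, namespace, ClusterUserDefinedNetwork, and
      # RouteAdvertisements manifests. The file name is an assumption;
      # substitute whatever path you saved them to.
      oc apply -f evpn-setup.yaml
      ```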
      

      Now provision the workloads. First, the pods:

      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: nettools
        namespace: evpn-demo
      spec:
        replicas: 2
        selector:
          matchLabels:
            app: nettools
        template:
          metadata:
            labels:
              app: nettools
          spec:
            affinity:
              podAntiAffinity:
                requiredDuringSchedulingIgnoredDuringExecution:
                  - labelSelector:
                      matchLabels:
                        app: nettools
                    topologyKey: kubernetes.io/hostname
            containers:
              - name: nettools
                image: docker.io/nicolaka/netshoot:v0.13
                command: ["sleep", "infinity"]
      

      And the VM:

      ---
      apiVersion: kubevirt.io/v1
      kind: VirtualMachine
      metadata:
        labels:
          kubevirt.io/vm: vm1
        name: vm1
        namespace: evpn-demo 
      spec:
        runStrategy: Always
        template:
          metadata:
            name: vm1
            namespace: evpn-demo
            labels:
              app.kubernetes.io/name: evpn-demo
          spec:
            domain:
              devices:
                disks:
                - disk:
                    bus: virtio
                  name: containerdisk
                - disk:
                    bus: virtio
                  name: cloudinitdisk
                interfaces:
                - name: evpn
                  binding:
                    name: l2bridge
                rng: {}
              resources:
                requests:
                  memory: 2048M
            networks:
            - pod: {}
              name: evpn
            terminationGracePeriodSeconds: 0
            volumes:
            - containerDisk:
                image: quay.io/kubevirt/fedora-with-test-tooling-container-disk:v1.7.0
              name: containerdisk
            - cloudInitNoCloud:
                userData: |-
                  #cloud-config
                  password: fedora
                  chpasswd: { expire: False }
              name: cloudinitdisk 

      Check the IP address of the VM, then start a ping from one of the pods to the VM.
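
      For example (a sketch using the resource names from the manifests above; the VM address shown is illustrative and will differ in your cluster):

      ```shell
      # Read the VM's IP address from the VirtualMachineInstance status.
      oc get vmi vm1 -n evpn-demo -o jsonpath='{.status.interfaces[0].ipAddress}'

      # Start a continuous ping from one of the nettools pods to the VM.
      # Replace 10.200.0.13 with the address reported above.
      oc exec -n evpn-demo deploy/nettools -- ping 10.200.0.13
      ```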

       

      Then issue the migrate command:

      virtctl migrate -n evpn-demo vm1

      Traffic will stop, and the following errors will be seen in the pod:

      From 10.200.0.6 icmp_seq=988 Destination Host Unreachable
      From 10.200.0.6 icmp_seq=989 Destination Host Unreachable
      From 10.200.0.6 icmp_seq=990 Destination Host Unreachable
      From 10.200.0.6 icmp_seq=991 Destination Host Unreachable
      From 10.200.0.6 icmp_seq=992 Destination Host Unreachable
      

      The FDB tables on the worker nodes the VM migrated from and to, filtered by the VM's MAC address, are:

      Before migration:

      # SRC node (where the VM is running)
      sh-5.1# bridge fdb | grep 0a:58:0a:c8:00:0d
      0a:58:0a:c8:00:0d dev evpn-evpn-l2 vlan 4 master evbr-evpn-vtep 
      
      # DST node
      sh-5.1# bridge fdb | grep 0a:58:0a:c8:00:0d
      0a:58:0a:c8:00:0d dev evx4-evpn-vtep vlan 4 extern_learn master evbr-evpn-vtep 
      0a:58:0a:c8:00:0d dev evx4-evpn-vtep dst 192.168.122.37 src_vni 20100 self extern_learn 

      After migration:

      # SRC node
      sh-5.1# bridge fdb | grep 0a:58:0a:c8:00:0d
      0a:58:0a:c8:00:0d dev evpn-evpn-l2 vlan 4 master evbr-evpn-vtep
        
      # DST node (where the VM is running)
      sh-5.1# bridge fdb | grep 0a:58:0a:c8:00:0d
      0a:58:0a:c8:00:0d dev evx4-evpn-vtep vlan 4 extern_learn master evbr-evpn-vtep
      0a:58:0a:c8:00:0d dev evx4-evpn-vtep dst 192.168.122.37 src_vni 20100 self extern_learn

      After a while, the FDB entries time out and are garbage collected.
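
      The aging can be observed with the statistics flag of bridge, which prints per-entry timer information (run on the affected worker node; the MAC address is the one from the outputs above):

      ```shell
      # -s prints timer/statistics info for each FDB entry, making it
      # possible to watch the stale entry age out after the migration.
      bridge -s fdb show | grep 0a:58:0a:c8:00:0d
      ```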

       

              Assignee: Unassigned
              Reporter: Miguel Duarte de Mora Barroso (mduarted@redhat.com)
              Votes: 0
              Watchers: 3