OpenShift Virtualization / CNV-42158

Localnet configuration seems to get stuck on previous values rather than what is currently declared



      Description of problem:

      I've seen this odd behaviour a few times in my cluster, but I wasn't sure whether it was the system or me applying some wrong configuration.
      
      Yesterday I hit it with a customer in a remote session, and it became very clear that localnet configurations are not being propagated correctly: old values get stuck on the datapath and do not change when the current NAD/NNCP declarations are updated.
      
      It only starts working as per the NAD/NNCP specification after a reboot of the node, and sometimes only after a reboot of the entire cluster, including the control plane.
      
      The VLAN example below is just a simple way to reproduce it; my impression is that this is not limited to a subset of configurations but is a broader problem. In my cluster, if I configure a localnet incorrectly the first time and then correct the configuration later, it never takes effect: a reboot is needed before the VMs can come up at all (they fail with a timeout on CNI ADD) and/or reach the same network.

      Version-Release number of selected component (if applicable):

      4.15.2

      How reproducible:

      Always

      Steps to Reproduce:

      1. Set up a localnet as follows; nothing special

       

      $ oc get nncp mapping -o json | jq ' .spec' 
      {
        "desiredState": {
          "ovn": {
            "bridge-mappings": [
              {
                "bridge": "br-ex",
                "localnet": "localnet-bridge",
                "state": "present"
              }
            ]
          }
        }
      }
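
For reference, a complete NNCP manifest corresponding to the spec shown above might look like the following. This is a sketch reconstructed from the `oc get nncp mapping` output, not the exact manifest used on the cluster:

```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: mapping
spec:
  desiredState:
    ovn:
      bridge-mappings:
        # Map the "localnet-bridge" physical network name onto br-ex
        - bridge: br-ex
          localnet: localnet-bridge
          state: present
```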
       
      

      The NAD is configured to tag the frames with VID 2

       

      $ oc get network-attachment-definition virt -o json | jq ' .spec' 
      {
        "config": "{\n        \"cniVersion\": \"0.3.1\", \n        \"name\": \"localnet-bridge\", \n        \"type\": \"ovn-k8s-cni-overlay\",\n        \"topology\": \"localnet\",\n        \"vlanID\": 2, \n        \"netAttachDefName\": \"homelab/virt\"\n}\n"
      }
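
Note that the NAD's `spec.config` field is itself a JSON document encoded as a string. A minimal sketch of pulling `vlanID` out of it, with the config string copied verbatim from the output above:

```python
import json

# spec.config as returned by `oc get network-attachment-definition virt -o json`
# (copied from the output above): a JSON string containing the CNI config.
spec = {
    "config": "{\n        \"cniVersion\": \"0.3.1\", \n        \"name\": \"localnet-bridge\", \n        \"type\": \"ovn-k8s-cni-overlay\",\n        \"topology\": \"localnet\",\n        \"vlanID\": 2, \n        \"netAttachDefName\": \"homelab/virt\"\n}\n"
}

# Decode the nested JSON string to reach the actual CNI fields.
cni_conf = json.loads(spec["config"])
print(cni_conf["name"], cni_conf["vlanID"])  # localnet-bridge 2
```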

       

      2. Bring the VM up using this NAD and start a ping to 8.8.8.8 (or any other external address)

      3. Run tcpdump on the physical interface of the host

       

      $ tcpdump -i enp4s0 -nnn -e icmp
      03:15:19.646961 02:52:59:00:00:11 > 26:5a:4c:5d:3d:7c, ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, ethertype IPv4 (0x0800), 192.168.2.5 > 8.8.8.8: ICMP echo request, id 1, seq 243, length 64
      03:15:19.667768 26:5a:4c:5d:3d:7c > 02:52:59:00:00:11, ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, ethertype IPv4 (0x0800), 8.8.8.8 > 192.168.2.5: ICMP echo reply, id 1, seq 243, length 64
      

       

      NOTE: the frames are tagged with VLAN ID 2, as declared in the NAD

      4. Shutdown the VM

      5. Change the NAD to VLAN ID 222 (or any other value)

       

      $ oc get network-attachment-definition virt -o json | jq ' .spec' -S
      {
        "config": "{\n        \"cniVersion\": \"0.3.1\", \n        \"name\": \"localnet-bridge\", \n        \"type\": \"ovn-k8s-cni-overlay\",\n        \"topology\": \"localnet\",\n        \"vlanID\": 222, \n        \"netAttachDefName\": \"homelab/virt\"\n}\n"
      }
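
Comparing the two revisions of the NAD config mechanically shows that only `vlanID` differs; everything else, including the localnet network name, stays the same. A small sketch, with both config strings copied verbatim from the outputs above:

```python
import json

# spec.config before (vlanID 2) and after (vlanID 222) the edit, both taken
# from the `oc get network-attachment-definition` outputs shown above.
before = json.loads("{\n        \"cniVersion\": \"0.3.1\", \n        \"name\": \"localnet-bridge\", \n        \"type\": \"ovn-k8s-cni-overlay\",\n        \"topology\": \"localnet\",\n        \"vlanID\": 2, \n        \"netAttachDefName\": \"homelab/virt\"\n}\n")
after = json.loads("{\n        \"cniVersion\": \"0.3.1\", \n        \"name\": \"localnet-bridge\", \n        \"type\": \"ovn-k8s-cni-overlay\",\n        \"topology\": \"localnet\",\n        \"vlanID\": 222, \n        \"netAttachDefName\": \"homelab/virt\"\n}\n")

# Which CNI keys actually differ between the two revisions?
changed = {k for k in before if before[k] != after[k]}
print(changed)  # {'vlanID'}
```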

       

      6. Start the VM

      7. From inside the VM, start the ping again

      8. Look at the frames going out:

      The traffic is still tagged with the original VLAN (2), not 222:

       

      03:22:18.329029 26:5a:4c:5d:3d:7c > 02:52:59:00:00:11, ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, ethertype IPv4 (0x0800), 8.8.8.8 > 192.168.2.5: ICMP echo reply, id 1, seq 1, length 64
      03:22:19.331071 26:5a:4c:5d:3d:7c > 02:52:59:00:00:11, ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, ethertype IPv4 (0x0800), 8.8.8.8 > 192.168.2.5: ICMP echo reply, id 1, seq 2, length 64
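
The mismatch in step 8 can be made explicit by parsing the VLAN tag out of such a tcpdump line and comparing it with the declared value. A quick sketch; the line is copied from the capture above, and 222 is the vlanID set in step 5:

```python
import re

# A tcpdump line copied from the capture above; the 802.1Q decode prints the
# tag as "vlan <id>".
line = ("03:22:18.329029 26:5a:4c:5d:3d:7c > 02:52:59:00:00:11, "
        "ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, "
        "ethertype IPv4 (0x0800), 8.8.8.8 > 192.168.2.5: ICMP echo reply")

observed = int(re.search(r"vlan (\d+)", line).group(1))
declared = 222  # vlanID currently set in the NAD (step 5)
print(observed, declared)  # 2 222 -> the datapath still uses the old tag
```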

       

      9. The configuration only takes effect once I reboot the node; sometimes I have to reboot the entire cluster (including the control plane).

      Actual results:

      The declared configuration is not applied; a reboot of the node (or sometimes the entire cluster) is needed before some things work.

      Expected results:

      All declared configuration is applied, without requiring reboots.

      Additional info:

       

      Assignee: Yossi Segev (ysegev@redhat.com)
      Reporter: Germano Veit Michel (rhn-support-gveitmic)
      QA Contact: Yossi Segev