- Bug
- Resolution: Done-Errata
- Major
- None
Description of problem:
I've seen this weird behaviour a few times in my cluster, but I wasn't sure whether it was caused by a misconfiguration on my side. Yesterday I hit it with a customer in a remote session, and it became very clear that localnet configurations are not being propagated correctly: old values get stuck and do not change on the datapath when the current NAD/NNCP declarations are changed. It only starts working as per the NAD/NNCP specification after a reboot of the node, and sometimes only after a reboot of the entire cluster, including the control plane. The VLAN example below is just a simple way to reproduce it; my impression is that this is not limited to a subset of configurations but is a bigger problem. In my cluster, if I configure a localnet incorrectly the first time and then correct the configuration later, it never works: a reboot is needed before the VMs can come up at all (they fail with a timeout on CNI Add) and/or be in the same network.
Version-Release number of selected component (if applicable):
4.15.2
How reproducible:
Always
Steps to Reproduce:
1. Set up a localnet as follows, nothing special:
$ oc get nncp mapping -o json | jq ' .spec'
{
  "desiredState": {
    "ovn": {
      "bridge-mappings": [
        {
          "bridge": "br-ex",
          "localnet": "localnet-bridge",
          "state": "present"
        }
      ]
    }
  }
}
The NAD is configured to tag the frames with VID 2
$ oc get network-attachment-definition virt -o json | jq ' .spec'
{
  "config": "{\n \"cniVersion\": \"0.3.1\", \n \"name\": \"localnet-bridge\", \n \"type\": \"ovn-k8s-cni-overlay\",\n \"topology\": \"localnet\",\n \"vlanID\": 2, \n \"netAttachDefName\": \"homelab/virt\"\n}\n"
}
2. Bring the VM up using this NAD and start a ping to 8.8.8.8 (or any other address)
3. Run tcpdump on the physical interface of the physical host
$ tcpdump -i enp4s0 -nnn -e icmp
03:15:19.646961 02:52:59:00:00:11 > 26:5a:4c:5d:3d:7c, ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, ethertype IPv4 (0x0800), 192.168.2.5 > 8.8.8.8: ICMP echo request, id 1, seq 243, length 64
03:15:19.667768 26:5a:4c:5d:3d:7c > 02:52:59:00:00:11, ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, ethertype IPv4 (0x0800), 8.8.8.8 > 192.168.2.5: ICMP echo reply, id 1, seq 243, length 64
NOTE: frames are tagged with VLAN ID 2, as per NAD
4. Shut down the VM
5. Change the NAD to VID 222 or any other number
$ oc get network-attachment-definition virt -o json | jq ' .spec' -S
{
  "config": "{\n \"cniVersion\": \"0.3.1\", \n \"name\": \"localnet-bridge\", \n \"type\": \"ovn-k8s-cni-overlay\",\n \"topology\": \"localnet\",\n \"vlanID\": 222, \n \"netAttachDefName\": \"homelab/virt\"\n}\n"
}
6. Start the VM
7. From inside the VM, start the ping again
8. Look at the frames going out:
It's still on the original VLAN:
03:22:18.329029 26:5a:4c:5d:3d:7c > 02:52:59:00:00:11, ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, ethertype IPv4 (0x0800), 8.8.8.8 > 192.168.2.5: ICMP echo reply, id 1, seq 1, length 64
03:22:19.331071 26:5a:4c:5d:3d:7c > 02:52:59:00:00:11, ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, ethertype IPv4 (0x0800), 8.8.8.8 > 192.168.2.5: ICMP echo reply, id 1, seq 2, length 64
9. The configuration only really takes effect once I reboot the node; sometimes I have to reboot the entire cluster (including the control plane).
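The comparison done by eye in steps 5 to 8 (declared vlanID in the NAD versus VLAN tag seen on the wire) could be automated along these lines. This is only a rough sketch, not part of the original reproduction: the function names are hypothetical, the NAD input is assumed to be the JSON printed by `jq ' .spec'`, and the capture lines are assumed to be in the `tcpdump -e` text format shown above.

```python
import json
import re
from typing import Iterable, List, Optional, Set

# Matches the 802.1Q tag as printed by `tcpdump -e`, e.g. "vlan 2, p 0, ..."
VLAN_RE = re.compile(r"\bvlan (\d+),")


def declared_vlan(nad_spec_json: str) -> Optional[int]:
    """Return the vlanID declared in a NAD spec (output of `jq '.spec'`)."""
    spec = json.loads(nad_spec_json)
    # The CNI config is itself a JSON document stored as a string.
    config = json.loads(spec["config"])
    return config.get("vlanID")


def observed_vlans(tcpdump_lines: Iterable[str]) -> Set[int]:
    """Collect the VLAN IDs seen in `tcpdump -e` output lines."""
    seen = set()
    for line in tcpdump_lines:
        match = VLAN_RE.search(line)
        if match:
            seen.add(int(match.group(1)))
    return seen


def stale_vlans(nad_spec_json: str, tcpdump_lines: Iterable[str]) -> List[int]:
    """Return observed VLAN IDs that differ from the one the NAD declares."""
    want = declared_vlan(nad_spec_json)
    return sorted(v for v in observed_vlans(tcpdump_lines) if v != want)


# Sample data taken from the report: the NAD now declares VID 222,
# but the captured frame is still tagged with VID 2.
SAMPLE_SPEC = json.dumps({
    "config": json.dumps({
        "cniVersion": "0.3.1",
        "name": "localnet-bridge",
        "type": "ovn-k8s-cni-overlay",
        "topology": "localnet",
        "vlanID": 222,
        "netAttachDefName": "homelab/virt",
    })
})
SAMPLE_LINE = (
    "03:22:18.329029 26:5a:4c:5d:3d:7c > 02:52:59:00:00:11, "
    "ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, "
    "ethertype IPv4 (0x0800), 8.8.8.8 > 192.168.2.5: "
    "ICMP echo reply, id 1, seq 1, length 64"
)

if __name__ == "__main__":
    print(stale_vlans(SAMPLE_SPEC, [SAMPLE_LINE]))
```

A non-empty result means frames are still leaving with a tag other than the one currently declared in the NAD, which is the symptom described in this report.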
Actual results:
The declared configuration is not applied; the entire cluster needs a reboot for some things to work
Expected results:
All declared configuration is applied
Additional info:
- links to
- RHEA-2024:139653 OpenShift Virtualization 4.18.0 Images