OpenShift Virtualization / CNV-42158

Localnet configuration seems to get stuck on previous values rather than what is currently declared



      Description of problem:

      I've seen this odd behaviour a few times in my cluster, but I wasn't sure whether it was the system or me applying some wrong configuration.
      
      Yesterday I hit it with a customer in a remote session, and it became very clear that localnet configurations are not being propagated correctly: old values get stuck on the datapath and do not change when the current NAD/NNCP declarations are updated.
      
      It only starts working as per the NAD/NNCP specification after a reboot of the node, and sometimes only after a reboot of the entire cluster, including the control plane.
      
      The VLAN example below is just a simple way to reproduce it; my impression is that this is not limited to a subset of configurations but is a broader problem. In my cluster, if I configure a localnet incorrectly the first time and then correct the configuration later, it never takes effect: a reboot is needed before the VMs can come up at all (they fail with a timeout on CNI ADD) and/or reach the same network.

      Version-Release number of selected component (if applicable):

      4.15.2

      How reproducible:

      Always

      Steps to Reproduce:

      1. Set up a localnet as follows; nothing special

       

      $ oc get nncp mapping -o json | jq ' .spec' 
      {
        "desiredState": {
          "ovn": {
            "bridge-mappings": [
              {
                "bridge": "br-ex",
                "localnet": "localnet-bridge",
                "state": "present"
              }
            ]
          }
        }
      }
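
For reference, a complete NNCP manifest corresponding to the spec shown above might look like the following. This is a sketch reconstructed from the `oc get nncp mapping` output, not the exact manifest used on the cluster:

```yaml
apiVersion: nmstate.io/v1
kind: NodeNetworkConfigurationPolicy
metadata:
  name: mapping
spec:
  desiredState:
    ovn:
      bridge-mappings:
        # Map the "localnet-bridge" physical network name onto br-ex
        - bridge: br-ex
          localnet: localnet-bridge
          state: present
```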
       
      

      The NAD is configured to tag the frames with VID 2

       

      $ oc get network-attachment-definition virt -o json | jq ' .spec' 
      {
        "config": "{\n        \"cniVersion\": \"0.3.1\", \n        \"name\": \"localnet-bridge\", \n        \"type\": \"ovn-k8s-cni-overlay\",\n        \"topology\": \"localnet\",\n        \"vlanID\": 2, \n        \"netAttachDefName\": \"homelab/virt\"\n}\n"
      }
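
Note that the NAD's `spec.config` field is itself a JSON document encoded as a string. A minimal sketch of pulling `vlanID` out of it, with the config string copied verbatim from the output above:

```python
import json

# spec.config as returned by `oc get network-attachment-definition virt -o json`
# (copied from the output above): a JSON string containing the CNI config.
spec = {
    "config": "{\n        \"cniVersion\": \"0.3.1\", \n        \"name\": \"localnet-bridge\", \n        \"type\": \"ovn-k8s-cni-overlay\",\n        \"topology\": \"localnet\",\n        \"vlanID\": 2, \n        \"netAttachDefName\": \"homelab/virt\"\n}\n"
}

# Decode the nested JSON string to reach the actual CNI fields.
cni_conf = json.loads(spec["config"])
print(cni_conf["name"], cni_conf["vlanID"])  # localnet-bridge 2
```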

       

      2. Bring the VM up using this NAD and start a ping to 8.8.8.8 (or any other external address)

      3. Run tcpdump on the physical interface of the host

       

      $ tcpdump -i enp4s0 -nnn -e icmp
      03:15:19.646961 02:52:59:00:00:11 > 26:5a:4c:5d:3d:7c, ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, ethertype IPv4 (0x0800), 192.168.2.5 > 8.8.8.8: ICMP echo request, id 1, seq 243, length 64
      03:15:19.667768 26:5a:4c:5d:3d:7c > 02:52:59:00:00:11, ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, ethertype IPv4 (0x0800), 8.8.8.8 > 192.168.2.5: ICMP echo reply, id 1, seq 243, length 64
      

       

      NOTE: the frames are tagged with VLAN ID 2, as declared in the NAD

      4. Shutdown the VM

      5. Change the NAD to VLAN ID 222 (or any other value)

       

      $ oc get network-attachment-definition virt -o json | jq ' .spec' -S
      {
        "config": "{\n        \"cniVersion\": \"0.3.1\", \n        \"name\": \"localnet-bridge\", \n        \"type\": \"ovn-k8s-cni-overlay\",\n        \"topology\": \"localnet\",\n        \"vlanID\": 222, \n        \"netAttachDefName\": \"homelab/virt\"\n}\n"
      }
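
Comparing the two revisions of the NAD config mechanically shows that only `vlanID` differs; everything else, including the localnet network name, stays the same. A small sketch, with both config strings copied verbatim from the outputs above:

```python
import json

# spec.config before (vlanID 2) and after (vlanID 222) the edit, both taken
# from the `oc get network-attachment-definition` outputs shown above.
before = json.loads("{\n        \"cniVersion\": \"0.3.1\", \n        \"name\": \"localnet-bridge\", \n        \"type\": \"ovn-k8s-cni-overlay\",\n        \"topology\": \"localnet\",\n        \"vlanID\": 2, \n        \"netAttachDefName\": \"homelab/virt\"\n}\n")
after = json.loads("{\n        \"cniVersion\": \"0.3.1\", \n        \"name\": \"localnet-bridge\", \n        \"type\": \"ovn-k8s-cni-overlay\",\n        \"topology\": \"localnet\",\n        \"vlanID\": 222, \n        \"netAttachDefName\": \"homelab/virt\"\n}\n")

# Which CNI keys actually differ between the two revisions?
changed = {k for k in before if before[k] != after[k]}
print(changed)  # {'vlanID'}
```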

       

      6. Start the VM

      7. From inside the VM, start the ping again

      8. Look at the frames going out:

      The traffic is still tagged with the original VLAN (2), not 222:

       

      03:22:18.329029 26:5a:4c:5d:3d:7c > 02:52:59:00:00:11, ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, ethertype IPv4 (0x0800), 8.8.8.8 > 192.168.2.5: ICMP echo reply, id 1, seq 1, length 64
      03:22:19.331071 26:5a:4c:5d:3d:7c > 02:52:59:00:00:11, ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, ethertype IPv4 (0x0800), 8.8.8.8 > 192.168.2.5: ICMP echo reply, id 1, seq 2, length 64
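
The mismatch in step 8 can be made explicit by parsing the VLAN tag out of such a tcpdump line and comparing it with the declared value. A quick sketch; the line is copied from the capture above, and 222 is the vlanID set in step 5:

```python
import re

# A tcpdump line copied from the capture above; the 802.1Q decode prints the
# tag as "vlan <id>".
line = ("03:22:18.329029 26:5a:4c:5d:3d:7c > 02:52:59:00:00:11, "
        "ethertype 802.1Q (0x8100), length 102: vlan 2, p 0, "
        "ethertype IPv4 (0x0800), 8.8.8.8 > 192.168.2.5: ICMP echo reply")

observed = int(re.search(r"vlan (\d+)", line).group(1))
declared = 222  # vlanID currently set in the NAD (step 5)
print(observed, declared)  # 2 222 -> the datapath still uses the old tag
```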

       

      9. The configuration only takes effect once I reboot the node; sometimes I have to reboot the entire cluster (including the control plane).

      Actual results:

      The declared configuration is not applied; a reboot of the node (or sometimes the entire cluster) is needed before some things work.

      Expected results:

      All declared configuration is applied, without requiring reboots.

      Additional info:

       

      Assignee: Yossi Segev (ysegev@redhat.com)
      Reporter: Germano Veit Michel (rhn-support-gveitmic)
      QA Contact: Yossi Segev