Fast Datapath Product / FDP-2014

Test Coverage: Handle ovn-encap-ip duplicates.

    • Type: Task
    • Resolution: Unresolved
    • Priority: Normal
    • Component: OVN

      ( ) The test coverage is aligned with the epic's acceptance criteria:

      Given two chassis configured with the same ovn-encap-ip,

      When ovn-controller detects a duplicate Encap (same IP, different chassis),

      Then it should log an error message, skip tunnel deletion/re-creation, and avoid SB commit attempts until the configuration is corrected.

    • rhel-9

      This task is tracking the test case writing activities to cover the bug described below.

      Configuring multiple nodes with the same ovn-encap-ip causes the OVS database (conf.db) to grow rapidly.

      Suppose multiple nodes are properly configured, each with a different local_ip, e.g. hv1 using ip1, hv2 using ip2, and hv3 using ip3.

      If one node (e.g. hv1) is then (re)configured with the same ovn-encap-ip as another node (e.g. hv2), i.e. by running "ovs-vsctl set open . external-ids:ovn-encap-ip=ip2" on hv1, the current ovn-controller will:

      • Delete existing geneve tunnels (using ip1).
      • Recreate new tunnels (using ip2).
      • Create an Encap with ip2 and chassis=hv1 in SB.

      This commit fails because an Encap with ip2 and chassis=hv2 already exists. On detecting the commit failure, ovn-controller will:

      • Delete existing geneve tunnels (using ip2).
      • Recreate new tunnels using ip1.

      On the next run, ovn-controller tries ip2 again: it deletes the existing tunnels (using ip1), recreates them (using ip2), and retries the SB commit. The cycle then repeats indefinitely.
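
      The loop above can be illustrated with a toy bash model. It is a hypothetical simulation, not ovn-controller code: sb_owner, tunnel_ip, recreations and run_iteration are invented names, and the SB Encap table is reduced to an ip -> chassis map. It shows that each iteration rebuilds hv1's tunnels twice without ever converging.

```shell
#!/usr/bin/env bash
# Toy model (hypothetical, not ovn-controller code): hv1 has been
# reconfigured to use ip2, which hv2 already owns in the SB Encap table.
declare -A sb_owner=([ip1]=hv1 [ip2]=hv2)   # SB Encap rows: ip -> chassis
tunnel_ip=ip1      # IP used by hv1's local Geneve tunnels
recreations=0      # number of tunnel delete/recreate cycles

run_iteration() {
    local wanted=$1 previous=$tunnel_ip
    # Current behavior: rebuild the local tunnels with the new IP first...
    if [[ $tunnel_ip != "$wanted" ]]; then
        tunnel_ip=$wanted
        recreations=$((recreations + 1))
    fi
    # ...then try to commit the Encap; SB rejects an IP owned by another chassis.
    if [[ ${sb_owner[$wanted]:-hv1} != hv1 ]]; then
        if [[ $previous != "$tunnel_ip" ]]; then
            tunnel_ip=$previous            # roll back to the old IP
            recreations=$((recreations + 1))
        fi
        return 1
    fi
    sb_owner[$wanted]=hv1
}

for _ in 1 2 3; do
    run_iteration ip2
done
echo "after 3 iterations: tunnel_ip=$tunnel_ip recreations=$recreations"
```

      Running it prints "after 3 iterations: tunnel_ip=ip1 recreations=6": hv1 ends where it started after rebuilding its tunnels six times, and every rebuild is another transaction written to conf.db.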

      ovn-controller could instead detect that an Encap with ip2, the same type, and chassis != hv1 already exists in SB. In that case it should skip deleting and recreating tunnels, avoid attempting (and failing) to write the Encap to SB, and log an error rate-limited to, e.g., one message per 10 seconds.
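
      A minimal sketch of the proposed check, written as a hypothetical bash helper: find_conflicting_encaps is an invented name, and the real fix would live in ovn-controller's C code. It assumes Encap rows arrive as "chassis,ip,type" lines and reports every row that has the wanted IP and type but belongs to another chassis.

```shell
#!/usr/bin/env bash
# Hypothetical check mirroring the proposed fix: before touching tunnels,
# look for an Encap with the wanted IP and type owned by another chassis.
# Input: one "chassis,ip,type" row per line (assumed format).
find_conflicting_encaps() {
    local local_chassis=$1 wanted_ip=$2 wanted_type=$3
    local chassis ip type status=1
    while IFS=, read -r chassis ip type; do
        if [[ $ip == "$wanted_ip" && $type == "$wanted_type" \
              && $chassis != "$local_chassis" ]]; then
            printf '%s\n' "$chassis"   # report the chassis that owns the IP
            status=0
        fi
    done
    return "$status"   # 0 = conflict: skip tunnel changes and SB writes
}

rows='hv1,192.168.0.1,geneve
hv2,192.168.0.2,geneve
hv3,192.168.0.3,geneve'
printf '%s\n' "$rows" | find_conflicting_encaps hv1 192.168.0.2 geneve
```

      Here it prints hv2. On a live deployment the rows might come from "ovn-sbctl --format=csv --data=bare --no-headings --columns=chassis_name,ip,type list Encap", but that invocation and its exact output format are assumptions to verify against the ovn-sbctl manual.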

      This is a configuration error, so it should be fixed by the CMS or the user. However, while it persists, it causes high CPU usage in the SB database, ovn-controller, and OVS. In addition, the OVS database (conf.db) grows quickly: a local test with a few chassis showed an increase of 1 MB per second.
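
      The growth rate can be checked by sampling the size of conf.db twice. The sketch below is a hypothetical helper (db_growth_rate is an invented name) that relies on GNU stat. The conf.db path in the example comment is an assumption, since it differs between packagings, and in ovn-fake-multinode the helper would have to run inside the chassis container.

```shell
#!/usr/bin/env bash
# Hypothetical helper: sample a file's size twice and print the average
# growth in bytes per second over the interval (GNU stat syntax).
db_growth_rate() {
    local file=$1 interval=${2:-10} before after
    (( interval > 0 )) || return 1
    before=$(stat -c %s -- "$file") || return 1
    sleep "$interval"
    after=$(stat -c %s -- "$file") || return 1
    echo $(( (after - before) / interval ))
}

# Example, run where conf.db is reachable (the path is an assumption):
# db_growth_rate /etc/openvswitch/conf.db 10
```

      A result on the order of 1000000 would be consistent with the 1 MB per second seen in the local test.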

      Once the CMS resolves the issue (by correcting ovn-encap-ip on hv1 or hv2), ovn-controller wakes up on the resulting OVS or SB change and handles it properly.

       Reproduction Steps:

      There may be simpler ways, but the issue is easy to reproduce with ovn-fake-multinode:

      1. Find the ovn-encap-ip (the tunnel local_ip) of ovn-chassis-2:
         ip2=$(podman exec ovn-chassis-2 ovs-vsctl get open . external_ids:ovn-encap-ip)

      2. Use the same IP on ovn-chassis-3:
         podman exec ovn-chassis-3 ovs-vsctl set open . external-ids:ovn-encap-ip=$ip2

       Expected Behavior:

      ovn-controller should log a rate-limited error and avoid looping through tunnel deletion and re-creation and failed SB commits.
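
      The rate-limited error could behave like the following fixed-interval limiter, sketched in bash with the 10-second interval suggested above. This is a simplification: OVS's own vlog rate limiting uses a token bucket with a configurable burst. rate_limited_log, rl_last and RL_INTERVAL are hypothetical names, and the message text is illustrative rather than ovn-controller's actual log output. The current time is passed in so the behavior is deterministic.

```shell
#!/usr/bin/env bash
# Hypothetical fixed-interval limiter: emit at most one error per
# RL_INTERVAL seconds. The caller supplies the current epoch time
# (for example "$(date +%s)") so the logic is easy to test.
RL_INTERVAL=10
rl_last=""   # time of the last emitted message; empty means never

rate_limited_log() {
    local now=$1
    shift
    if [[ -z $rl_last ]] || (( now - rl_last >= RL_INTERVAL )); then
        rl_last=$now
        printf 'ERR: %s\n' "$*" >&2
        return 0
    fi
    return 1   # suppressed: still inside the interval
}

# Three duplicate-Encap detections at t=100, 105 and 110: the one at 105 is dropped.
for t in 100 105 110; do
    rate_limited_log "$t" "duplicate Encap ip2 already owned by chassis hv2" || true
done
```

      A token bucket would additionally allow a short burst of messages before throttling; the fixed interval keeps the sketch small.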

       Observed Behavior:

      conf.db on ovn-chassis-3 grows dramatically, SB commits fail, and ovn-controller keeps deleting and recreating tunnels.

      Logs:

      See https://issues.redhat.com/browse/OSPRH-17558

              OVN Team (ovnteam@redhat.com), NST Bot (nstbot), OVN QE