Fast Datapath Product / FDP-1140

CLONE [ovn24.09 fast-datapath-rhel-9] - ovn-controller skips MAC binding timestamp update causing record removal for the active entry

    • ovn24.09-24.09.2-14.el9fdp
    • rhel-9
    • None
    • rhel-net-ovn
    • ssg_networking
    • Customer Escalated

       Problem Description:

      ovn-controller doesn't update a MAC binding's timestamp until 3/4 of the expiration timeout has passed. If an entry is not yet due when ovn-controller checks, it may wait for another 3/4 of the timeout before checking again. Since 3/4 + 3/4 > 1, northd can remove the entry before ovn-controller wakes up to refresh it.

      For example:
      Let's say the threshold is always the same, 20 sec, and all entries we create have constant traffic. Consider the timeline that starts at time A:

      1. [A] We create one MAC binding entry; the next stats request is scheduled for A+15.
      2. [A+5] We create a second entry; the next request doesn't change and is still at A+15.
      3. [A+15] We send an OF stats request and get a reply.
        • The first entry is updated: the dump interval is 15 and its original timestamp is A, so its refresh is due at A+15, and now is A+15 plus some small delta, i.e. A+15 <= now.
        • The second entry is not updated, because (A+5)+15 > A+15.
        • The next request is scheduled for now + 15 (thresholds didn't change), so the next request is at A+30.
      4. [A+25] northd removes the second entry, since it wasn't updated and its threshold has passed: (A+5)+20 = A+25.
      5. [A+30] We send an OF stats request and get a reply, but the second entry has already been removed.
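
      The timeline above can be replayed with a small simulation. This is only a sketch of the scheduling logic as described in this ticket, not ovn-controller or northd code: the function name `simulate`, its parameters, the one-second ticks, A = 0, and the choice to process a stats reply before the expiry check within the same tick are all assumptions made for illustration.

```python
def simulate(threshold, created_at, horizon):
    """Replay the timeline in whole-second ticks, with A = 0.

    Hypothetical model: every entry has constant traffic, so a stats
    reply refreshes an entry whenever its age check passes.
    Returns {entry_index: removal_time, or None if never removed}.
    """
    dump_interval = threshold * 3 // 4   # 3/4 of the expiration timeout
    timestamps = {}                      # live entries: index -> timestamp
    removed = {}                         # index -> removal time (or None)
    next_dump = None                     # time of the next stats request

    for now in range(horizon + 1):
        # New MAC binding entries created at this tick.
        for i, t in enumerate(created_at):
            if t == now:
                timestamps[i] = now
                removed[i] = None
                if next_dump is None:    # later entries do not move it
                    next_dump = now + dump_interval

        # "ovn-controller": stats request/reply, refresh entries that are due.
        if next_dump is not None and now == next_dump:
            for i in list(timestamps):
                if timestamps[i] + dump_interval <= now:
                    timestamps[i] = now
            next_dump = now + dump_interval

        # "northd": remove entries whose timestamp is older than the threshold.
        for i in list(timestamps):
            if now - timestamps[i] >= threshold:
                del timestamps[i]
                removed[i] = now

    return removed


# Entries created at A and A+5 with a 20 sec threshold.
print(simulate(20, [0, 5], 40))  # {0: None, 1: 25}
```

      Under these assumptions the first entry is refreshed at A+15 and A+30 and survives, while the second entry is skipped at A+15 and removed at A+25, before the request at A+30.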

       Impact Assessment:

      This causes removal of MAC binding entries that have active traffic on them. As a result, traffic starts flowing to userspace and to ovn-controller for ARP resolution, and packets are dropped on the CoPP meter.

       Software Versions:

      23.09.4-16.el9fdp

        Issue Type:

      New issue

       Reproducibility:

      100%

       Reproduction Steps:

      Originally reproduced by running a ping towards the host's default gateway from an OpenShift pod. The MAC binding gets created and is later removed while the traffic is still running. See FDP-1120.

       Expected Behavior:

      MAC binding entries for active traffic should not expire.

       Observed Behavior:

      MAC binding entries for active traffic expire from time to time.

              imaximet@redhat.com Ilya Maximets
              ovnteam@redhat.com OVN Team
              Ehsan Elahi Ehsan Elahi
