Fast Datapath Product / FDP-1594

[OVN][BGP] ovn-controller modifies the host routing table while dumping its contents

    • Bug
    • Resolution: Unresolved
    • Major
    • None
    • None
    • ovn25.03
    • None
    • 5
    • False
    • False
    • Given an OVN node with BGP route-exchange enabled and background route churn during controller startup/resync,

      When ovn-controller performs a netlink route dump,

      Then it does not issue any route deletions until the dump completes, and after reconciliation the host route table matches the intended OVN state without spurious deletions.
    • ovn25.03-25.03.1-63.el9fdp
    • rhel-9
    • None
    • rhel-net-ovn
    • ssg_networking
    • OVN FDP Sprint 9
    • 1
    • -

       Problem Description: Clearly explain the issue.

      As pointed out by imaximet@redhat.com, it is not necessarily safe to remove routes from the Linux routing table while we are still dumping the table's contents through netlink:

      Discussed here:

      https://mail.openvswitch.org/pipermail/ovs-dev/2025-August/425375.html

      >> The handle_msg_cb() may attempt to delete the entry from the kernel.
      >> Netlink dumps are not very reliable, i.e. entries can be dumped twice
      >> or missed, in case the kernel tables are modified during the dump, but
      >> this call actually makes it far more likely.  I don't think it's a good
      >> idea in general to modify the table while dumping it, unless you're OK
      >> with missed/duplicated entries.
      >>
      
      Ah, I wasn't aware of that.  I'll delay deletes until the dump is done.
      We should fix this for route-exchange as well though.
      

      The comment above was originally made about the Linux neighbor handling, but it applies equally to the existing OVN code for route manipulation:
      https://github.com/ovn-org/ovn/blob/ba964c845e5d666083a03c1ab2a7013cd85ba863/controller/route-exchange-netlink.c#L207-L273

      static void
      handle_route_msg(const struct route_table_msg *msg, void *data)
      {
          struct route_msg_handle_data *handle_data = data;
          const struct route_data *rd = &msg->rd;
          struct advertise_route_entry *ar;
          int err;
      
          /* This route is not from us, so we learn it. */
          if (rd->rtm_protocol != RTPROT_OVN) {
              if (!handle_data->learned_routes) {
                  return;
              }
              if (prefix_is_link_local(&rd->rta_dst, rd->rtm_dst_len)) {
                  return;
              }
              struct route_data_nexthop *nexthop;
              LIST_FOR_EACH (nexthop, nexthop_node, &rd->nexthops) {
                  if (ipv6_is_zero(&nexthop->addr)) {
                      /* This is most likely an address on the local link.
                       * As we just want to learn remote routes we do not need it.*/
                      continue;
                  }
                  struct re_nl_received_route_node rr;
                  rr = (struct re_nl_received_route_node) {
                      .db = handle_data->db,
                      .prefix = rd->rta_dst,
                      .plen = rd->rtm_dst_len,
                      .nexthop = nexthop->addr,
                  };
                  memcpy(rr.ifname, nexthop->ifname, IFNAMSIZ);
                  rr.ifname[IFNAMSIZ] = '\0';
      
                  vector_push(handle_data->learned_routes, &rr);
              }
              return;
          }
      
          if (handle_data->routes_to_advertise) {
              uint32_t hash = advertise_route_hash(&rd->rta_dst, rd->rtm_dst_len);
              HMAP_FOR_EACH_WITH_HASH (ar, node, hash, handle_data->routes) {
                  if (ipv6_addr_equals(&ar->addr, &rd->rta_dst)
                          && ar->plen == rd->rtm_dst_len
                          && ar->priority == rd->rta_priority) {
                      hmapx_find_and_delete(handle_data->routes_to_advertise, ar);
                      return;
                  }
              }
          }
          err = re_nl_delete_route(rd->rta_table_id, &rd->rta_dst,
                                   rd->rtm_dst_len, rd->rta_priority);
          if (err) {
              char addr_s[INET6_ADDRSTRLEN + 1];
              static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
              VLOG_WARN_RL(&rl, "Delete route table_id=%"PRIu32" dst=%s plen=%d "
                           "failed: %s", rd->rta_table_id,
                           ipv6_string_mapped(
                               addr_s, &rd->rta_dst) ? addr_s : "(invalid)",
                           rd->rtm_dst_len,
                           ovs_strerror(err));
      
              if (!handle_data->ret) {
                  /* Report the first error value to the caller. */
                  handle_data->ret = err;
              }
          }
      }
      

      The handle_route_msg() callback is invoked by route_table_dump_one_table() while the netlink dump is still in progress, and it may delete routes from the kernel through re_nl_delete_route().

      One way to address the issue is to delay the deletion of stale OVN routes until the dump has completed, similar to what is already done for Linux neighbor entries:
      https://github.com/ovn-org/ovn/blob/ba964c845e5d666083a03c1ab2a7013cd85ba863/controller/neighbor-exchange-netlink.c#L141-L162
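The deferred-deletion idea can be sketched in isolation as follows. This is a minimal, self-contained model and not the OVN implementation: every name in it (demo_route, demo_table, demo_dump_ctx, demo_dump, demo_handle_route, demo_delete, demo_reconcile) is hypothetical, the "kernel table" is a plain array, and the from_ovn and wanted flags merely stand in for the rtm_protocol check and the routes_to_advertise lookup done by handle_route_msg(). The point it illustrates is only the two-phase structure: the dump callback records stale routes, and deletions are issued after the dump returns.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ROUTES 16

/* Hypothetical, simplified stand-in for one kernel route entry. */
struct demo_route {
    uint32_t prefix;  /* IPv4 prefix, host byte order. */
    uint8_t plen;     /* Prefix length. */
    bool from_ovn;    /* Stand-in for rtm_protocol == RTPROT_OVN. */
    bool wanted;      /* Stand-in for "still in the set to advertise". */
};

/* Hypothetical stand-in for the kernel routing table. */
struct demo_table {
    struct demo_route routes[MAX_ROUTES];
    size_t n;
};

/* State shared between the dump callback and its caller. */
struct demo_dump_ctx {
    struct demo_route stale[MAX_ROUTES];
    size_t n_stale;
};

/* Dump callback: only records stale routes, never modifies the table. */
void
demo_handle_route(const struct demo_route *r, void *aux)
{
    struct demo_dump_ctx *ctx = aux;

    if (r->from_ovn && !r->wanted && ctx->n_stale < MAX_ROUTES) {
        ctx->stale[ctx->n_stale++] = *r;
    }
}

/* Stand-in for the netlink dump: calls 'cb' once per table entry. */
void
demo_dump(const struct demo_table *t,
          void (*cb)(const struct demo_route *, void *), void *aux)
{
    for (size_t i = 0; i < t->n; i++) {
        cb(&t->routes[i], aux);
    }
}

/* Stand-in for a route delete request.  Returns true if a route matched. */
bool
demo_delete(struct demo_table *t, const struct demo_route *r)
{
    for (size_t i = 0; i < t->n; i++) {
        if (t->routes[i].prefix == r->prefix
            && t->routes[i].plen == r->plen) {
            t->routes[i] = t->routes[--t->n];
            return true;
        }
    }
    return false;
}

/* Two-phase reconcile.  Returns the number of routes deleted. */
size_t
demo_reconcile(struct demo_table *t)
{
    struct demo_dump_ctx ctx = { .n_stale = 0 };
    size_t n_deleted = 0;

    /* Phase 1: dump only; the table is not modified here. */
    demo_dump(t, demo_handle_route, &ctx);
    assert(ctx.n_stale <= MAX_ROUTES);

    /* Phase 2: the dump is complete, so deletions are now safe. */
    for (size_t i = 0; i < ctx.n_stale; i++) {
        n_deleted += demo_delete(t, &ctx.stale[i]);
    }
    return n_deleted;
}
```

Calling demo_reconcile() on a table that holds a mix of wanted OVN routes, stale OVN routes and foreign routes removes only the stale OVN ones, and a second call finds nothing left to delete. In the real code the pending list would presumably live in the handler's data structure and carry the fields that re_nl_delete_route() needs (table id, destination, prefix length, priority); that mapping is an assumption here.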

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

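      Incorrect routing: a dump that misses or duplicates entries can leave the host routing table out of sync with the intended OVN state.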

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      Any OVN version >= 25.03.

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

      Day one issue

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

       

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

      Spotted during code review; probably not easy to reproduce consistently.

       Expected Behavior: Describe what should happen under normal circumstances.

      Deletion of Linux routes should happen only after the current netlink route dump has finished.

       Observed Behavior: Explain what actually happens.

      See description.

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

       

       Logs: If you collected logs please provide them (e.g. sos report, /var/log/openvswitch/* , testpmd console)

              amusil@redhat.com Ales Musil
              dceara@redhat.com Dumitru Ceara