- Bug
- Resolution: Unresolved
- Major
- None
- None
- None
- 5
- False
- False
- ovn25.03-25.03.1-63.el9fdp
- rhel-9
- None
- rhel-net-ovn
- ssg_networking
- OVN FDP Sprint 9
- 1
Problem Description: Clearly explain the issue.
As indicated by imaximet@redhat.com, it is not necessarily safe to remove routes from the Linux routing table while we are dumping the table's contents through netlink.
This was discussed here:
https://mail.openvswitch.org/pipermail/ovs-dev/2025-August/425375.html
>> The handle_msg_cb() may attempt to delete the entry from the kernel.
>> Netlink dumps are not very reliable, i.e. entries can be dumped twice
>> or missed, in case the kernel tables are modified during the dump, but
>> this call actually makes it far more likely. I don't think it's a good
>> idea in general to modify the table while dumping it, unless you're OK
>> with missed/duplicated entries.
>> Ah, I wasn't aware of that. I'll delay deletes until the dump is done. We should fix this for route-exchange as well though.
The comment above was originally about the Linux neighbor handling, but it also applies to the existing OVN code for route manipulation:
https://github.com/ovn-org/ovn/blob/ba964c845e5d666083a03c1ab2a7013cd85ba863/controller/route-exchange-netlink.c#L207-L273
static void
handle_route_msg(const struct route_table_msg *msg, void *data)
{
    struct route_msg_handle_data *handle_data = data;
    const struct route_data *rd = &msg->rd;
    struct advertise_route_entry *ar;
    int err;

    /* This route is not from us, so we learn it. */
    if (rd->rtm_protocol != RTPROT_OVN) {
        if (!handle_data->learned_routes) {
            return;
        }
        if (prefix_is_link_local(&rd->rta_dst, rd->rtm_dst_len)) {
            return;
        }
        struct route_data_nexthop *nexthop;
        LIST_FOR_EACH (nexthop, nexthop_node, &rd->nexthops) {
            if (ipv6_is_zero(&nexthop->addr)) {
                /* This is most likely an address on the local link.
                 * As we just want to learn remote routes we do not need it.*/
                continue;
            }
            struct re_nl_received_route_node rr;
            rr = (struct re_nl_received_route_node) {
                .db = handle_data->db,
                .prefix = rd->rta_dst,
                .plen = rd->rtm_dst_len,
                .nexthop = nexthop->addr,
            };
            memcpy(rr.ifname, nexthop->ifname, IFNAMSIZ);
            rr.ifname[IFNAMSIZ] = '\0';
            vector_push(handle_data->learned_routes, &rr);
        }
        return;
    }

    if (handle_data->routes_to_advertise) {
        uint32_t hash = advertise_route_hash(&rd->rta_dst, rd->rtm_dst_len);
        HMAP_FOR_EACH_WITH_HASH (ar, node, hash, handle_data->routes) {
            if (ipv6_addr_equals(&ar->addr, &rd->rta_dst)
                    && ar->plen == rd->rtm_dst_len
                    && ar->priority == rd->rta_priority) {
                hmapx_find_and_delete(handle_data->routes_to_advertise, ar);
                return;
            }
        }
    }

    err = re_nl_delete_route(rd->rta_table_id, &rd->rta_dst,
                             rd->rtm_dst_len, rd->rta_priority);
    if (err) {
        char addr_s[INET6_ADDRSTRLEN + 1];
        static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
        VLOG_WARN_RL(&rl, "Delete route table_id=%"PRIu32" dst=%s plen=%d "
                     "failed: %s", rd->rta_table_id,
                     ipv6_string_mapped(
                         addr_s, &rd->rta_dst) ? addr_s : "(invalid)",
                     rd->rtm_dst_len,
                     ovs_strerror(err));

        if (!handle_data->ret) {
            /* Report the first error value to the caller. */
            handle_data->ret = err;
        }
    }
}
The handle_route_msg() callback is invoked by route_table_dump_one_table() while the netlink dump is still in progress, and it may delete routes through re_nl_delete_route().
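To illustrate why deleting from inside the dump callback is risky, here is a minimal toy model. It is not kernel or OVN code: the "table" is a plain array and the "dump" is a position cursor, both assumptions made only for illustration (the real netlink dump resumes differently). All names (toy_table, toy_dump, ...) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a kernel table: a dense array of route ids. */
struct toy_table {
    int ids[8];
    size_t n;
};

/* Removes the entry at 'idx' by shifting the tail one slot to the left. */
static void
toy_table_delete(struct toy_table *t, size_t idx)
{
    assert(idx < t->n);
    memmove(&t->ids[idx], &t->ids[idx + 1],
            (t->n - idx - 1) * sizeof t->ids[0]);
    t->n--;
}

/* Walks the table with a position cursor and records every id it visits
 * in 'seen'.  If 'delete_during_dump' is true, the entry whose id equals
 * 'stale_id' is deleted as soon as it is visited, i.e. while the walk is
 * still in progress.  Returns the number of visited entries. */
static size_t
toy_dump(struct toy_table *t, int stale_id, bool delete_during_dump,
         int *seen)
{
    size_t n_seen = 0;

    for (size_t pos = 0; pos < t->n; pos++) {
        int id = t->ids[pos];

        seen[n_seen++] = id;
        if (delete_during_dump && id == stale_id) {
            /* The next entry slides under 'pos' and is skipped by the
             * upcoming pos++. */
            toy_table_delete(t, pos);
        }
    }
    return n_seen;
}
```

With a table holding ids 1, 2, 3, 4 and stale_id 1, a read-only walk visits all four entries, while a walk that deletes in place visits only 1, 3 and 4: entry 2 is never reported to the callback.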
One way to address the issue is to delay the deletion of stale OVN routes until the dump has completed, similar to what is done for Linux neighbor entries:
https://github.com/ovn-org/ovn/blob/ba964c845e5d666083a03c1ab2a7013cd85ba863/controller/neighbor-exchange-netlink.c#L141-L162
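The deferred-delete approach can be sketched as follows. This is a simplified simulation, not the OVN implementation: the sim_* types and functions are hypothetical, the table is a plain array, and the 'ours'/'wanted' flags stand in for the RTPROT_OVN check and the lookup in the set of routes to advertise. The point is only the two phases: the callback records stale routes, and deletes run after the walk has finished.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define SIM_MAX_ROUTES 8

/* Hypothetical route entry: 'ours' mimics "installed by OVN", 'wanted'
 * mimics "still present in the set of routes to advertise". */
struct sim_route {
    int id;
    bool ours;
    bool wanted;
};

struct sim_table {
    struct sim_route routes[SIM_MAX_ROUTES];
    size_t n;
};

/* State shared between the dump callback and the caller. */
struct sim_dump_ctx {
    int stale[SIM_MAX_ROUTES];  /* Ids queued for deletion after the dump. */
    size_t n_stale;
    size_t n_seen;
};

/* Dump callback: read-only with respect to the table.  Stale routes are
 * only remembered here, never deleted. */
static void
sim_handle_route(const struct sim_route *r, struct sim_dump_ctx *ctx)
{
    ctx->n_seen++;
    if (r->ours && !r->wanted) {
        assert(ctx->n_stale < SIM_MAX_ROUTES);
        ctx->stale[ctx->n_stale++] = r->id;
    }
}

/* Deletes the route with the given id.  Returns false if it is not found. */
static bool
sim_delete_route(struct sim_table *t, int id)
{
    for (size_t i = 0; i < t->n; i++) {
        if (t->routes[i].id == id) {
            memmove(&t->routes[i], &t->routes[i + 1],
                    (t->n - i - 1) * sizeof t->routes[0]);
            t->n--;
            return true;
        }
    }
    return false;
}

/* Phase 1 walks the unmodified table; phase 2 applies the queued deletes. */
static void
sim_sync(struct sim_table *t, struct sim_dump_ctx *ctx)
{
    memset(ctx, 0, sizeof *ctx);

    for (size_t i = 0; i < t->n; i++) {
        sim_handle_route(&t->routes[i], ctx);
    }
    for (size_t i = 0; i < ctx->n_stale; i++) {
        sim_delete_route(t, ctx->stale[i]);
    }
}
```

In OVN terms, this would presumably mean that handle_route_msg() queues the stale route in a container reachable from handle_data, and that re_nl_delete_route() is called only once route_table_dump_one_table() has returned. The exact container and error reporting are left open here.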
Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).
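Incorrect routing: if the dump misses or duplicates entries, ovn-controller acts on an inaccurate view of the host routing table.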
Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).
Any OVN version >= 25.03.
Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).
Day one issue
Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.
Reproduction Steps: Provide detailed steps or scripts to replicate the issue.
Spotted during code review; probably not easy to reproduce consistently.
Expected Behavior: Describe what should happen under normal circumstances.
Deletion of Linux routes should happen only after the current netlink route dump finishes.
Observed Behavior: Explain what actually happens.
See description.
Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.
Logs: If you collected logs, please provide them (e.g. sos report, /var/log/openvswitch/*, testpmd console).
- is cloned by
  - FDP-1676 CLONE [ovn26.03 fast-datapath-rhel-9] - [OVN][BGP] ovn-controller modifies the host routing table while dumping its contents
    - Dev Complete
  - FDP-1678 CLONE [ovn25.03 fast-datapath-rhel-10] - [OVN][BGP] ovn-controller modifies the host routing table while dumping its contents
    - Dev Complete
  - FDP-1677 CLONE [ovn25.09 fast-datapath-rhel-9] - [OVN][BGP] ovn-controller modifies the host routing table while dumping its contents
    - Verified