-
Bug
-
Resolution: Unresolved
-
ovn26.03-26.03.0-alpha.211.el9fdp
-
rhel-9
-
None
-
rhel-net-ovn
-
-
-
ssg_networking
Problem Description: Clearly explain the issue.
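While running a reproducer script for FDP-2780, I triggered a different crash in ovn-northd, in en_datapath_synced_logical_switch_datapath_sync_handler(). The backtrace in the Logs section places the faulting frame in en_datapath_synced_logical_switch_run(), called from engine_recompute() with the reason "failed handler for input %s". The equivalent logical router code is likely affected as well.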
Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).
This crashes ovn-northd.
Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).
This happened on the HEAD of branch-25.09 as of 2 December 2025 (commit 1fa36ec73f05252db1a0877a960918e004fda07a). Earlier OVN streams are likely not affected.
Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).
New issue.
Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.
I have a reproducer script that I wrote for the assertion reported in FDP-2780. The script sometimes triggers that assertion, but at other times it crashes ovn-northd with the segfault described in this issue. I do not know how likely this is to occur in the wild.
Reproduction Steps: Provide detailed steps or scripts to replicate the issue.
The following reproducer script causes the crash:
#!/bin/bash
set -e
while true; do
    ovn-nbctl ls-add ls1
    ovn-nbctl lb-add lb1 192.168.0.1 10.0.0.1
    lb_uuid=$(ovn-nbctl --bare --columns=_uuid find load_balancer name=lb1)
    # Pause ovn-northd so that it does not receive IDL updates while we
    # perform the next operations.
    echo "northd going to sleep"
    kill -STOP "$(cat sandbox/ovn-northd.pid)"
    uuid=$(uuidgen)
    # Create, destroy, and re-create a load balancer group with the same UUID.
    ovn-nbctl --id="$uuid" create load_balancer_group name=lbg1 load_balancer="$lb_uuid"
    ovn-nbctl destroy load_balancer_group "$uuid"
    ovn-nbctl --id="$uuid" create load_balancer_group name=lbg1 load_balancer="$lb_uuid"
    ovn-nbctl set logical_switch ls1 load_balancer_group="$uuid"
    # Now wake ovn-northd up and see if it asserts.
    echo "northd waking up"
    kill -CONT "$(cat sandbox/ovn-northd.pid)"
    # Clean up so that the next iteration starts from an empty configuration.
    ovn-nbctl ls-del ls1
    ovn-nbctl lb-del lb1
    ovn-nbctl --all destroy load_balancer_group
done
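Because the crash is intermittent, it can help to stop the loop as soon as ovn-northd dies, before the cleanup commands change the northbound database. The following is a minimal sketch of such a check. The function names northd_alive and northd_survives are hypothetical and not part of the original reproducer, and the sketch assumes the pid file still names the crashed process.

```shell
#!/bin/bash
# Hypothetical helpers; not part of the original reproducer.

# Return 0 if the process named in the pid file ($1) still exists.
northd_alive() {
    pid=$(cat "$1" 2>/dev/null) || return 1
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Poll once per second for up to $2 seconds (default 5).
# Return 1 as soon as the process is gone, 0 if it is still there at the end.
northd_survives() {
    remaining=${2:-5}
    while [ "$remaining" -gt 0 ]; do
        northd_alive "$1" || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    northd_alive "$1"
}
```

One possible use is to call `northd_survives sandbox/ovn-northd.pid || { echo "northd died"; exit 1; }` right after the `kill -CONT` line, so that the configuration that triggered the crash is left in place for inspection.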
Expected Behavior: Describe what should happen under normal circumstances.
ovn-northd should not crash.
Observed Behavior: Explain what actually happens.
ovn-northd crashes when trying to process the synced datapath for the added logical switch. Specifically, sdp->nb_row appears to point to junk data, so dereferencing sdp->nb_row->table->class_ causes a segfault.
Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.
I reported this as soon as I realized the reproducer causes the crash. The assignee on this issue will need to determine why the crash occurs. I originally suspected that the reproducer's quick deletion and re-adding of the logical switch might cause some confusing IDL messages. However, since the logical switch has a different UUID each time it is re-added, I think this is unlikely.
Logs: If you collected logs, please provide them (e.g., sos report, /var/log/openvswitch/*, testpmd console).
The reproducer can be used to get a core file. However, here is a quick backtrace I got when I saw the issue the first time:
(gdb) bt
#0 0x000000000043294b in en_datapath_synced_logical_switch_run (node=<optimized out>, data=0x2730a0f0) at northd/en-datapath-logical-switch.c:297
#1 0x000000000045b148 in engine_recompute (node=node@entry=0x733ee0 <en_datapath_synced_logical_switch>, allowed=allowed@entry=true, reason_fmt=reason_fmt@entry=0x61d2c4 "failed handler for input %s") at lib/inc-proc-eng.c:443
#2 0x000000000045bc8c in engine_compute (node=<optimized out>, recompute_allowed=<optimized out>) at lib/inc-proc-eng.c:486
#3 engine_run_node (node=0x733ee0 <en_datapath_synced_logical_switch>, recompute_allowed=<optimized out>) at lib/inc-proc-eng.c:545
#4 engine_run (recompute_allowed=recompute_allowed@entry=true) at lib/inc-proc-eng.c:571
#5 0x000000000044d98b in inc_proc_northd_run (ovnnb_txn=ovnnb_txn@entry=0x2743c420, ovnsb_txn=ovnsb_txn@entry=0x2740d790, ctx=ctx@entry=0x7fffc314ad40) at northd/inc-proc-northd.c:580
#6 0x00000000004048b6 in main (argc=<optimized out>, argv=<optimized out>) at northd/ovn-northd.c:1096
(gdb) list
292 synced_logical_switch_map_destroy(switch_map);
293 synced_logical_switch_map_init(switch_map);
294
295 struct ovn_synced_datapath *sdp;
296 HMAP_FOR_EACH (sdp, hmap_node, &dps->synced_dps) {
297 if (sdp->nb_row->table->class_ != &nbrec_table_logical_switch) {
298 continue;
299 }
300 struct ovn_synced_logical_switch *lsw =
301 synced_logical_switch_alloc(sdp);
(gdb) p sdp
$1 = (struct ovn_synced_datapath *) 0x27430450
(gdb) p sdp->nb_row
$2 = (const struct ovsdb_idl_row *) 0x273fc9f0
(gdb) p sdp->nb_row->table
$3 = (struct ovsdb_idl_table *) 0x2d333831612d3033
(gdb) p sdp->nb_row->table->class_
Cannot access memory at address 0x2d333831612d3033
(gdb) p sdp->nb_row->table
$4 = (struct ovsdb_idl_table *) 0x2d333831612d3033
(gdb) p *sdp->nb_row->table
Cannot access memory at address 0x2d333831612d3033
(gdb) p *sdp->nb_row
$5 = {hmap_node = {hash = 658485756, next = 0x27422180}, uuid = {parts = {658763392, 0, 658827088, 0}}, src_arcs = {prev = 0x273fca10, next = 0x31}, dst_arcs = {prev = 0x273db95c, next = 0x30342d373331322d}, table = 0x2d333831612d3033, old_datum = 0x3330633137333563,
persist_uuid = 97, parsed = 97, reparse_node = {prev = 0x31, next = 0x273db98c}, new_datum = 0x0, prereqs = 0x273fca80, written = 0x0, txn_node = {hash = 0, next = 0x31}, map_op_written = 0x273db8cc, map_op_lists = 0x0, set_op_written = 0x0, set_op_lists = 0x0,
change_seqno = {0, 0, 49}, track_node = {prev = 0x273db9bc, next = 0x0}, updated = 0x273fca20, tracked_old_datum = 0x273fca50}
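The bogus pointer values in the dump above look like printable text. As a quick check, the sketch below decodes three adjacent fields of *sdp->nb_row (dst_arcs.next, table, old_datum) as ASCII. It assumes a little-endian x86-64 host, so the lowest-order byte of each value sits at the lowest address, and it assumes the fields are adjacent in memory in the order gdb prints them. The function name decode_le is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print a 64-bit hex value as the ASCII bytes it would
# occupy in memory on a little-endian machine (lowest-order byte first).
# Expects an even number of hex digits and printable bytes.
decode_le() {
    hex=${1#0x}
    out=
    while [ -n "$hex" ]; do
        byte=${hex#"${hex%??}"}    # last two hex digits = lowest-address byte
        hex=${hex%??}
        out="$out$(printf "\\$(printf '%03o' "0x$byte")")"
    done
    printf '%s\n' "$out"
}

# Values copied from the gdb output of *sdp->nb_row:
# dst_arcs.next, table, old_datum.
fragment="$(decode_le 0x30342d373331322d)$(decode_le 0x2d333831612d3033)$(decode_le 0x3330633137333563)"
echo "$fragment"    # prints -2137-4030-a183-c5371c03
```

The decoded bytes have the dash layout of the middle and tail of a UUID string, and the next two fields, persist_uuid and parsed, both hold 97, which is ASCII 'a'. This suggests, but does not prove, that the memory behind sdp->nb_row was freed and later reused for string data, which would mean sdp->nb_row is a stale pointer.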
-
clones: FDP-2805 Segfault in synced logical datapath handler (Closed)
-