Fast Datapath Product / FDP-2783

[ovn-northd] assertion failure due to trying to write to a deleted IDL record

    • Epic
    • Resolution: Unresolved
    • Major
    • None
    • None
    • OVN
    • None
    • 13
    • False
    • False

      Please mark each item below with ( / ) if completed or ( x ) if incomplete:

      ( ) The acceptance criteria defined below are met.

      Given an OVN deployment running ovn-northd version 25.09,

      When the en_sync_from_sb engine attempts to update a Port_Binding's up column,

      Then northd does not crash with an assertion failure in ovsdb_idl_txn_write__() and the process remains running without assertion errors in the logs


      ( ) The epic's work is available in a downstream build (nightly/Async or other)


      ( ) All cards under the epic have been moved to Done

    • rhel-9
    • rhel-net-ovn
    • 100% To Do, 0% In Progress, 0% Done
    • ssg_networking

      This epic tracks all the effort needed to deliver the solution related to the bug described below.

       Problem Description: Clearly explain the issue.

      With ovn25.09-25.09.1-11.el9fdp, ovn-northd hits an assertion failure:

      2025-12-01T18:33:16.043Z|00021|backtrace|ERR|lib/vlog.c:1309 backtrace:
      ovn-northd(+0xd3d37) [0x5616a6f33d37]
      ovn-northd(+0xc07c3) [0x5616a6f207c3]
      ovn-northd(+0xb487b) [0x5616a6f1487b]
      ovn-northd(+0xd50fa) [0x5616a6f350fa]
      ovn-northd(+0x77442) [0x5616a6ed7442]
      ovn-northd(+0x85397) [0x5616a6ee5397]
      ovn-northd(+0x87a8a) [0x5616a6ee7a8a]
      ovn-northd(+0x2012f) [0x5616a6e8012f]
      /lib64/libc.so.6(+0x29590) [0x7f6b92352590]
      /lib64/libc.so.6(__libc_start_main+0x80) [0x7f6b92352640]
      ovn-northd(+0x21035) [0x5616a6e81035]
      ovn-northd: lib/ovsdb-idl.c:3650: assertion row->new_datum != NULL failed in ovsdb_idl_txn_write__()
      

       
      Decoded backtrace:

      ovn-northd(+0xb487b) [0x5616a6f1487b]    ovs_assert_failure (ovs-2e7d3ea0d0cade43da9156646cf8260f454431e8/lib/util.c:88).
      ovn-northd(+0xd50fa) [0x5616a6f350fa]    ovsdb_idl_txn_write__()
      ovn-northd(+0x77442) [0x5616a6ed7442]    en_sync_from_sb_run()
      ovn-northd(+0x85397) [0x5616a6ee5397]
      ovn-northd(+0x87a8a) [0x5616a6ee7a8a]
      
      
      
      (gdb) list *0x77442
      0x77442 is in en_sync_from_sb_run (northd/northd.c:19564).
      19559           }
      19560
      19561           /* ovn-controller will update 'Port_Binding.up' only if it was
      19562            * explicitly set to 'false'.
      19563            */
      19564           if (!op->sb->n_up) {
      19565               up = false;
      19566               sbrec_port_binding_set_up(op->sb, &up, 1);
      19567           }
      

      Hit here: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshi[...]bernetes_ovnkube-node-sxwmc_northd_previous.log

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

      ovn-northd crashes with an assertion failure.
       

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      ovn25.09-25.09.1-11.el9fdp
       

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

      Unknown.
       

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

      Intermittently in OCP CI, e.g.:
      https://github.com/openshift/ovn-kubernetes/pull/2884
       

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

      Unknown.
       

       Expected Behavior: Describe what should happen under normal circumstances.

      ovn-northd should not crash
       

       Observed Behavior: Explain what actually happens.

      ovn-northd tries to write to a deleted IDL record, which triggers the assertion row->new_datum != NULL in ovsdb_idl_txn_write__().
       

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

       

       Logs: If you collected logs, please provide them (e.g. sos report, /var/log/openvswitch/*, testpmd console)

      See the attached NB/SB databases. NOTE: they were taken well after the crash, so their contents need to be correlated with what happened around the time of the crash (2025-12-01T18:33:16.043Z).

        1. ovnkube-node-sxwmc_nbdb
          7.13 MB
          Dumitru Ceara
        2. ovnkube-node-sxwmc_sbdb
          15.93 MB
          Dumitru Ceara

              ovnteam@redhat.com OVN Team
              dceara@redhat.com Dumitru Ceara
              OVN QE