  1. Fast Datapath Product
  2. FDP-2779

[ovn-northd] assertion failure in handle_od_lbgrp_changes() (lbgrp == NULL)

    • Epic
    • Resolution: Unresolved
    • Major
    • None
    • None
    • OVN
    • None
    • [ovn-northd] assertion failure in handle_od_lbgrp_changes() (lbgrp == NULL)
    • 13
    • False
    • False
    • Hide

      Please mark each item below with ( / ) if completed or ( x ) if incomplete:

      ( ) The acceptance criteria defined below are met.

      Given an OVN deployment running ovn-northd version 25.09.x with load balancer groups configured on logical switches or routers,

      When the lb_data engine processes load balancer group changes during incremental processing,

      Then northd does not crash with an assertion failure in handle_od_lbgrp_changes(), and the process remains running without assertion errors in the logs.


      ( ) The epic's work is available in a downstream build (nightly/Async or other)


      ( ) All cards under the epic have been moved to Done

    • rhel-9
    • rhel-net-ovn
    • 100% To Do, 0% In Progress, 0% Done
    • ssg_networking

      This epic tracks all the effort needed to deliver the solution related to the bug described below.

       Problem Description: Clearly explain the issue.

      With ovn25.09-25.09.1-11.el9fdp, ovn-northd hits an assertion failure:

      2025-12-01T12:22:35.784Z|00109|backtrace|ERR|lib/vlog.c:1309 backtrace:
      ovn-northd(+0xd3d37) [0x55bfabb7cd37]
      ovn-northd(+0xc07c3) [0x55bfabb697c3]
      ovn-northd(+0xb487b) [0x55bfabb5d87b]
      ovn-northd(+0x705af) [0x55bfabb195af]
      ovn-northd(+0x70827) [0x55bfabb19827]
      ovn-northd(+0x709cb) [0x55bfabb199cb]
      ovn-northd(+0x8792d) [0x55bfabb3092d]
      ovn-northd(+0x2012f) [0x55bfabac912f]
      /lib64/libc.so.6(+0x29590) [0x7f070f07f590]
      /lib64/libc.so.6(__libc_start_main+0x80) [0x7f070f07f640]
      ovn-northd(+0x21035) [0x55bfabaca035]
      

      The failing assertion is most likely this one (marked below):

      static void
      handle_od_lbgrp_changes(struct nbrec_load_balancer_group **nbrec_lbgrps,
                              size_t n_nbrec_lbgrps, struct od_lb_data *od_lb_data,
                              struct ed_type_lb_data *lb_data,
                              struct crupdated_od_lb_data *codlb)
      {
          struct tracked_lb_data *trk_lb_data = &lb_data->tracked_lb_data;
          struct uuidset *pre_lbgrp_uuids = od_lb_data->lbgrps;
          od_lb_data->lbgrps = xzalloc(sizeof *od_lb_data->lbgrps);
          uuidset_init(od_lb_data->lbgrps);
          for (size_t i = 0; i < n_nbrec_lbgrps; i++) {
              const struct uuid *lbgrp_uuid = &nbrec_lbgrps[i]->header_.uuid;
              uuidset_insert(od_lb_data->lbgrps, lbgrp_uuid);
      
              if (!uuidset_find_and_delete(pre_lbgrp_uuids, lbgrp_uuid)) {
                  /* Add this lb group to the tracked data. */
                  uuidset_insert(&codlb->assoc_lbgrps, lbgrp_uuid);
      
                  if (!trk_lb_data->has_routable_lb) {
                      struct ovn_lb_group *lbgrp =
                          ovn_lb_group_find(&lb_data->lbgrps, lbgrp_uuid);
                      ovs_assert(lbgrp);  /* <<< suspected failing assertion */
                      trk_lb_data->has_routable_lb |= lbgrp->has_routable_lb;
                  }
              }
          }
      
          if (!uuidset_is_empty(pre_lbgrp_uuids)) {
              trk_lb_data->has_dissassoc_lbgrps_from_od = true;
          }
      
          uuidset_destroy(pre_lbgrp_uuids);
          free(pre_lbgrp_uuids);
      }
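
      The suspected failure mode can be modeled outside OVN: a datapath references a load balancer group UUID that is not present in the set of known groups, so the lookup returns NULL. The sketch below is only an illustration under that assumption. Every toy_* name is hypothetical, the linear search merely stands in for ovn_lb_group_find(), and counting misses instead of asserting is not a proposed fix for northd.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the OVN types; not the real definitions. */
struct toy_uuid {
    unsigned int parts[4];
};

struct toy_lb_group {
    struct toy_uuid uuid;
    bool has_routable_lb;
};

/* Linear search standing in for ovn_lb_group_find().
 * Returns NULL when 'uuid' is not among the known groups. */
static const struct toy_lb_group *
toy_lb_group_find(const struct toy_lb_group *groups, size_t n_groups,
                  const struct toy_uuid *uuid)
{
    for (size_t i = 0; i < n_groups; i++) {
        if (!memcmp(&groups[i].uuid, uuid, sizeof *uuid)) {
            return &groups[i];
        }
    }
    return NULL;
}

/* Walks the group UUIDs referenced by a datapath.  For each one that is
 * known, ORs its has_routable_lb flag into '*has_routable_lb'.  Returns the
 * number of references that could not be resolved; each such miss is the
 * situation in which an assertion on the lookup result would fire. */
static size_t
toy_scan_refs(const struct toy_lb_group *known, size_t n_known,
              const struct toy_uuid *refs, size_t n_refs,
              bool *has_routable_lb)
{
    assert(has_routable_lb != NULL);

    size_t n_missing = 0;
    for (size_t i = 0; i < n_refs; i++) {
        const struct toy_lb_group *group =
            toy_lb_group_find(known, n_known, &refs[i]);
        if (!group) {
            n_missing++;
            continue;
        }
        *has_routable_lb |= group->has_routable_lb;
    }
    return n_missing;
}
```

      Calling toy_scan_refs() with a reference list that contains a UUID absent from the known array returns a non-zero miss count, which is the condition the real code turns into an abort.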
      

      ovn-northd logs:
      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_ovn-kubernetes/2881/pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-dualstack-bgp-local-gw/1995423294387392512/artifacts/e2e-metal-ipi-ovn-dualstack-bgp-local-gw/gather-extra/artifacts/pods/openshift-ovn-kubernetes_ovnkube-node-ssl7f_northd_previous.log
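
      The backtrace frames in the log carry only module-relative offsets, so they need to be symbolized before the crash site can be confirmed. As a rough aid, the sketch below parses a frame of the form "module(+0xOFFSET) [0xADDRESS]" and derives the load base as address minus offset. The names bt_frame, parse_bt_frame and bt_load_base are hypothetical helpers written for this note, not part of OVN or OVS.

```c
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* One backtrace frame of the form "module(+0xOFFSET) [0xADDRESS]". */
struct bt_frame {
    char module[64];    /* Module name, e.g. "ovn-northd". */
    uint64_t offset;    /* Offset of the frame inside the module. */
    uint64_t addr;      /* Absolute address at run time. */
};

/* Parses 'line' into 'frame'.  Returns false for lines that do not match,
 * including symbol-relative frames such as "(__libc_start_main+0x80)". */
static bool
parse_bt_frame(const char *line, struct bt_frame *frame)
{
    return sscanf(line, " %63[^(](+%" SCNx64 ") [%" SCNx64 "]",
                  frame->module, &frame->offset, &frame->addr) == 3;
}

/* Load base of the module: absolute address minus in-module offset.
 * All frames from the same module should give the same value. */
static uint64_t
bt_load_base(const struct bt_frame *frame)
{
    return frame->addr - frame->offset;
}
```

      For the first two ovn-northd frames in the backtrace above, the computed load base is the same (0x55bfabaa9000), which is a quick consistency check on the parsed values. The offsets themselves (for example 0xd3d37) can then be passed to a tool such as addr2line -e with a binary and debuginfo matching ovn25.09-25.09.1-11.el9fdp, assuming those are available.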

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

      ovn-northd crashes: the failed assertion aborts the process.
       

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      ovn25.09-25.09.1-11.el9fdp
       

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

      Unknown
       

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

      Sometimes in OCP CI, e.g.: https://github.com/openshift/ovn-kubernetes/pull/2881

       

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

      Unknown, working on a reproducer.
       

       Expected Behavior: Describe what should happen under normal circumstances.

      Northd should not crash.
       

       Observed Behavior: Explain what actually happens.

       

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

       

       Logs: If you collected logs, please provide them (e.g., sos report, /var/log/openvswitch/*, testpmd console).

      Must-gather (NB/SB DBs were unfortunately compacted so they're not too useful):
      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_ovn-kubernetes/2881/pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-dualstack-bgp-local-gw/1995423294387392512/artifacts/e2e-metal-ipi-ovn-dualstack-bgp-local-gw/gather-must-gather/artifacts/must-gather.tar

              ovnteam@redhat.com OVN Team
              dceara@redhat.com Dumitru Ceara
              OVN QE OVN QE