Fast Datapath Product / FDP-2782

QE verification: [ovn-northd] assertion failure in handle_od_lbgrp_changes() (lbgrp == NULL)

    • Task
    • Resolution: Unresolved
    • Undefined
    • None
    • None
    • ovn25.09
    • False
    • None
    • False
    • Checklist:
      ( ) The bug has been reproduced and verified by QE members
      ( ) Test coverage has been added to downstream CI
      ( ) For new feature, failed test plans have bugs added as children to the epic
      ( ) The bug is cloned to any relevant release that we support and/or is needed
    • rhel-9
    • None
    • OVN FDP Sprint 13
    • 1

      This ticket is tracking the QE verification effort for the solution to the problem described below.

       Problem Description: Clearly explain the issue.

      With ovn25.09-25.09.1-11.el9fdp, ovn-northd hits an assertion failure:

      2025-12-01T12:22:35.784Z|00109|backtrace|ERR|lib/vlog.c:1309 backtrace:
      ovn-northd(+0xd3d37) [0x55bfabb7cd37]
      ovn-northd(+0xc07c3) [0x55bfabb697c3]
      ovn-northd(+0xb487b) [0x55bfabb5d87b]
      ovn-northd(+0x705af) [0x55bfabb195af]
      ovn-northd(+0x70827) [0x55bfabb19827]
      ovn-northd(+0x709cb) [0x55bfabb199cb]
      ovn-northd(+0x8792d) [0x55bfabb3092d]
      ovn-northd(+0x2012f) [0x55bfabac912f]
      /lib64/libc.so.6(+0x29590) [0x7f070f07f590]
      /lib64/libc.so.6(__libc_start_main+0x80) [0x7f070f07f640]
      ovn-northd(+0x21035) [0x55bfabaca035]
      

      Most likely at the ovs_assert(lbgrp) in handle_od_lbgrp_changes() (marked below):

      static void
      handle_od_lbgrp_changes(struct nbrec_load_balancer_group **nbrec_lbgrps,
                              size_t n_nbrec_lbgrps, struct od_lb_data *od_lb_data,
                              struct ed_type_lb_data *lb_data,
                              struct crupdated_od_lb_data *codlb)
      {
          struct tracked_lb_data *trk_lb_data = &lb_data->tracked_lb_data;
          struct uuidset *pre_lbgrp_uuids = od_lb_data->lbgrps;
          od_lb_data->lbgrps = xzalloc(sizeof *od_lb_data->lbgrps);
          uuidset_init(od_lb_data->lbgrps);
          for (size_t i = 0; i < n_nbrec_lbgrps; i++) {
              const struct uuid *lbgrp_uuid = &nbrec_lbgrps[i]->header_.uuid;
              uuidset_insert(od_lb_data->lbgrps, lbgrp_uuid);
      
              if (!uuidset_find_and_delete(pre_lbgrp_uuids, lbgrp_uuid)) {
                  /* Add this lb group to the tracked data. */
                  uuidset_insert(&codlb->assoc_lbgrps, lbgrp_uuid);
      
                  if (!trk_lb_data->has_routable_lb) {
                      struct ovn_lb_group *lbgrp =
                          ovn_lb_group_find(&lb_data->lbgrps, lbgrp_uuid);
                      ovs_assert(lbgrp);  /* <<< Assertion fails here (lbgrp == NULL). */
                      trk_lb_data->has_routable_lb |= lbgrp->has_routable_lb;
                  }
              }
          }
      
          if (!uuidset_is_empty(pre_lbgrp_uuids)) {
              trk_lb_data->has_dissassoc_lbgrps_from_od = true;
          }
      
          uuidset_destroy(pre_lbgrp_uuids);
          free(pre_lbgrp_uuids);
      }
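
      To make the assertion's precondition concrete, here is a minimal, hypothetical C model of the logic above. It is not OVN code: UUIDs are plain unsigned integers, and struct id_set, struct lb_group_model, lb_group_find() and model_handle_od_lbgrp_changes() are illustrative names only. The codlb->assoc_lbgrps bookkeeping is omitted. The model returns false at the point where the real function would hit ovs_assert(lbgrp), i.e. when a load balancer group newly associated with the datapath is missing from lb_data->lbgrps while has_routable_lb is still false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified model: UUIDs are plain integers and sets are
 * small fixed-size arrays instead of OVS struct uuidset. */
#define MAX_GROUPS 16

struct id_set {
    unsigned ids[MAX_GROUPS];
    size_t n;
};

struct lb_group_model {
    unsigned uuid;
    bool has_routable_lb;
};

/* Removes 'id' from 'set' if present and returns true if it was found
 * (plays the role of uuidset_find_and_delete()). */
static bool
id_set_find_and_delete(struct id_set *set, unsigned id)
{
    for (size_t i = 0; i < set->n; i++) {
        if (set->ids[i] == id) {
            set->n--;
            set->ids[i] = set->ids[set->n];
            return true;
        }
    }
    return false;
}

/* Plays the role of ovn_lb_group_find(): NULL if the group is unknown. */
static const struct lb_group_model *
lb_group_find(const struct lb_group_model *groups, size_t n, unsigned id)
{
    for (size_t i = 0; i < n; i++) {
        if (groups[i].uuid == id) {
            return &groups[i];
        }
    }
    return NULL;
}

/* Returns false where the real code would hit ovs_assert(lbgrp). */
static bool
model_handle_od_lbgrp_changes(const unsigned *new_ids, size_t n_new,
                              struct id_set *od_lbgrps,
                              const struct lb_group_model *known,
                              size_t n_known,
                              bool *has_routable_lb, bool *has_disassoc)
{
    assert(n_new <= MAX_GROUPS);
    struct id_set pre = *od_lbgrps;   /* Previous associations. */
    od_lbgrps->n = 0;

    for (size_t i = 0; i < n_new; i++) {
        od_lbgrps->ids[od_lbgrps->n++] = new_ids[i];
        if (!id_set_find_and_delete(&pre, new_ids[i])) {
            /* Newly associated group: the lookup below expects it to
             * already exist in the known (lb_data) groups. */
            if (!*has_routable_lb) {
                const struct lb_group_model *g =
                    lb_group_find(known, n_known, new_ids[i]);
                if (!g) {
                    return false;   /* ovs_assert(lbgrp) would fire. */
                }
                *has_routable_lb |= g->has_routable_lb;
            }
        }
    }
    /* Anything left in 'pre' was disassociated from the datapath. */
    if (pre.n > 0) {
        *has_disassoc = true;
    }
    return true;
}
```

      In this model the lookup, and therefore the assertion, is only reached while has_routable_lb is still false; once it is set, a missing group is not checked. The model only restates what the assertion requires and does not identify the root cause of the crash.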
      

      ovn-northd logs:
      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_ovn-kubernetes/2881/pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-dualstack-bgp-local-gw/1995423294387392512/artifacts/e2e-metal-ipi-ovn-dualstack-bgp-local-gw/gather-extra/artifacts/pods/openshift-ovn-kubernetes_ovnkube-node-ssl7f_northd_previous.log

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

      ovn-northd crashes.
       

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      ovn25.09-25.09.1-11.el9fdp
       

       Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

      Unknown
       

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

      Intermittent; seen occasionally in OCP CI, e.g.: https://github.com/openshift/ovn-kubernetes/pull/2881

       

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

      Unknown, working on a reproducer.
       

       Expected Behavior: Describe what should happen under normal circumstances.

      ovn-northd should not crash.
       

       Observed Behavior: Explain what actually happens.

       

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

       

       Logs: If you collected logs, please provide them (e.g. sos report, /var/log/openvswitch/*, testpmd console).

      Must-gather (the NB/SB DBs were unfortunately compacted, so they are of limited use):
      https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/pr-logs/pull/openshift_ovn-kubernetes/2881/pull-ci-openshift-ovn-kubernetes-master-e2e-metal-ipi-ovn-dualstack-bgp-local-gw/1995423294387392512/artifacts/e2e-metal-ipi-ovn-dualstack-bgp-local-gw/gather-must-gather/artifacts/must-gather.tar

              ovnteam@redhat.com OVN Team
              nstbot NST Bot
              OVN QE OVN QE