  1. Fast Datapath Product
  2. FDP-3100

Upstream: ovn-controller crashes when dynamic-routing-redistribute is set to fdb,ip

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • None
    • None
    • ovn26.03
    • None
    • 3
    • Please mark each item below with ( / ) if completed or ( x ) if incomplete:
      ( ) Unit tests or integration test cases are written and pass successfully
      ( ) The upstream pull request is merged and passes CI
    • rhel-9
    • None
    • rhel-net-ovn
    • OVN FDP Sprint 15
    • 1

      This is tracking the upstream effort needed to deliver the solution to the bug described below.


       Problem Description: Clearly explain the issue.

      ovn-controller crashes with SIGSEGV when other_config:dynamic-routing-redistribute is set to fdb,ip on a logical switch.

       Impact Assessment: Describe the severity and impact (e.g., network down, availability of a workaround, etc.).

       

       Software Versions: Specify the exact versions in use (e.g., openvswitch3.1-3.1.0-147.el8fdp).

      ovn26.03-26.03.0-alpha.317.el9fdp.x86_64

        Issue Type: Indicate whether this is a new issue or a regression (if a regression, state the last known working version).

       

       Reproducibility: Confirm if the issue can be reproduced consistently. If not, describe how often it occurs.

      Always

       Reproduction Steps: Provide detailed steps or scripts to replicate the issue.

      enable_coredump()                                                                                    
      {                                                                                                    
              ulimit -c unlimited                                                                          
              ulimit -s unlimited                                                                          
              sysctl -w fs.suid_dumpable=2                                                                 
              if ! sysctl kernel.core_pattern | grep systemd-coredump                                      
              then                                                                                         
                      sysctl -w kernel.core_pattern="|/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h %e"
              fi                                                                                           
              rm -rf /var/lib/systemd/coredump/*                                                           
              rm -rf /run/log/journal/*                                                                    
              rm -rf /var/log/journal/*                                                                    
              systemctl restart systemd-journald                                                           
      }                                                                                                    
                                                                                                           
      check_coredump()
      {
              # coredumpctl exits with 1 when no core dumps are recorded, which is the expected state.
              rlRun "coredumpctl list" "1"
              [ $? -ne 1 ] && rlRun "coredumpctl -q list | grep present | awk '{print \$5}' | xargs coredumpctl info"
              # Submit any OVS or OVN core files with the test results.
              for file in /var/lib/systemd/coredump/*
              do
                      if echo "$file" | grep -E "ovs|ovn"
                      then
                              rlFileSubmit "$file"
                      fi
              done
              rm -rf /var/lib/systemd/coredump/*
              rm -rf /run/log/journal/*
              rm -rf /var/log/journal/*
              systemctl restart systemd-journald
              rlRun "coredumpctl list" "0-255"
      }
                                                                                                           
      reset_coredump()                                                                                     
      {                                                                                                    
              rm -rf /var/lib/systemd/coredump/*                                                           
              rm -rf /run/log/journal/*                                                                    
              rm -rf /var/log/journal/*                                                                    
              systemctl restart systemd-journald                                                           
              ulimit -s 8192                                                                               
              sysctl -w fs.suid_dumpable=0                                                                 
      }                                                                                                    
       
      reset_coredump
      enable_coredump
      
      
      systemctl start openvswitch
      systemctl start ovn-northd
      ovn-nbctl set-connection ptcp:6641
      ovn-sbctl set-connection ptcp:6642
      ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:127.0.0.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=127.0.0.1 external_ids:ovn-evpn-local-ip=20.0.86.25 external_ids:ovn-evpn-vxlan-ports=4789
      systemctl restart ovn-controller
      ovs-vsctl --if-exists del-br br-nat
      ovs-vsctl add-br br-nat
      ovs-vsctl set open . external_ids:ovn-bridge-mappings=phy:br-nat
      ip link set br-nat up
      #ovs-vsctl add-port br-nat ens1f1np1
      #ip link set ens1f1np1 up
      ovn-nbctl lr-add lr-tenant\
      	-- lrp-add lr-tenant lr-tenant-public 00:00:00:00:00:42 20.0.0.42/8 2002::42/64
      ovn-nbctl ls-add public\
      	-- lr-add lr-bgp\
      	-- set logical_router lr-bgp options:chassis=hv1
      ovn-nbctl\
      	-- lrp-add lr-bgp lr-bgp-pub 00:00:00:00:00:03 20.0.0.2/8 2002::2/64\
      	-- lsp-add public pub-lr-bgp\
      	-- set logical_switch_port pub-lr-bgp type=router\
      	options:router-port=lr-bgp-pub\
      	-- lsp-set-addresses pub-lr-bgp router\
      	-- lsp-add public pub-lr-tenant\
      	-- set logical_switch_port pub-lr-tenant type=router\
      	options:router-port=lr-tenant-public\
      	-- lsp-add public pub-ln\
      	-- lsp-set-type pub-ln localnet\
      	-- lsp-set-addresses pub-ln unknown\
      	-- lsp-set-options pub-ln network_name=phy
      ip link del vrf101
      ip link add vrf101 type vrf table 101
      ip link set vrf101 up
      ovn-nbctl lsp-add public ls1p1\
      	-- lsp-set-addresses ls1p1 "00:00:00:00:01:01 20.0.0.1 2002::1"\
      	-- lsp-add public ls1p2\
      	-- lsp-set-addresses ls1p2 "00:00:00:00:01:02 20.0.0.2 2002::2"
      ovn-nbctl lr-add lr2\
      	-- lrp-add lr2 lr2-pub 00:00:00:00:00:52 20.2.0.52/8 2202::52/64
      ovn-nbctl ls-add public2\
      	-- lsp-add public2 pub2-lr2\
      	-- set logical_switch_port pub2-lr2 type=router\
      	options:router-port=lr2-pub\
      	-- lrp-add lr-bgp lr-bgp-pub2 00:00:00:ff:00:02\
      	-- lsp-add public2 pub2-bgp\
      	-- set logical_switch_port pub2-bgp type=router\
      	options:router-port=lr-bgp-pub2
      ovn-nbctl lsp-add public2 ls2p1\
      	-- lsp-set-addresses ls2p1 "00:00:00:00:02:01 20.2.0.1 2202::1"\
      	-- lsp-add public2 ls2p2\
      	-- lsp-set-addresses ls2p2 "00:00:00:00:02:02 20.2.0.2 2202::2"
      ip link add br-101 type bridge
      ip link set br-101 master vrf101 addrgenmode none
      ip link set br-101 up
      ip link add vxlan-101 type vxlan id 101 dstport 60011 local 20.0.86.25 nolearning
      ip link set dev vxlan-101 up
      ip link set vxlan-101 master br-101
      ip link add name lo-101 type dummy
      ip link set lo-101 master br-101
      ip link set lo-101 up
      ip link del vrf102
      ip link add vrf102 type vrf table 102
      ip link set vrf102 up
      ip link del br-102
      ip link add br-102 type bridge
      ip link set br-102 master vrf102 addrgenmode none
      ip link set br-102 up
      ip link add vxlan-102 type vxlan id 102 dstport 60011 local 20.0.86.25 nolearning
      ip link set dev vxlan-102 up
      ip link set vxlan-102 master br-102
      ip link add name lo-102 type dummy
      ip link set lo-102 master br-102
      ip link set lo-102 up
      ovn-nbctl set logical_switch public other_config:dynamic-routing-vni=101\
      	-- set logical_switch public other_config:dynamic-routing-redistribute=fdb,ip\
      	-- set logical_switch public2 other_config:dynamic-routing-vni=102\
      	-- set logical_switch public2 other_config:dynamic-routing-redistribute=fdb,ip
      sleep 5
      coredumpctl list
      coredumpctl info 
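
      To make the result of the steps above easier to check in a script, the crash count can be read from the `coredumpctl list` output. This is only a sketch: `count_ovn_controller_segv` is a hypothetical helper name (not part of OVN or BeakerLib), and it assumes the listing format shown in this report, with the signal name and executable path on each line.

```shell
# Hypothetical helper: count SIGSEGV core dumps of ovn-controller.
# Reads `coredumpctl list` output on stdin and prints the number of
# matching lines (0 when there are none).
count_ovn_controller_segv()
{
        awk '$0 ~ /SIGSEGV/ && $0 ~ /\/ovn-controller( |$)/ { n++ }
             END { print n + 0 }'
}
```

      Example use after the `sleep 5` step: `coredumpctl --no-pager list | count_ovn_controller_segv`. A non-zero result means the crash reproduced.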

       Expected Behavior: Describe what should happen under normal circumstances.

      ovn-controller does not crash.

       Observed Behavior: Explain what actually happens.

      + ovn-nbctl set logical_switch public other_config:dynamic-routing-vni=101 -- set logical_switch public other_config:dynamic-routing-redistribute=fdb,ip -- set logical_switch public2 other_config:dynamic-routing-vni=102 -- set logical_switch public2 other_config:dynamic-routing-redistribute=fdb,ip
      + sleep 5                                                                                            
      + coredumpctl list                                                                                   
      TIME                            PID UID GID SIG     COREFILE EXE                       SIZE
      Tue 2026-02-03 22:25:14 EST 2153171 991 991 SIGSEGV present  /usr/bin/ovn-controller 590.3K
      Tue 2026-02-03 22:25:14 EST 2153338 991 991 SIGSEGV present  /usr/bin/ovn-controller 581.6K
      Tue 2026-02-03 22:25:15 EST 2153376 991 991 SIGSEGV present  /usr/bin/ovn-controller 583.1K
      Tue 2026-02-03 22:25:15 EST 2153415 991 991 SIGSEGV present  /usr/bin/ovn-controller 583.1K
      Tue 2026-02-03 22:25:16 EST 2153453 991 991 SIGSEGV present  /usr/bin/ovn-controller 582.7K
      + coredumpctl info                                                                                   
                 PID: 2153453 (ovn-controller)                                                             
                 UID: 991 (openvswitch)                                                                    
                 GID: 991 (openvswitch)                                                                    
              Signal: 11 (SEGV)                                                                            
           Timestamp: Tue 2026-02-03 22:25:16 EST (2s ago)                     
        Command Line: ovn-controller unix:/run/openvswitch/db.sock -vconsole:emer -vsyslog:err -vfile:info --user openvswitch:openvswitch --no-chdir --log-file=/var/log/ovn/ovn-controller.log --pidfile=/run/>
          Executable: /usr/bin/ovn-controller
       Control Group: /system.slice/ovn-controller.service
                Unit: ovn-controller.service                                                               
               Slice: system.slice                                                                         
             Boot ID: 9fdf2859fcc14310b443c06588551fd5                                                     
          Machine ID: bbff6a97fd8443838e558877a0c4f780                                                     
            Hostname: wsfd-advnetlab18.anl.eng.rdu2.dc.redhat.com                         
             Storage: /var/lib/systemd/coredump/core.ovn-controller.991.9fdf2859fcc14310b443c06588551fd5.2153453.1770175516000000.zst (present)
        Size on Disk: 582.7K                                                                               
             Message: Process 2153453 (ovn-controller) of user 991 dumped core.
                      Stack trace of thread 2153453:
                      #0  0x0000558165cb2c8f neighbor_get_relevant_port_binding.lto_priv.0 (ovn-controller + 0x4fc8f)
                      #1  0x0000558165cea512 en_neighbor_run.lto_priv.0 (ovn-controller + 0x87512)
                      #2  0x0000558165d2ce20 engine_recompute (ovn-controller + 0xc9e20)
                      #3  0x0000558165d2d175 engine_run (ovn-controller + 0xca175)
                      #4  0x0000558165c8b9aa main (ovn-controller + 0x289aa)
                      #5  0x00007f0c0f4295d0 __libc_start_call_main (libc.so.6 + 0x295d0)
                      #6  0x00007f0c0f429680 __libc_start_main@@GLIBC_2.34 (libc.so.6 + 0x29680)
                      #7  0x0000558165c8e715 _start (ovn-controller + 0x2b715)
                      
                      Stack trace of thread 2153454:
                      #0  0x00007f0c0f501ecf __poll (libc.so.6 + 0x101ecf)
                      #1  0x0000558165da7015 time_poll.constprop.0 (ovn-controller + 0x144015)
                      #2  0x0000558165d75eca poll_block (ovn-controller + 0x112eca)
                      #3  0x0000558165cbaf60 pinctrl_handler.lto_priv.0 (ovn-controller + 0x57f60)
                      #4  0x0000558165d65b1f ovsthread_wrapper (ovn-controller + 0x102b1f)
                      #5  0x00007f0c0f48a19a start_thread (libc.so.6 + 0x8a19a)
                      #6  0x00007f0c0f50f100 __clone3 (libc.so.6 + 0x10f100)
                      
                      Stack trace of thread 2153455:
                      #0  0x00007f0c0f501ecf __poll (libc.so.6 + 0x101ecf)
                      #1  0x0000558165da6fbd time_poll.constprop.0 (ovn-controller + 0x143fbd)
                      #2  0x0000558165d75eca poll_block (ovn-controller + 0x112eca)
                      #3  0x0000558165d662b6 ovsrcu_postpone_thread (ovn-controller + 0x1032b6)
                      #4  0x0000558165d65b1f ovsthread_wrapper (ovn-controller + 0x102b1f)
                      #5  0x00007f0c0f48a19a start_thread (libc.so.6 + 0x8a19a)
                      #6  0x00007f0c0f50f100 __clone3 (libc.so.6 + 0x10f100)
                      
                      Stack trace of thread 2153456:
                      #0  0x00007f0c0f4a3860 sigdescr_np (libc.so.6 + 0xa3860)
                      #1  0x0000558165d76256 poll_block (ovn-controller + 0x113256)
                      #2  0x0000558165d7a215 stopwatch_thread.lto_priv.0 (ovn-controller + 0x117215)
                      #3  0x0000558165d65b1f ovsthread_wrapper (ovn-controller + 0x102b1f)
                      #4  0x00007f0c0f48a19a start_thread (libc.so.6 + 0x8a19a)
                      #5  0x00007f0c0f50f100 __clone3 (libc.so.6 + 0x10f100)
                      
                      Stack trace of thread 2153457:
                      #0  0x00007f0c0f501ecf __poll (libc.so.6 + 0x101ecf)
                      #1  0x0000558165da6fbd time_poll.constprop.0 (ovn-controller + 0x143fbd)
                      #2  0x0000558165d75eca poll_block (ovn-controller + 0x112eca)
                      #3  0x0000558165d05c8d statctrl_thread_handler (ovn-controller + 0xa2c8d)
                      #4  0x0000558165d65b1f ovsthread_wrapper (ovn-controller + 0x102b1f)
                      #5  0x00007f0c0f48a19a start_thread (libc.so.6 + 0x8a19a)
                      #6  0x00007f0c0f50f100 __clone3 (libc.so.6 + 0x10f100)
                      ELF object binary architecture: AMD x86-64 
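
      For comparing crashes across builds, it can help to reduce the `coredumpctl info` output to the symbol names of the faulting thread. The sketch below assumes the first "Stack trace of thread" block is the one that faulted, which matches the trace above (frame #0 is in neighbor_get_relevant_port_binding, while the other threads are blocked in poll). `crashing_thread_frames` is a hypothetical helper name.

```shell
# Hypothetical helper: print "#N symbol" for each frame of the first
# thread listed in `coredumpctl info` output (read from stdin).
crashing_thread_frames()
{
        awk '
                /Stack trace of thread/ { block++; next }
                block == 1 && $1 ~ /^#[0-9]+$/ { print $1, $3 }
        '
}
```

      Example use: `coredumpctl --no-pager info | crashing_thread_frames`.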

       Troubleshooting Actions: Outline the steps taken to diagnose or resolve the issue so far.

       

       Logs: If you collected logs, please provide them (e.g., sos report, /var/log/openvswitch/*, testpmd console).

      [root@wsfd-advnetlab18 FDP-2767]# rpm -qa | grep -E "openvswitch|ovn"
      openvswitch-selinux-extra-policy-1.0-39.el9fdp.noarch
      openvswitch3.5-3.5.2-63.el9fdp.x86_64
      python3-openvswitch3.5-3.5.2-63.el9fdp.x86_64
      ovn26.03-26.03.0-alpha.317.el9fdp.x86_64
      ovn26.03-central-26.03.0-alpha.317.el9fdp.x86_64
      ovn26.03-host-26.03.0-alpha.317.el9fdp.x86_64

       

      The issue also exists on ovn26.03-alpha-300.el9.

       

      Excerpt from ovn-controller.log:
      2026-02-04T03:25:14.106Z|00036|features|INFO|OVS Feature: sample_action_with_registers, state: supported
      2026-02-04T03:25:14.106Z|00037|main|INFO|OVS feature set changed, force recompute.
      2026-02-04T03:25:14.120Z|00038|features|INFO|OVS Feature: ct_label_flush, state: supported
      2026-02-04T03:25:14.120Z|00039|main|INFO|OVS feature set changed, force recompute.
      2026-02-04T03:25:14.120Z|00040|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connected
      2026-02-04T03:25:14.120Z|00041|main|INFO|OVS OpenFlow connection reconnected,force recompute.
      2026-02-04T03:25:14.120Z|00042|binding|INFO|Claiming lport pub2-bgp for this chassis.
      2026-02-04T03:25:14.120Z|00043|binding|INFO|Claiming lport lr-bgp-pub for this chassis.
      2026-02-04T03:25:14.120Z|00044|binding|INFO|lr-bgp-pub: Claiming 00:00:00:00:00:03 20.0.0.2/8 2002::2/64
      2026-02-04T03:25:14.120Z|00045|binding|INFO|Claiming lport lr-bgp-pub2 for this chassis.
      2026-02-04T03:25:14.120Z|00046|binding|INFO|lr-bgp-pub2: Claiming 00:00:00:ff:00:02
      2026-02-04T03:25:14.120Z|00047|binding|INFO|Claiming lport pub-lr-bgp for this chassis.
      2026-02-04T03:25:14.120Z|00048|binding|INFO|pub-lr-bgp: Claiming router
      SIGSEGV detected, backtrace:
      ovn-controller(+0xd1aa1)[0x55e0ab719aa1]
      /lib64/libc.so.6(+0x3ebf0)[0x7f5c8d43ebf0]
      ovn-controller(+0x4fc8f)[0x55e0ab697c8f]
      ovn-controller(+0x87512)[0x55e0ab6cf512]
      ovn-controller(+0xc9e20)[0x55e0ab711e20]
      ovn-controller(+0xca175)[0x55e0ab712175]
      ovn-controller(+0x289aa)[0x55e0ab6709aa]
      /lib64/libc.so.6(+0x295d0)[0x7f5c8d4295d0]
      /lib64/libc.so.6(__libc_start_main+0x80)[0x7f5c8d429680]
      ovn-controller(+0x2b715)[0x55e0ab673715]
      2026-02-04T03:25:14.121Z|00003|fatal_signal(ovn_statctrl3)|WARN|terminating with signal 11 (Segmentation fault)
      2026-02-04T03:25:14.719Z|00001|vlog|INFO|opened log file /var/log/ovn/ovn-controller.log
      2026-02-04T03:25:14.725Z|00002|reconnect|INFO|unix:/run/openvswitch/db.sock: connecting...
      2026-02-04T03:25:14.725Z|00003|reconnect|INFO|unix:/run/openvswitch/db.sock: connected
      2026-02-04T03:25:14.733Z|00004|main|INFO|OVN internal version is : [25.09.90-21.7.0-81.11]
      2026-02-04T03:25:14.733Z|00005|main|INFO|OVS IDL reconnected, force recompute.
      2026-02-04T03:25:14.733Z|00006|reconnect|INFO|tcp:127.0.0.1:6642: connecting...
      2026-02-04T03:25:14.733Z|00007|main|INFO|OVNSB IDL reconnected, force recompute.
      2026-02-04T03:25:14.733Z|00008|ovn_util|INFO|statctrl: connecting to switch: "unix:/run/openvswitch/br-int.mgmt"
      2026-02-04T03:25:14.733Z|00009|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
      2026-02-04T03:25:14.733Z|00010|ovn_util|INFO|pinctrl: connecting to switch: "unix:/run/openvswitch/br-int.mgmt"
      2026-02-04T03:25:14.733Z|00011|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
      2026-02-04T03:25:14.734Z|00012|reconnect|INFO|tcp:127.0.0.1:6642: connected
      2026-02-04T03:25:14.734Z|00001|rconn(ovn_statctrl3)|INFO|unix:/run/openvswitch/br-int.mgmt: connected
      2026-02-04T03:25:14.734Z|00001|rconn(ovn_pinctrl0)|INFO|unix:/run/openvswitch/br-int.mgmt: connected
      2026-02-04T03:25:14.741Z|00013|ovn_util|INFO|features: connecting to switch: "unix:/run/openvswitch/br-int.mgmt"
      2026-02-04T03:25:14.741Z|00014|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
      2026-02-04T03:25:14.741Z|00015|features|INFO|OVS Feature: ct_zero_snat, state: supported
      2026-02-04T03:25:14.741Z|00016|features|INFO|OVS Feature: ct_flush, state: supported
      2026-02-04T03:25:14.741Z|00017|features|INFO|OVS Feature: dp_hash_l4_sym_support, state: supported
      2026-02-04T03:25:14.741Z|00018|main|INFO|OVS feature set changed, force recompute.
      2026-02-04T03:25:14.741Z|00019|ovn_util|INFO|ofctrl: connecting to switch: "unix:/run/openvswitch/br-int.mgmt"
      2026-02-04T03:25:14.741Z|00020|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connecting...
      2026-02-04T03:25:14.742Z|00021|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connected
      2026-02-04T03:25:14.742Z|00022|features|INFO|OVS Feature: meter_support, state: supported
      2026-02-04T03:25:14.742Z|00023|features|INFO|OVS Feature: group_support, state: supported
      2026-02-04T03:25:14.742Z|00024|main|INFO|OVS feature set changed, force recompute.
      2026-02-04T03:25:14.742Z|00025|features|INFO|OVS Feature: sample_action_with_registers, state: supported
      2026-02-04T03:25:14.742Z|00026|main|INFO|OVS feature set changed, force recompute.
      2026-02-04T03:25:14.756Z|00027|features|INFO|OVS Feature: ct_label_flush, state: supported
      2026-02-04T03:25:14.756Z|00028|main|INFO|OVS feature set changed, force recompute.
      2026-02-04T03:25:14.756Z|00029|rconn|INFO|unix:/run/openvswitch/br-int.mgmt: connected
      2026-02-04T03:25:14.756Z|00030|main|INFO|OVS OpenFlow connection reconnected,force recompute.
      2026-02-04T03:25:14.760Z|00031|binding|INFO|Claiming lport pub2-bgp for this chassis.
      2026-02-04T03:25:14.760Z|00032|binding|INFO|Claiming lport lr-bgp-pub for this chassis.
      2026-02-04T03:25:14.760Z|00033|binding|INFO|lr-bgp-pub: Claiming 00:00:00:00:00:03 20.0.0.2/8 2002::2/64
      2026-02-04T03:25:14.760Z|00034|binding|INFO|Claiming lport lr-bgp-pub2 for this chassis.
      2026-02-04T03:25:14.760Z|00035|binding|INFO|lr-bgp-pub2: Claiming 00:00:00:ff:00:02
      2026-02-04T03:25:14.760Z|00036|binding|INFO|Claiming lport pub-lr-bgp for this chassis.
      2026-02-04T03:25:14.760Z|00037|binding|INFO|pub-lr-bgp: Claiming router
      SIGSEGV detected, backtrace:
      ovn-controller(+0xd1aa1)[0x55a8d8e1faa1]
      /lib64/libc.so.6(+0x3ebf0)[0x7fd5b623ebf0]
      ovn-controller(+0x4fc8f)[0x55a8d8d9dc8f]
      ovn-controller(+0x87512)[0x55a8d8dd5512]
      ovn-controller(+0xc9e20)[0x55a8d8e17e20]
      ovn-controller(+0xca175)[0x55a8d8e18175]
      ovn-controller(+0x289aa)[0x55a8d8d769aa]
      /lib64/libc.so.6(+0x295d0)[0x7fd5b62295d0]
      /lib64/libc.so.6(__libc_start_main+0x80)[0x7fd5b6229680]
      ovn-controller(+0x2b715)[0x55a8d8d79715]
      2026-02-04T03:25:15.211Z|00001|vlog|INFO|opened log file /var/log/ovn/ovn-controller.log
      2026-02-04T03:25:15.217Z|00002|reconnect|INFO|unix:/run/openvswitch/db.sock: connecting...
      2026-02-04T03:25:15.217Z|00003|reconnect|INFO|unix:/run/openvswitch/db.sock: connected
      2026-02-04T03:25:15.226Z|00004|main|INFO|OVN internal version is : [25.09.90-21.7.0-81.11]
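
      The "SIGSEGV detected, backtrace:" blocks in the log above carry only module offsets, and both crashes show the same ones. A minimal sketch for pulling the distinct ovn-controller offsets out of the log is shown below; `backtrace_offsets` is a hypothetical helper name, and it assumes the `ovn-controller(+0x...)[0x...]` line format seen in this excerpt.

```shell
# Hypothetical helper: print each distinct ovn-controller module offset
# found in backtrace lines on stdin, in order of first appearance.
backtrace_offsets()
{
        sed -n 's/^[[:space:]]*ovn-controller(+\(0x[0-9a-fA-F]*\)).*$/\1/p' \
                | awk '!seen[$0]++'
}
```

      With debuginfo for the same build installed, the offsets could then be passed to addr2line, for example `backtrace_offsets < /var/log/ovn/ovn-controller.log | xargs -r addr2line -f -e /usr/bin/ovn-controller`. Whether raw offsets resolve correctly depends on how the binary was built, so treat the result as a hint and prefer the symbolized `coredumpctl info` trace.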

              amusil@redhat.com Ales Musil
              rhn-support-jishi Jianlin Shi
              OVN QE OVN QE