  1. Fast Datapath Product
  2. FDP-715

ovn-controller crashes after connecting to an empty local database


    • Bug
    • Resolution: Unresolved
    • Normal
    • FDP-25.A
    • None
    • ovn24.03
    • 3
    • False
    • False

      Given an empty local database has been created and the system administrator runs ovn-controller,

      When ovn-controller connects to this empty local database,

      Then ovn-controller should not crash; it should keep running, without applying further configuration, until the database is initialized.

    • rhel-sst-network-fastdatapath
    • ssg_networking
    • FDP 24.G, FDP 24.H, FDP 25.A
    • Important

      After connecting to a freshly created (empty) local database, ovn-controller crashes with a segmentation fault.

      # cat reproducer.sh 
      #!/bin/bash
      
      set -x
      
      DIR=/tmp/test-dir
      
      cleanup () {
          if test -f ${DIR}/ovsdb-server.pid; then
              kill $(cat ${DIR}/ovsdb-server.pid) || true
          fi
      }
      trap cleanup 0 1 2 3 13 14 15
      
      rm -rf ${DIR}
      mkdir ${DIR}
      
      export OVS_RUNDIR=${DIR}
      export OVS_LOGDIR=${DIR}
      export OVN_RUNDIR=${DIR}
      export OVN_LOGDIR=${DIR}
      
      ovsdb-tool create ${DIR}/conf.db /usr/share/openvswitch/vswitch.ovsschema
      ovsdb-server --detach --no-chdir --pidfile --log-file \
                   -vconsole:off -vsyslog:off \
                   --remote=punix:${DIR}/db.sock ${DIR}/conf.db
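      # Left commented out on purpose: without init the database stays empty,
      # which is the condition that triggers the crash.
      #ovs-vsctl --db=unix:${DIR}/db.sock --no-wait init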
      
      gdb --args ovn-controller --no-chdir --log-file \
                                -vsyslog:off -vconsole:info \
                                unix:${DIR}/db.sock <<< "
      run
      backtrace
      frame function main
      echo print cfg\n
      print cfg
      quit
      y
      "
      

      Result:

      (gdb) Starting program: /usr/bin/ovn-controller --no-chdir --log-file -vsyslog:off -vconsole:info unix:/tmp/test-dir/db.sock
      
      2024-07-23T19:59:38Z|00001|vlog|INFO|opened log file /tmp/test-dir/ovn-controller.log
      [New Thread 0x7ffff6f676c0 (LWP 2947)]
      [New Thread 0x7ffff5f656c0 (LWP 2949)]
      [New Thread 0x7ffff67666c0 (LWP 2948)]
      2024-07-23T19:59:38Z|00002|reconnect|INFO|unix:/tmp/test-dir/db.sock: connecting...
      2024-07-23T19:59:38Z|00003|reconnect|INFO|unix:/tmp/test-dir/db.sock: connected
      [New Thread 0x7ffff57236c0 (LWP 2950)]
      2024-07-23T19:59:38Z|00004|main|INFO|OVN internal version is : [24.03.90-20.34.0-73.6]
      2024-07-23T19:59:38Z|00005|main|INFO|OVS IDL reconnected, force recompute.
      2024-07-23T19:59:38Z|00006|main|INFO|OVNSB IDL reconnected, force recompute.
      2024-07-23T19:59:38Z|00007|chassis|WARN|'system-id' in Open_vSwitch database is missing.
      
      Thread 1 "ovn-controller" received signal SIGSEGV, Segmentation fault.
      shash_find__ (sh=0x180, name=0x5555556903cb "vlan-limit", name_len=10, hash=777389702)
          at ovs-bf1b16364b3f01b0ff5f2f6e76842e666226a17b/lib/shash.c:225
      225         HMAP_FOR_EACH_WITH_HASH (node, node, hash, &sh->map) {
      (gdb) #0  shash_find__ (sh=0x180, name=0x5555556903cb "vlan-limit", name_len=10, hash=777389702)
          at ovs-bf1b16364b3f01b0ff5f2f6e76842e666226a17b/lib/shash.c:225
      #1  0x00005555556328a1 in smap_get_node (smap=0x180, key=0x5555556903cb "vlan-limit")
          at ovs-bf1b16364b3f01b0ff5f2f6e76842e666226a17b/lib/smap.c:217
      #2  smap_get_def (smap=0x180, key=0x5555556903cb "vlan-limit", def=0x0)
          at ovs-bf1b16364b3f01b0ff5f2f6e76842e666226a17b/lib/smap.c:208
      #3  smap_get (smap=0x180, key=0x5555556903cb "vlan-limit") at ovs-bf1b16364b3f01b0ff5f2f6e76842e666226a17b/lib/smap.c:200
      #4  smap_get_int (smap=0x180, key=0x5555556903cb "vlan-limit", def=-1)
          at ovs-bf1b16364b3f01b0ff5f2f6e76842e666226a17b/lib/smap.c:240
      #5  0x0000555555560003 in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:5430
      (gdb) #5  0x0000555555560003 in main (argc=<optimized out>, argv=<optimized out>) at controller/ovn-controller.c:5430
      5430                int vlan_limit = smap_get_int(
      (gdb) print cfg
      (gdb) $1 = (const struct ovsrec_open_vswitch *) 0x0
      

       

      Uncommenting the ovs-vsctl init line in the reproducer script prevents the crash.
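
      Until ovn-controller handles an empty database itself, a launch wrapper could run the same init step before starting it. The following is only a sketch of that workaround, not a fix for the crash: the helper name ensure_ovs_db_initialized is hypothetical, and the socket path in the usage note is the one from the reproducer. The ovs-vsctl invocation is the one that is commented out in the script above.

```shell
#!/bin/bash
# Hypothetical helper (not part of OVS or OVN): initialize the local
# Open vSwitch database through the given socket so that ovn-controller
# does not connect to an empty one. The command is the same
# "ovs-vsctl ... --no-wait init" call used in the reproducer.
ensure_ovs_db_initialized () {
    local sock="$1"
    ovs-vsctl --db="unix:${sock}" --no-wait init
}
```

      A wrapper might then call ensure_ovs_db_initialized /tmp/test-dir/db.sock and start ovn-controller only if that call succeeds.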

      This should be rare, since local databases are usually not empty, but it can happen, and crashing is never correct behavior.

              roriorde@redhat.com Rosemarie O'Riorden