  1. Fast Datapath Product
  2. FDP-2136

Test Coverage: OVN controller with hostname in ovn-remote stuck in connecting state with OVN SB db restarts

      This task tracks the work of writing test cases to cover the bug described below.

      Problem Description:

      Originally reported in RHOSO 18 (https://issues.redhat.com/browse/OSPRH-21332) after moving from OVN 24.03 to OVN 25.03: jobs started to fail during update because the OVN controller got stuck in the connecting state [1] after the OVN SB DBs were restarted.

       

      Related Slack thread in #ovn: https://redhat-internal.slack.com/archives/C01G7T6SYSD/p1761662714544639

      As pointed out in the thread, it is a regression introduced in OVN 24.09 by https://github.com/ovn-org/ovn/commit/762ae66cd70efa149d91d35305fcef0040e9addd

       [1]

       

      2025-10-27T11:37:08Z|00105|stream_ssl|ERR|ssl:ovsdbserver-sb-2.openstack.svc.cluster.local:6642: connect: Address family not supported by protocol
      2025-10-27T11:37:16Z|00106|stream_ssl|ERR|ssl:ovsdbserver-sb-1.openstack.svc.cluster.local:6642: connect: Address family not supported by protocol
      2025-10-27T11:37:24Z|00107|stream_ssl|ERR|ssl:ovsdbserver-sb-0.openstack.svc.cluster.local:6642: connect: Address family not supported by protocol

       

       

       Impact Assessment: OVN controller stuck in connecting state

       

       Software Versions:

      ovn25.03-25.03.1-60.el9fdp
      openvswitch3.5-3.5.2-51.el9fdp
       

      Issue Type: Regression introduced in OVN 24.09

      Reproducibility: Always with the given scenario (multiple SB DB hostnames in external_ids:ovn-remote); the issue has not been seen with 1 replica.

       

       Reproduction Steps:

      • Set up a 3-node OVN Raft cluster
      • Configure the OVN controller with external_ids:ovn-remote=<all three OVN SB DB servers, by hostname> (the issue does not reproduce with IP addresses)
      • Restart all OVN SB DBs
      • Check the OVN controller logs and observe that it gets stuck in the connecting state

       

      The issue can also be reproduced with https://github.com/ovn-org/ovn-fake-multinode

      This requires https://github.com/ovn-org/ovn-fake-multinode/pull/110

      Set up the environment:

       

      git clone https://github.com/ovn-org/ovn/ ~/ovn -b branch-25.03
      git clone https://github.com/openvswitch/ovs/ ~/ovs -b branch-3.5
      sudo OVN_SRC_PATH=${HOME}/ovn OVS_SRC_PATH=${HOME}/ovs ./ovn_cluster.sh build
      sudo CENTRAL_IC_ID=ovn-central-az1-1 OVN_DB_CLUSTER=yes bash ./ovn_cluster.sh start
      

      Configure the OVN controller:

      sudo podman exec -it ovn-chassis-1 bash
      
      # Ensure ovn-cluster-az1-1.example.com, ovn-cluster-az1-2.example.com and ovn-cluster-az1-3.example.com are configured on the DNS server with IPs 170.168.0.2, 170.168.0.3 and 170.168.0.4 respectively, then configure ovn-remote as below:
      
      
      ovs-vsctl set open . external_ids:ovn-remote="ssl:ovn-cluster-az1-1.example.com:6642,ssl:ovn-cluster-az1-2.example.com:6642,ssl:ovn-cluster-az1-3.example.com:6642"
      
      # Ensure the controller is connected to the SB DB via hostname by checking tail -f /var/log/ovn/ovn-controller.log or ovn-appctl connection-status
      
      # The hostnames can also be verified with ovn-sbctl:
      
      ovn-sbctl --db="ssl:ovn-cluster-az1-1.example.com:6642,ssl:ovn-cluster-az1-2.example.com:6642,ssl:ovn-cluster-az1-3.example.com:6642" --private-key=/opt/ovn/ovn-privkey.pem --certificate=/opt/ovn/ovn-cert.pem --ca-cert=/opt/ovn/pki/switchca/cacert.pem show
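
Before restarting anything, a test could confirm that every hostname in ovn-remote actually resolves, so that a DNS misconfiguration is not mistaken for this bug. The following is a minimal sketch, not part of the original report: the function names remote_hosts and check_remote_dns are hypothetical, and it assumes entries of the form proto:host:port without IPv6 literals.

```shell
# Hypothetical helper: split an ovn-remote value into its hostnames.
# Assumes comma-separated "proto:host:port" entries (no IPv6 literals).
remote_hosts() {
    printf '%s\n' "$1" | tr ',' '\n' | sed -e 's/^[^:]*://' -e 's/:[0-9]*$//'
}

# Hypothetical helper: report every hostname that does not resolve.
# Uses getent, so it follows the resolver configuration of the container.
check_remote_dns() {
    rc=0
    for host in $(remote_hosts "$1"); do
        if ! getent hosts "$host" >/dev/null; then
            echo "unresolved: $host" >&2
            rc=1
        fi
    done
    return $rc
}
```

Inside ovn-chassis-1 this could be called as check_remote_dns "$(ovs-vsctl get open . external_ids:ovn-remote | tr -d '"')".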

      Kill the SB DB in the OVN central containers ovn-central-az1-1, ovn-central-az1-2 and ovn-central-az1-3:

       

      # ps -eaf|grep ovsdb-server-sb|grep -v grep
      root         941     940  0 10:51 ?        00:00:00 ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-sb.log --pidfile=/var/run/ovn/ovnsb_db.pid --remote=punix:/var/run/ovn/ovnsb_db.sock --unixctl=/var/run/ovn/ovnsb_db.ctl --detach --monitor --remote=db:OVN_Southbound,SB_Global,connections --private-key=/opt/ovn/ovn-privkey.pem --certificate=/opt/ovn/ovn-cert.pem --ca-cert=/opt/ovn/pki/switchca/cacert.pem --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers /etc/ovn/ovnsb_db.db
      
      kill -9 941

      Start the SB DB again in all three central containers:

      ovsdb-server -vconsole:off -vfile:info --log-file=/var/log/ovn/ovsdb-server-sb.log --pidfile=/var/run/ovn/ovnsb_db.pid --remote=punix:/var/run/ovn/ovnsb_db.sock --unixctl=/var/run/ovn/ovnsb_db.ctl --detach --monitor --remote=db:OVN_Southbound,SB_Global,connections --private-key=/opt/ovn/ovn-privkey.pem --certificate=/opt/ovn/ovn-cert.pem --ca-cert=/opt/ovn/pki/switchca/cacert.pem --ssl-protocols=db:OVN_Southbound,SSL,ssl_protocols --ssl-ciphers=db:OVN_Southbound,SSL,ssl_ciphers /etc/ovn/ovnsb_db.db 
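
For an automated test, the kill and restart steps above have to run in all three central containers. A small wrapper like the one below could do that. It is only a sketch: on_all_centrals and the PODMAN override are hypothetical names, and it assumes that the pidfile /var/run/ovn/ovnsb_db.pid holds the PID of the ovsdb-server process that the ticket kills by hand.

```shell
# Container names taken from the ovn-fake-multinode setup in this ticket.
CENTRALS="ovn-central-az1-1 ovn-central-az1-2 ovn-central-az1-3"

# Hypothetical helper: run one shell command in every central container.
# PODMAN can be overridden (for example with "echo") for a dry run.
on_all_centrals() {
    for c in $CENTRALS; do
        ${PODMAN:-sudo podman} exec "$c" sh -c "$1" || return 1
    done
}

# Usage (assumption: the pidfile records the SB ovsdb-server PID):
#   on_all_centrals 'kill -9 $(cat /var/run/ovn/ovnsb_db.pid)'
# and then pass the ovsdb-server start command shown above the same way.
```

Running it with PODMAN=echo prints the commands that would be executed without touching the containers.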

      Check the ovn-controller logs in container ovn-chassis-1:

      sudo podman exec -it ovn-chassis-1 bash
      tail -f /var/log/ovn/ovn-controller.log 
      
      With the buggy version it stays stuck like this:
      2025-10-30T10:53:36.035Z|00234|stream_ssl|ERR|ssl:ovn-cluster-az1-1.example.com:6642: connect: Address family not supported by protocol
      2025-10-30T10:53:44.042Z|00235|stream_ssl|ERR|ssl:ovn-cluster-az1-2.example.com:6642: connect: Address family not supported by protocol
      2025-10-30T10:53:52.046Z|00236|stream_ssl|ERR|ssl:ovn-cluster-az1-3.example.com:6642: connect: Address family not supported by protocol
      2025-10-30T10:54:00.055Z|00237|stream_ssl|ERR|ssl:ovn-cluster-az1-1.example.com:6642: connect: Address family not supported by protocol
      2025-10-30T10:54:08.063Z|00238|stream_ssl|ERR|ssl:ovn-cluster-az1-2.example.com:6642: connect: Address family not supported by protocol
      2025-10-30T10:54:16.072Z|00239|stream_ssl|ERR|ssl:ovn-cluster-az1-3.example.com:6642: connect: Address family not supported by protocol
      2025-10-30T10:54:24.079Z|00240|stream_ssl|ERR|ssl:ovn-cluster-az1-1.example.com:6642: connect: Address family not supported by protocol
      2025-10-30T10:54:32.088Z|00241|stream_ssl|ERR|ssl:ovn-cluster-az1-2.example.com:6642: connect: Address family not supported by protocol
      2025-10-30T10:54:40.096Z|00242|stream_ssl|ERR|ssl:ovn-cluster-az1-3.example.com:6642: connect: Address family not supported by protocol
      2025-10-30T10:54:48.104Z|00243|stream_ssl|ERR|ssl:ovn-cluster-az1-1.example.com:6642: connect: Address family not supported by protocol
      2025-10-30T10:54:56.106Z|00244|stream_ssl|ERR|ssl:ovn-cluster-az1-2.example.com:6642: connect: Address family not supported by protocol
      2025-10-30T10:55:04.115Z|00245|stream_ssl|ERR|ssl:ovn-cluster-az1-3.example.com:6642: connect: Address family not supported by protocol
      2025-10-30T10:55:12.123Z|00246|stream_ssl|ERR|ssl:ovn-cluster-az1-1.example.com:6642: connect: Address family not supported by protocol
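
A test case could look for this log signature instead of reading the log by eye. The sketch below counts the matching lines in a log file. The function name count_af_errors is hypothetical, and the pattern is taken only from the log lines quoted in this ticket.

```shell
# Hypothetical helper: count stream_ssl connect failures with the
# "Address family not supported by protocol" signature in a log file.
count_af_errors() {
    # In a basic regular expression "|" is a literal character, so this
    # matches the "|stream_ssl|ERR|" fields of the OVS log format as-is.
    grep -c 'stream_ssl|ERR|.*connect: Address family not supported by protocol' "$1" || true
}
```

For example, count_af_errors /var/log/ovn/ovn-controller.log could be sampled twice a few seconds apart after the SB DBs are back; a count that keeps growing matches the stuck state described here.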

       

       

       

       

       Expected Behavior: When the SB DB servers come back, the OVN controller should reconnect immediately, as in previous releases.
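
The expected behavior can be turned into a pass/fail check by polling the controller's connection status with a timeout. This is a sketch under assumptions: wait_connected is a hypothetical name, the timeout value is up to the test author, and it relies on connection-status printing exactly "connected" once the SB connection is up.

```shell
# Hypothetical helper: poll a status command until it prints "connected".
# $1 is the timeout in seconds; the remaining arguments are the command.
wait_connected() {
    timeout=$1
    shift
    elapsed=0
    while [ "$elapsed" -lt "$timeout" ]; do
        if [ "$("$@" 2>/dev/null)" = "connected" ]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}

# Usage inside ovn-chassis-1 (30 seconds is an arbitrary example value):
#   wait_connected 30 ovn-appctl -t ovn-controller connection-status
```

A non-zero return after the SB DBs have been restarted would indicate the regression. A fixed build should return 0 well within the timeout.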

       

       Observed Behavior: The OVN controller stays stuck in the connecting state until it is restarted or ovn-remote is updated.

       Troubleshooting Actions:

      More details are in https://issues.redhat.com/browse/OSPRH-21332 and the Slack thread https://redhat-internal.slack.com/archives/C01G7T6SYSD/p1761662714544639

              ovnteam@redhat.com OVN Team
              nstbot NST Bot
              OVN QE OVN QE