  1. Red Hat OpenStack Services on OpenShift
  2. OSPRH-26890

[RHOSO] L3 high availability failover not working between OCPs


    • Important

      `whitebox-neutron-tempest-plugin` contains an L3 HA test that triggers a failover by taking the external interface down on the gateway chassis, then verifies that traffic is routed through the chassis with the next-highest priority [1].

      Until now this was tested in CI on networker nodes only. After the networker nodes were disabled as gateways [2], failover was tested between OCP nodes alone. This required adding test support [3], because on OCP nodes the interfaces are mapped to an `ovs_pod` per node, which needed new test functionality.

      The verification was run on uni-eta and failed with the traceback in [4] and the logs in [5], which suggests a possible regression in failover between OCP nodes. The same adjusted test method passes when networker nodes are prioritized for failover (that is, before the commands in [2] are run on the networker nodes).

      Possibly the OCP nodes require additional configuration to support BFD, or the failover process works differently on them.

       

      [1] Existing Test Code
      https://opendev.org/x/whitebox-neutron-tempest-plugin/src/commit/c8df8bf83b/whitebox_neutron_tempest_plugin/tests/scenario/test_l3ha_ovn.py#L234

      [2] Environment State

      BEFORE:
      
      [cloud-admin@openstackclient ~]$ openstack network agent list --agent-type "ovn-controller-gateway"
      +--------------------------------------+------------------------------+-------------------------------------------+-------------------+-------+-------+----------------+
      | ID                                   | Agent Type                   | Host                                      | Availability Zone | Alive | State | Binary         |
      +--------------------------------------+------------------------------+-------------------------------------------+-------------------+-------+-------+----------------+
      | a827c3f7-0d06-4e0f-898e-82f05e712713 | OVN Controller Gateway agent | networker-4cvld6kh-1.ctlplane.example.com |                   | :-)   | UP    | ovn-controller |
      | 994000e8-f1e7-4a0e-b91c-e3cd1178fd5f | OVN Controller Gateway agent | master-2                                  |                   | :-)   | UP    | ovn-controller |
      | 7546596c-b270-4cb2-a2f0-52fb64134ad6 | OVN Controller Gateway agent | networker-4cvld6kh-0.ctlplane.example.com |                   | :-)   | UP    | ovn-controller |
      | 6a1546be-5d30-43c3-be7d-897216c18caf | OVN Controller Gateway agent | master-0                                  |                   | :-)   | UP    | ovn-controller |
      | af327b3f-2f9b-41c1-9fd5-9c9c3d5a1ccb | OVN Controller Gateway agent | master-1                                  |                   | :-)   | UP    | ovn-controller |
      | 8b9abbdc-b37f-44f2-a123-94fddd6a8eed | OVN Controller Gateway agent | networker-4cvld6kh-2.ctlplane.example.com |                   | :-)   | UP    | ovn-controller |
      +--------------------------------------+------------------------------+-------------------------------------------+-------------------+-------+-------+----------------+
      
      [root@networker-4cvld6kh-0 ~]# ovs-vsctl get open . external_ids:ovn-cms-options
      enable-chassis-as-gw
      [root@networker-4cvld6kh-0 ~]# ovs-vsctl set open . external_ids:ovn-cms-options='""'
      [root@networker-4cvld6kh-0 ~]# ovs-vsctl get open . external_ids:ovn-cms-options
      ""
      (^ set on all networker nodes)
      
      AFTER:
      
      [cloud-admin@openstackclient ~]$ openstack network agent list --agent-type "ovn-controller-gateway"
      +--------------------------------------+------------------------------+----------+-------------------+-------+-------+----------------+
      | ID                                   | Agent Type                   | Host     | Availability Zone | Alive | State | Binary         |
      +--------------------------------------+------------------------------+----------+-------------------+-------+-------+----------------+
      | 994000e8-f1e7-4a0e-b91c-e3cd1178fd5f | OVN Controller Gateway agent | master-2 |                   | :-)   | UP    | ovn-controller |
      | 6a1546be-5d30-43c3-be7d-897216c18caf | OVN Controller Gateway agent | master-0 |                   | :-)   | UP    | ovn-controller |
      | af327b3f-2f9b-41c1-9fd5-9c9c3d5a1ccb | OVN Controller Gateway agent | master-1 |                   | :-)   | UP    | ovn-controller |
      +--------------------------------------+------------------------------+----------+-------------------+-------+-------+----------------+
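
      The before/after states above differ only in the `external_ids:ovn-cms-options` value on the networker nodes. As a rough illustration, the check below decides from the raw `ovs-vsctl get` output whether a chassis is still flagged as a gateway. The helper name `is_gateway_chassis` is hypothetical and not part of the plugin; the sketch assumes the value is a comma-separated option list and that an empty value is printed as `""`, as in the output above.

```python
def is_gateway_chassis(cms_options: str) -> bool:
    """Return True if a raw ovn-cms-options value enables the chassis as a gateway.

    Hypothetical helper for illustration; not taken from the test plugin.
    """
    # ovs-vsctl prints an empty value as "" (including the quotes), so drop them.
    cleaned = cms_options.strip().strip('"')
    # Assumption: the value is a comma-separated list of options.
    options = [opt.strip() for opt in cleaned.split(",") if opt.strip()]
    return "enable-chassis-as-gw" in options


# Values copied from the BEFORE sequence above.
print(is_gateway_chassis("enable-chassis-as-gw"))
print(is_gateway_chassis('""'))
```

      Running such a check on every networker node would confirm that only the OCP nodes remain gateway candidates before the test starts.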
      

      [3] OCPs Testing Support Patch
      977320: Test traffic capture for HA failover from/to OCP/CRC nodes | https://review.opendev.org/c/x/whitebox-neutron-tempest-plugin/+/977320/4

      [4] Traceback

      During handling of the above exception, another exception occurred:
      Traceback (most recent call last):
        File "/usr/lib/python3.9/site-packages/whitebox_neutron_tempest_plugin/tests/scenario/test_l3ha_ovn.py", line 76, in verify_routing_via_chassis
          common_utils.wait_until_true(
        File "/usr/lib/python3.9/site-packages/neutron_tempest_plugin/common/utils.py", line 90, in wait_until_true
          raise WaitTimeout("Timed out after %d seconds" % timeout)
      neutron_tempest_plugin.common.utils.WaitTimeout: Timed out after 60 seconds
      
      During handling of the above exception, another exception occurred:
      Traceback (most recent call last):
        File "/usr/lib/python3.9/site-packages/whitebox_neutron_tempest_plugin/tests/scenario/test_l3ha_ovn.py", line 304, in test_l3ha_bring_down_interface
          self.verify_routing_via_chassis(self.chassis_list[1])
        File "/usr/lib/python3.9/site-packages/whitebox_neutron_tempest_plugin/tests/scenario/test_l3ha_ovn.py", line 80, in verify_routing_via_chassis
          self.fail("Gateway chassis was not updated as expected")
        File "/usr/lib64/python3.9/unittest/case.py", line 676, in fail
          raise self.failureException(msg)
      AssertionError: Gateway chassis was not updated as expected
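
      The traceback above matches a poll-until-timeout pattern: a predicate is re-evaluated until it returns true or the timeout expires, and the timeout is then converted into a test failure. The sketch below reproduces that shape in a self-contained form. `wait_until_true` and `WaitTimeout` here are local stand-ins written for this illustration, not the `neutron_tempest_plugin` implementations, and `get_gateway_chassis` is a hypothetical callable that returns the hostname of the current gateway chassis.

```python
import time


class WaitTimeout(Exception):
    """Local stand-in: raised when the predicate does not become true in time."""


def wait_until_true(predicate, timeout=60, sleep=1):
    """Poll predicate() until it returns True or timeout seconds have passed."""
    deadline = time.monotonic() + timeout
    while not predicate():
        if time.monotonic() >= deadline:
            raise WaitTimeout("Timed out after %d seconds" % timeout)
        time.sleep(sleep)


def verify_routing_via_chassis(get_gateway_chassis, expected, timeout=60, sleep=1):
    """Fail if the gateway chassis does not become `expected` within the timeout.

    get_gateway_chassis is a hypothetical callable returning the current
    gateway chassis hostname.
    """
    try:
        wait_until_true(
            lambda: get_gateway_chassis() == expected,
            timeout=timeout,
            sleep=sleep,
        )
    except WaitTimeout:
        # Raising inside the except block yields the chained
        # "During handling of the above exception" traceback.
        raise AssertionError("Gateway chassis was not updated as expected")
```

      With a getter that keeps returning `master-0` while `master-2` is expected, this sketch ends in the same `AssertionError` message as the traceback.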
      

      [5] Logs

      2026-02-23 17:32:14.575 348 DEBUG whitebox_neutron_tempest_plugin.tests.scenario.base [-] Actual routing nodes: master-0 check_north_south_icmp_flow /usr/lib/python3.9/site-packages/whitebox_neutron_tempest_plugin/tests/scenario/base.py:1708
      ...
      2026-02-23 17:32:15.081 348 DEBUG whitebox_neutron_tempest_plugin.tests.scenario.base [-] Command: oc -n openstack  rsh ovn-controller-ovs-qfkw4 ip link set br-datacentre down run_on_master_controller /usr/lib/py
      ...
      2026-02-23 17:32:15.611 348 DEBUG whitebox_neutron_tempest_plugin.tests.scenario.base [-] Command: oc -n openstack  rsh pod/ovsdbserver-sb-0 ovn-sbctl --db="ssl:ovsdbserver-sb-0.openstack.svc.cluster.local:6642,ssl:ovsdbserver-sb-1.openstack.svc.cluster.local:6642,ssl:ovsdbserver-sb-2.openstack.svc.cluster.local:6642" --private-key=/etc/pki/tls/private/ovndb.key --certificate=/etc/pki/tls/certs/ovndb.crt --ca-cert=/etc/pki/tls/certs/ovndbca.crt get chassis 994000e8-f1e7-4a0e-b91c-e3cd1178fd5f hostname run_on_master_controller /usr/lib/python3.9/site-packages/whitebox_neutron_tempest_plugin/tests/scenario/base.py:156
      2026-02-23 17:32:15.611 348 INFO tempest.lib.common.ssh [-] Creating ssh connection to '192.168.111.9:22' as 'zuul' with password None
      2026-02-23 17:32:15.628 348 INFO paramiko.transport [-] Connected (version 2.0, client OpenSSH_8.7)
      2026-02-23 17:32:15.724 348 INFO paramiko.transport [-] Authentication (publickey) successful!
      2026-02-23 17:32:15.724 348 INFO tempest.lib.common.ssh [-] ssh connection to zuul@192.168.111.9 successfully created
      2026-02-23 17:32:16.172 348 DEBUG whitebox_neutron_tempest_plugin.tests.scenario.base [-] Output: master-2
       run_on_master_controller /usr/lib/python3.9/site-packages/whitebox_neutron_tempest_plugin/tests/scenario/base.py:163
      2026-02-23 17:32:16.172 348 DEBUG whitebox_neutron_tempest_plugin.tests.scenario.test_l3ha_ovn [-] Waiting until router gateway chassis is updated verify_routing_via_chassis /usr/lib/python3.9/site-packages/whitebox_neutron_tempest_plugin/tests/scenario/test_l3ha_ovn.py:64
      ...
      2026-02-23 17:32:17.294 348 DEBUG whitebox_neutron_tempest_plugin.tests.scenario.test_l3ha_ovn [-] chassis = 'master-0', expected = master-2  _router_gateway_chassis_updated /usr/lib/python3.9/site-packages/whitebox_neutron_tempest_plugin/tests/scenario/test_l3ha_ovn.py:70
      
      (the check is retried with delays in between until it times out and fails)
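
      When triaging longer runs, the repeated `chassis = ..., expected = ...` debug messages can be extracted to see whether the gateway chassis ever changed. The following is a minimal sketch, assuming the message format shown in the log line above; the function name `parse_chassis_check` and the regular expression are illustrative and not part of the plugin.

```python
import re

# Assumption: the message format matches the debug line shown in the logs above.
CHASSIS_RE = re.compile(
    r"chassis = '(?P<actual>[^']*)', expected = (?P<expected>\S+)"
)


def parse_chassis_check(line):
    """Return (actual, expected) from a chassis-check log line, or None."""
    match = CHASSIS_RE.search(line)
    if match is None:
        return None
    return match.group("actual"), match.group("expected")


# Message part of the log line quoted above.
sample = "chassis = 'master-0', expected = master-2  _router_gateway_chassis_updated"
print(parse_chassis_check(sample))
```

      If every parsed pair shows the same actual chassis, the failover never happened, rather than having moved to an unexpected chassis.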
      

              Unassigned Unassigned
              rhn-support-mblue Maor Blaustein
              rhos-dfg-networking-squad-neutron