Bug
Resolution: Unresolved
Normal
Description of problem:
Noticed the issue in the OVN perf job[1]. It fails as below:
Traceback (most recent call last):
  File "/home/stack/browbeat/.rally-venv/lib64/python3.9/site-packages/rally/task/runner.py", line 69, in _run_scenario_once
    getattr(scenario_inst, method_name)(**scenario_kwargs)
  File "rally/rally-plugins/netcreate-boot/netcreate_nova_boot_fip_ping.py", line 49, in run
    guest = self._boot_server_with_fip(image, flavor, True,
  File "/home/stack/browbeat/.rally-venv/lib64/python3.9/site-packages/rally_openstack/task/scenarios/vm/utils.py", line 144, in _boot_server_with_fip
    server = self._boot_server(image, flavor, **kwargs)
  File "/home/stack/browbeat/.rally-venv/lib64/python3.9/site-packages/rally_openstack/task/scenarios/nova/utils.py", line 107, in _boot_server
    server = utils.wait_for_status(
  File "/home/stack/browbeat/.rally-venv/lib64/python3.9/site-packages/rally/task/utils.py", line 245, in wait_for_status
    raise exceptions.TimeoutException(
rally.exceptions.TimeoutException: Rally tired waiting 300.00 seconds for Server s_rally_9dd8b1b4_8pNZItaq:56f8ecc8-5d71-4664-a20b-9e832319af51 to become ('ACTIVE') current status BUILD
It timed out waiting for network-vif-plugged.
network-vif-plugged is not sent because the Neutron port never transitions from DOWN to UP.
Normally the LogicalSwitchPortUpdateUpEvent event is responsible for setting the port up, but it only matches Update events[2].
Because the DB connection was re-established in the meantime, the worker receives a single Create event (with up=True) instead of a Create (without up set) followed by an Update (up=True).
There is also another event, LogicalSwitchPortCreateUpEvent, which handles the LSP Create (with up=True) case[3], but it is only watched during startup and is unwatched afterwards[4].
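For illustration, a minimal sketch (not the actual Neutron implementation) of an event that would also match the collapsed case, assuming only the ovsdbapp RowEvent base class that ovsdb_monitor.py already builds on; the class name and the set_port_status_up() driver callback are taken as assumptions here:

from ovsdbapp.backend.ovs_idl import event as row_event


class LogicalSwitchPortUpEvent(row_event.RowEvent):
    # Sketch only: match a Logical_Switch_Port row that is already up,
    # whether the IDL delivers it as a "create" (as after the reconnect
    # above, where the insert and the up=true update collapse into one
    # notification) or as a regular "update".
    def __init__(self, driver):
        self.driver = driver
        super().__init__(
            (self.ROW_CREATE, self.ROW_UPDATE),  # watch both event types
            'Logical_Switch_Port',
            (('up', '=', True),))                # only rows that are up

    def run(self, event, row, old):
        # Assumed driver hook that marks the Neutron port ACTIVE so that
        # network-vif-plugged is sent to Nova.
        self.driver.set_port_status_up(row.name)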
Port created in Northbound (checked with ovsdb-tool show-log):
record 591:
term: 4
index: 17274
eid: 3c41
2023-06-09 00:11:20.877
...
table Logical_Switch_Port insert row "abad941a-6790-4cc2-b89b-67901247e684" (f71b000d):
name="abad941a-6790-4cc2-b89b-67901247e684"
port_security=["fa:16:3e:f1:77:d4 10.2.5.211"]
addresses=["fa:16:3e:f1:77:d4 10.2.5.211"]
options=
external_ids=
{"neutron:cidrs"="10.2.5.211/24", "neutron:device_id"="56f8ecc8-5d71-4664-a20b-9e832319af51", "neutron:device_owner"="", "neutron:network_name"=neutron-7812be3a-bbaa-4e94-b992-e1f51a518663, "neutron:port_capabilities"="", "neutron:port_name"="", "neutron:project_id"="920a1eca08984537a1c8e5d79778db4b", "neutron:revision_number"="1", "neutron:security_group_ids"="81a5fbaf-935a-4352-9bfc-05d99c0e8457", "neutron:subnet_pool_addr_scope4"="", "neutron:subnet_pool_addr_scope6"="", "neutron:vnic_type"=normal}

Port is up in Southbound:
2023-06-09T00:11:22.445Z|01066|binding|INFO|Setting lport abad941a-6790-4cc2-b89b-67901247e684 up in Southbound
Port is up in Northbound (checked with ovsdb-tool show-log):
record 605:
term: 4
index: 17281
eid: b3f8
2023-06-09 00:11:22.458 "ovn-northd"
table Logical_Switch_Port row "abad941a-6790-4cc2-b89b-67901247e684" (f71b000d) diff:
up=true
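For reference, the NB transaction-log excerpts above can be reproduced with something like the following; the database file path is an assumption and differs per deployment, and -m may be repeated for more per-record detail:
$ ovsdb-tool show-log -m /var/lib/ovn/ovnnb_db.db | grep -B5 -A15 abad941a-6790-4cc2-b89b-67901247e684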
And for the window between the LSP create and the LSP up in Northbound, the worker is disconnected from the DB:
$ grep -nr :6641 ct1|grep "2023-06-09 00:11:2"|grep req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048
ct1/server.log.1:13047:2023-06-09 00:11:20.856 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.75:6641: connecting...
ct1/server.log.1:13049:2023-06-09 00:11:20.857 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.75:6641: connected
ct1/server.log.1:13060:2023-06-09 00:11:20.861 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.75:6641: clustered database server is not cluster leader; trying another server
ct1/server.log.1:13063:2023-06-09 00:11:20.862 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.75:6641: connection closed by client
ct1/server.log.1:13064:2023-06-09 00:11:20.862 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.75:6641: waiting 2 seconds before reconnect
ct1/server.log.1:13130:2023-06-09 00:11:22.863 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.134:6641: connecting...
ct1/server.log.1:13131:2023-06-09 00:11:22.864 24 INFO ovsdbapp.backend.ovs_idl.vlog [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] tcp:172.17.1.134:6641: connected
- Events for the LSP record (all of them happened after the DB was reconnected):
$ grep -nr f71b000d ct1
ct1/server.log.1:13138:2023-06-09 00:11:22.879 24 DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] Hash Ring: Node aadce024-cc88-4641-a943-b058e228cf07 (host: controller-1.redhat.local) handling event "create" for row f71b000d-7a8c-45d5-8247-6c723e8ecfe1 (table: Logical_Switch_Port) notify /usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:769
ct1/server.log.1:13751:2023-06-09 00:12:35.175 24 DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] Hash Ring: Node aadce024-cc88-4641-a943-b058e228cf07 (host: controller-1.redhat.local) handling event "update" for row f71b000d-7a8c-45d5-8247-6c723e8ecfe1 (table: Logical_Switch_Port) notify /usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:769
ct1/server.log.1:14693:2023-06-09 00:16:23.110 24 DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] Hash Ring: Node aadce024-cc88-4641-a943-b058e228cf07 (host: controller-1.redhat.local) handling event "update" for row f71b000d-7a8c-45d5-8247-6c723e8ecfe1 (table: Logical_Switch_Port) notify /usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:769
ct1/server.log.1:14707:2023-06-09 00:16:24.035 24 DEBUG neutron.plugins.ml2.drivers.ovn.mech_driver.ovsdb.ovsdb_monitor [req-c4dc1e1b-e561-4fc6-9a55-38079f2fb048 - - - - -] Hash Ring: Node aadce024-cc88-4641-a943-b058e228cf07 (host: controller-1.redhat.local) handling event "delete" for row f71b000d-7a8c-45d5-8247-6c723e8ecfe1 (table: Logical_Switch_Port) notify /usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py:769
[1] https://rhos-ci-jenkins.lab.eng.tlv2.redhat.com/job/DFG-perfscale-PerfCI-OSP17.1-neutron-ovn/56/artifact/test_results/rally-results.html#/BrowbeatPlugin.create_network_nova_boot_ping-2/failures
[2] https://github.com/openstack/neutron/blob/e941180b68659a53c3c68705915a92552d3fe25c/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L443
[3] https://github.com/openstack/neutron/blob/e941180b68659a53c3c68705915a92552d3fe25c/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L394-L411
[4] https://github.com/openstack/neutron/blob/e941180b68659a53c3c68705915a92552d3fe25c/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovsdb_monitor.py#L748-L763
How reproducible: Random issue; it can happen if the event watcher gets disconnected (e.g. due to a raft leadership transfer) during the window between the LSP being created and it being marked up in Northbound (via ovn-northd).
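When it does hit, it can be confirmed by checking that the worker's NB connection dropped inside that window (as in the grep above) and which NB server held raft leadership at the time. A rough sketch; the control-socket path of the clustered NB ovsdb-server and the log location are assumptions:
$ grep -nr :6641 ct1 | grep -E "connection closed|waiting .* before reconnect|connected"
$ ovs-appctl -t /var/run/ovn/ovnnb_db.ctl cluster/status OVN_Northbound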
Actual results:
Port is not marked up in Neutron (even though it is up in the OVN DB), which causes the VM creation to fail.
Expected results:
VM creation should succeed.