-
Bug
-
Resolution: Done
-
Major
-
rhos-17.1.z
-
None
-
3
-
False
-
-
False
-
?
-
openstack-neutron-18.6.1-17.1.20250529181015.85ff760.el9osttrunk
-
None
-
-
-
Neutron Sprint 13, Neutron Sprint 14, Neutron Sprint 15, Neutron Sprint 16
-
4
-
Important
To Reproduce
Steps to reproduce the behavior:
It is unclear whether this problem is directly related to the hot fix (HF) for https://issues.redhat.com/browse/OSPRH-14377 or is a separate problem, but it is important to note that the problem happens in a RHOSP 17.1 environment with hot-fix RPMs installed inside the neutron-server container.
When creating a huge number of VMs, one of them may fail because of the following error returned by the Neutron server for a port binding request:
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers [req-0972e1f6-47a6-4850-828c-b8d3b0ec4561 ID ID - default default] Mechanism driver 'ovn' failed in update_port_postcommit: ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Switch_Port with name=PORT_UUID
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers Traceback (most recent call last):
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.9/site-packages/neutron/plugins/ml2/managers.py", line 493, in _call_on_drivers
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers     getattr(driver.obj, method_name)(context)
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 873, in update_port_postcommit
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers     self._ovn_update_port(context._plugin_context, port, original_port,
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/mech_driver.py", line 755, in _ovn_update_port
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers     self._ovn_client.update_port(plugin_context, port,
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.9/site-packages/neutron/plugins/ml2/drivers/ovn/mech_driver/ovsdb/ovn_client.py", line 809, in update_port
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers     ovn_port = self._nb_idl.lookup('Logical_Switch_Port', port['id'])
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 208, in lookup
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers     return self._lookup(table, record)
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/__init__.py", line 268, in _lookup
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers     row = idlutils.row_by_value(self, rl.table, rl.column, record)
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers   File "/usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/idlutils.py", line 114, in row_by_value
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers     raise RowNotFound(table=table, col=column, match=match)
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers ovsdbapp.backend.ovs_idl.idlutils.RowNotFound: Cannot find Logical_Switch_Port with name=PORT_UUID
2025-04-17 04:52:01.005 28 ERROR neutron.plugins.ml2.managers
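For reference, here is a minimal, hedged sketch of the call that fails in the trace above, using ovsdbapp directly (the NB endpoint and the port name are placeholders). ovsdbapp's lookup() reads the local in-memory replica kept by the OVSDB IDL, which may explain why an existing row can appear missing while the connection to the NB ovsdb-server is being re-established:

from ovsdbapp.backend.ovs_idl import connection
from ovsdbapp.backend.ovs_idl import idlutils
from ovsdbapp.schema.ovn_northbound import impl_idl

OVN_NB = 'tcp:127.0.0.1:6641'  # placeholder NB DB endpoint

# Build an NB API object the same way ovsdbapp consumers usually do.
idl = connection.OvsdbIdl.from_server(OVN_NB, 'OVN_Northbound')
conn = connection.Connection(idl=idl, timeout=60)
nb_api = impl_idl.OvnNbApiIdlImpl(conn)

try:
    # Same lookup as in the trace; 'PORT_UUID' stands in for the real port id.
    lsp = nb_api.lookup('Logical_Switch_Port', 'PORT_UUID')
except idlutils.RowNotFound:
    # The row is not visible in the local NB replica; this does not
    # necessarily mean the port is absent from the NB DB itself.
    print('Logical_Switch_Port not visible in the local NB replica')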
Before the trace is reported, I can see multiple INFO messages indicating connection problems with the OVN NB DB server:
2025-04-17 04:52:01.004 26 INFO ovsdbapp.backend.ovs_idl.vlog [req-efa26f5d-533d-438b-aac4-01d4a94f24fb - - - - -] tcp:IP:6641: waiting 2 seconds before reconnect
2025-04-17 04:52:01.006 30 INFO ovsdbapp.backend.ovs_idl.vlog [req-dc88b48c-ea5a-4c10-ba24-41008df1ca16 - - - - -] tcp:IP:6641: connection closed by client
The affected port had been successfully created shortly before on another controller, so I am not sure whether the Neutron server did its job properly here.
Expected behavior
Possible solutions may vary, but ultimately, if the OVN NB DB is not available, the Neutron server should probably report that instead of raising RowNotFound; we may also consider changing the failover logic if possible.
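One possible direction, shown only as a hedged sketch and not as the actual Neutron code (NBDBLookupFailed, the retry count, and the wait interval are made up for illustration): retry the NB lookup briefly and turn a persistent miss into a clearer error, so callers can distinguish "NB DB unreachable or still resyncing" from "port really gone".

import tenacity
from ovsdbapp.backend.ovs_idl import idlutils


class NBDBLookupFailed(RuntimeError):
    """Hypothetical error raised when the NB row stays invisible after retries."""


@tenacity.retry(
    retry=tenacity.retry_if_exception_type(idlutils.RowNotFound),
    wait=tenacity.wait_fixed(1),
    stop=tenacity.stop_after_attempt(3),
    reraise=True)
def _lookup_lsp(nb_idl, port_id):
    # Give the IDL a moment to finish reconnecting/resyncing before
    # concluding that the Logical_Switch_Port is really missing.
    return nb_idl.lookup('Logical_Switch_Port', port_id)


def lookup_lsp_or_fail(nb_idl, port_id):
    try:
        return _lookup_lsp(nb_idl, port_id)
    except idlutils.RowNotFound:
        raise NBDBLookupFailed(
            'Logical_Switch_Port %s not found after retries; the OVN NB DB '
            'connection may be unavailable or resyncing' % port_id)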
Bug impact
Some logic to handle failovers must currently be implemented by the customer.
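As a rough illustration of the kind of logic that currently falls on the customer (a hedged sketch using openstacksdk; the cloud name, timeout, and retry count are placeholders): recreate the server when it goes to ERROR, since a port binding failure during boot typically surfaces only as an instance in ERROR state.

import openstack
from openstack import exceptions

conn = openstack.connect(cloud='my-cloud')  # placeholder cloud name


def create_server_with_retry(name, retries=3, **server_args):
    for attempt in range(retries):
        server = conn.compute.create_server(name=name, **server_args)
        try:
            # Wait for ACTIVE; raises ResourceFailure if the server goes to ERROR.
            return conn.compute.wait_for_server(server, wait=600)
        except exceptions.ResourceFailure:
            # Boot failed (e.g. port binding error); clean up and retry from scratch.
            conn.compute.delete_server(server, ignore_missing=True)
    raise RuntimeError('server %s failed %d times in a row' % (name, retries))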
Known workaround
None