Type: Bug
Resolution: Done
Priority: Major
Fix Version: rhos-17.1.4
Build: openstack-neutron-18.6.1-17.1.20250325141037.85ff760.el9osttrunk
Severity: Important
We have a client creating a large batch of instances (100+). Most of them spawn just fine, but some occasionally fail to spawn on random compute nodes.
The instances are on OpenStack, and I can't see anything in nova (libvirt) or neutron (OVN) that would explain it.
In the case below, other instances had been created successfully just seconds before.
The issue is intermittent: for the past 2-3 days the client has kept creating and deleting instances on the platform and no further failures have been observed.
The goal here is to understand why the spawn sometimes fails.
The error:
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Instance failed to spawn: nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Traceback (most recent call last):
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7376, in _create_guest_with_network
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] guest = self._create_guest(
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib64/python3.9/contextlib.py", line 126, in __exit__
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] next(self.gen)
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 481, in wait_for_instance_event
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] actual_event = event.wait()
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/eventlet/event.py", line 125, in wait
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] result = hub.switch()
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/eventlet/hubs/hub.py", line 313, in switch
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] return self.greenlet.switch()
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] eventlet.timeout.Timeout: 300 seconds
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1]
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] During handling of the above exception, another exception occurred:
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1]
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Traceback (most recent call last):
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 2754, in _build_resources
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] yield resources
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 2512, in _build_and_run_instance
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] self.driver.spawn(context, instance, image_meta,
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 4284, in spawn
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] self._create_guest_with_network(
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7402, in _create_guest_with_network
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] raise exception.VirtualInterfaceCreateException()
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1]
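The 300-second wait in the traceback is Nova blocking in wait_for_instance_event for the network-vif-plugged external event that Neutron is supposed to send once the port is wired up. The timeout and whether it is fatal are configurable in nova.conf; the defaults below match what we see here (shown for reference, not as a tuning recommendation):

[DEFAULT]
# Seconds nova-compute waits for the network-vif-plugged event from Neutron.
vif_plugging_timeout = 300
# With the default of true, a timeout aborts the boot with
# VirtualInterfaceCreateException, which is exactly the error above.
vif_plugging_is_fatal = true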
The libvirt part:
From /var/log/containers/libvirt/virtqemud.log.2.gz
We see the tap device being created:
2025-02-25 13:42:20.541+0000: 6505: info : virNetDevTapCreate:258 : created device: 'tapf93216b2-df'
2025-02-25 13:42:20.541+0000: 6505: debug : virNetDevSetMACInternal:282 : SIOCSIFHWADDR tapf93216b2-df MAC=fe:16:3e:50:b1:8c - Success
2025-02-25 13:42:20.541+0000: 6505: debug : virCommandRunAsync:2634 : About to run LC_ALL=C tc qdisc add dev tapf93216b2-df root handle 0: noqueue
But starting here it looks stuck in a loop, repeating the same lookup roughly once a minute:
2025-02-25 13:42:25.527+0000: 6505: debug : virNodeDeviceLookupByName:193 : conn=0x7f8fd4013390, name=net_tapf93216b2_df_fe_16_3e_50_b1_8c
2025-02-25 13:43:26.487+0000: 6510: debug : virNodeDeviceLookupByName:193 : conn=0x7f8fd4013390, name=net_tapf93216b2_df_fe_16_3e_50_b1_8c
2025-02-25 13:44:28.514+0000: 6505: debug : virNodeDeviceLookupByName:193 : conn=0x7f8fd4013390, name=net_tapf93216b2_df_fe_16_3e_50_b1_8c
2025-02-25 13:45:30.482+0000: 6508: debug : virNodeDeviceLookupByName:193 : conn=0x7f8fd4013390, name=net_tapf93216b2_df_fe_16_3e_50_b1_8c
2025-02-25 13:46:32.474+0000: 6506: debug : virNodeDeviceLookupByName:193 : conn=0x7f8fd4013390, name=net_tapf93216b2_df_fe_16_3e_50_b1_8c
2025-02-25 13:47:21.185+0000: 6511: info : virNetDevTapDelete:311 : delete device: 'tapf93216b2-df'
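These repeated lookups come from a virtqemud client querying the node-device driver for the tap device, most likely nova-compute's periodic resource reporting; meanwhile nothing else happens to the port until the tap is deleted at 13:47:21. The same lookup can be reproduced with the libvirt Python bindings (a minimal sketch, assuming the libvirt-python package; the device name is copied from the log above):

import libvirt  # from the libvirt-python bindings

# A read-only connection to the system daemon is enough for a lookup.
conn = libvirt.openReadOnly('qemu:///system')
try:
    # Node-device name for a tap: net_<ifname>_<mac, '_'-separated>.
    dev = conn.nodeDeviceLookupByName('net_tapf93216b2_df_fe_16_3e_50_b1_8c')
    print(dev.XMLDesc(0))
except libvirt.libvirtError as exc:
    print('lookup failed:', exc)
finally:
    conn.close()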
RCA for why one instance couldn't be spawned: e93fffbf-57cb-437b-ad35-2d286e6687d1
From nova-compute.log:
Here the network allocation starts in the background and the port gets updated successfully:
2025-02-25 13:42:18.928 2 DEBUG nova.compute.manager [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Start building networks asynchronously for instance. _build_resources /usr/lib/python3.9/site-packages/nova/compute/manager.py:2678
2025-02-25 13:42:18.966 2 DEBUG nova.compute.manager [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Allocating IP information in the background. _allocate_network_async /usr/lib/python3.9/site-packages/nova/compute/manager.py:1844
2025-02-25 13:42:19.970 2 DEBUG nova.network.neutron [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Successfully updated port: f93216b2-dfd0-41f4-8cb7-4359727bbfb7 _update_port /usr/lib/python3.9/site-packages/nova/network/neutron.py:587
Port f93216b2-dfd0-41f4-8cb7-4359727bbfb7 is created for the instance and converted to an os-vif object:
2025-02-25 13:42:19.496 2 DEBUG nova.network.neutron [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Successfully created port: f93216b2-dfd0-41f4-8cb7-4359727bbfb7 _create_port_minimal /usr/lib/python3.9/site-packages/nova/network/neutron.py:549
2025-02-25 13:42:20.317 2 DEBUG nova.network.os_vif_util [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] Converted object VIFOpenVSwitch(active=False,address=fa:16:3e:50:b1:8c,bridge_name='br-int',has_traffic_filtering=True,id=f93216b2-dfd0-41f4-8cb7-4359727bbfb7,network=Network(843ddda0-f77a-49f5-be44-f6770bcbcc9b),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapf93216b2-df') nova_to_osvif_vif /usr/lib/python3.9/site-packages/nova/network/os_vif_util.py:560
2025-02-25 13:42:20.491 2 DEBUG nova.virt.libvirt.driver [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] No VIF found with MAC fa:16:3e:50:b1:8c, not building metadata _build_interface_metadata /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:11629
The instance gets spawned (the guest is started and left paused while Nova waits for the network-vif-plugged event before resuming it):
2025-02-25 13:42:20.974 2 INFO nova.compute.manager [req-2e3bcb43-36cf-445d-a28d-e15328f0bddb - - - - -] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] VM Started (Lifecycle Event)
2025-02-25 13:42:20.993 2 DEBUG nova.compute.manager [req-2e3bcb43-36cf-445d-a28d-e15328f0bddb - - - - -] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Checking state _get_power_state /usr/lib/python3.9/site-packages/nova/compute/manager.py:1654
2025-02-25 13:42:20.996 2 DEBUG nova.virt.driver [req-2e3bcb43-36cf-445d-a28d-e15328f0bddb - - - - -] Emitting event <LifecycleEvent: 1740490940.9736013, e93fffbf-57cb-437b-ad35-2d286e6687d1 => Paused> emit_event /usr/lib/python3.9/site-packages/nova/virt/driver.py:1612
2025-02-25 13:42:20.996 2 INFO nova.compute.manager [req-2e3bcb43-36cf-445d-a28d-e15328f0bddb - - - - -] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] VM Paused (Lifecycle Event)
Until it gets deleted five minutes later, when the creation fails after the 300-second timeout:
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Instance failed to spawn: nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1]
2025-02-25 13:47:21.305 2 INFO nova.compute.manager [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Terminating instance
2025-02-25 13:47:21.308 2 INFO nova.virt.libvirt.driver [-] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Instance destroyed successfully.
From ovs-vswitchd.log, this is all we see for this port:
2025-02-25T13:42:20.340Z|47959|bridge|WARN|could not open network device tapf93216b2-df (No such device)
2025-02-25T13:42:20.542Z|47960|bridge|INFO|bridge br-int: added interface tapf93216b2-df on port 1940
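So from OVS's point of view the plug looks normal (the initial "No such device" WARN is most likely ordering noise: the OVSDB row is committed an instant before libvirt actually creates the tap). The next actor is ovn-controller, which claims the port by matching external_ids:iface-id on the interface against the OVN logical port. A minimal diagnostic sketch to confirm os-vif set the iface-id (assumes the standard ovs-vsctl CLI on the compute node; port name taken from the log):

import subprocess

# os-vif stores the Neutron port UUID in external_ids:iface-id;
# ovn-controller uses it to claim the corresponding Port_Binding.
result = subprocess.run(
    ['ovs-vsctl', 'get', 'Interface', 'tapf93216b2-df', 'external_ids'],
    capture_output=True, text=True, check=True,
)
print(result.stdout.strip())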
From ovn-metadata-agent.log:
We see the binding of the port to the compute node:
2025-02-25 13:42:20.552 5666 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: PortBindingUpdatedEvent(events=('update',), table='Port_Binding', conditions=None, old_conditions=None) to row=Port_Binding(mac=['fa:16:3e:50:b1:8c 10.151.227.136 2001:1b70:82b9:33b::263'], port_security=['fa:16:3e:50:b1:8c 10.151.227.136 2001:1b70:82b9:33b::263'], type=, nat_addresses=[], virtual_parent=[], up=[False], options=, parent_port=[], requested_additional_chassis=[], ha_chassis_group=[], external_ids={'neutron:cidrs': '10.151.227.136/25 2001:1b70:82b9:33b::263/64', 'neutron:device_id': 'e93fffbf-57cb-437b-ad35-2d286e6687d1', 'neutron:device_owner': 'compute:nova', 'neutron:network_name': 'neutron-843ddda0-f77a-49f5-be44-f6770bcbcc9b', 'neutron:port_capabilities': '', 'neutron:port_name': '', 'neutron:project_id': '748a8eb9d8c946cfa99599dd71951ebb', 'neutron:revision_number': '2', 'neutron:security_group_ids': '3cc02867-c299-4491-b5f4-135ab6c347db', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, additional_chassis=[], tag=[], additional_encap=[], encap=[], mirror_rules=[], datapath=99bb1488-a347-4855-a779-d42d405e3534, chassis=[<ovs.db.idl.Row object at 0x7f961f4f5b80>], tunnel_key=921, gateway_chassis=[], requested_chassis=[<ovs.db.idl.Row object at 0x7f961f4f5b80>], logical_port=f93216b2-dfd0-41f4-8cb7-4359727bbfb7) old=Port_Binding(chassis=[]) matches /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/event.py:43
2025-02-25 13:42:20.553 5666 INFO neutron.agent.ovn.metadata.agent [-] Port f93216b2-dfd0-41f4-8cb7-4359727bbfb7 in datapath 843ddda0-f77a-49f5-be44-f6770bcbcc9b bound to our chassis
After the failure, the binding is removed:
2025-02-25 13:47:21.033 5666 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: PortBindingUpdatedEvent(events=('update',), table='Port_Binding', conditions=None, old_conditions=None) to row=Port_Binding(mac=['fa:16:3e:50:b1:8c 10.151.227.136 2001:1b70:82b9:33b::263'], port_security=['fa:16:3e:50:b1:8c 10.151.227.136 2001:1b70:82b9:33b::263'], type=, nat_addresses=[], virtual_parent=[], up=[False], options=, parent_port=[], requested_additional_chassis=[], ha_chassis_group=[], external_ids={'neutron:cidrs': '10.151.227.136/25 2001:1b70:82b9:33b::263/64', 'neutron:device_id': 'e93fffbf-57cb-437b-ad35-2d286e6687d1', 'neutron:device_owner': 'compute:nova', 'neutron:network_name': 'neutron-843ddda0-f77a-49f5-be44-f6770bcbcc9b', 'neutron:port_capabilities': '', 'neutron:port_name': '', 'neutron:project_id': '748a8eb9d8c946cfa99599dd71951ebb', 'neutron:revision_number': '3', 'neutron:security_group_ids': '3cc02867-c299-4491-b5f4-135ab6c347db', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, additional_chassis=[], tag=[], additional_encap=[], encap=[], mirror_rules=[], datapath=99bb1488-a347-4855-a779-d42d405e3534, chassis=[], tunnel_key=921, gateway_chassis=[], requested_chassis=[<ovs.db.idl.Row object at 0x7f961f4f5b80>], logical_port=f93216b2-dfd0-41f4-8cb7-4359727bbfb7) old=Port_Binding(up=[True], chassis=[<ovs.db.idl.Row object at 0x7f961f4f5b80>]) matches /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/event.py:43
2025-02-25 13:47:21.034 5666 INFO neutron.agent.ovn.metadata.agent [-] Port f93216b2-dfd0-41f4-8cb7-4359727bbfb7 in datapath 843ddda0-f77a-49f5-be44-f6770bcbcc9b unbound from our chassis
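One detail worth calling out: at bind time the row shows up=[False], while the unbind event's old=Port_Binding(up=[True], ...) shows the port did transition to up in the OVN southbound DB at some point during the wait. What apparently never happened is Neutron turning that transition into a network-vif-plugged notification for Nova, which matches the duplicate bug linked below. While an instance is stuck waiting, both sides can be checked directly (a hedged sketch assuming the standard ovn-sbctl/ovn-nbctl tools; the port UUID is from this case):

import subprocess

PORT = 'f93216b2-dfd0-41f4-8cb7-4359727bbfb7'

# Southbound: has ovn-controller claimed the port (chassis set) and marked it up?
subprocess.run(['ovn-sbctl', 'list', 'Port_Binding', PORT], check=True)
# Northbound: did up=true propagate to the Logical_Switch_Port Neutron watches?
subprocess.run(['ovn-nbctl', 'get', 'Logical_Switch_Port', PORT, 'up'], check=True)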
We have sosreports from the compute nodes with all logs in debug mode.
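Since everything is in debug mode, the fastest path through the sosreports is to correlate the request id and the port UUID across the nova, libvirt, ovs and ovn logs. A throwaway sketch of that correlation (the identifiers are from this case; 'sosreport' is a placeholder for the extracted directory):

import gzip
import pathlib

NEEDLES = ('req-ad87de7f-0e0d-4361-a187-dc4ac370b41a',
           'f93216b2-dfd0-41f4-8cb7-4359727bbfb7')

# Walk the extracted sosreport and print every log line mentioning the
# request id or the port UUID, including rotated .gz logs.
for path in pathlib.Path('sosreport').rglob('*.log*'):
    opener = gzip.open if path.suffix == '.gz' else open
    try:
        with opener(path, 'rt', errors='replace') as fh:
            for line in fh:
                if any(needle in line for needle in NEEDLES):
                    print(f'{path}: {line.rstrip()}')
    except OSError:
        continue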
- is duplicated by: OSPRH-12891 (BZ#2214289) "Neutron port is not set to up if worker is disconnected while LSP is created and set to up in Northbound" - Closed