Red Hat OpenStack Services on OpenShift / OSPRH-14377

[17.1.4] OVN Virtual Interface creation failed (100+ instances launched)


    • openstack-neutron-18.6.1-17.1.20250325141037.85ff760.el9osttrunk
    • Important

      We have a client creating batches of instances (100+). Most of them spawn just fine, but occasionally some fail on random compute nodes.
      The instances are on OpenStack, and I can't see anything wrong in nova (libvirt) or neutron (OVN) that would explain it.

      In the case below, other instances had been created successfully only seconds before.

      The issue is intermittent: for the past 2-3 days the client has kept creating and deleting instances on the platform, and no further failures have been observed.

      The goal here is to understand why the spawn sometimes fails.
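
      For context, a minimal reproduction sketch along those lines (a hedged example, not the client's actual tooling: it assumes openstacksdk and a clouds.yaml cloud named "default", and the flavor/image/network names are placeholders):

      # Hypothetical batch-boot sketch: launch N instances and report failures.
      import openstack
      from openstack import exceptions

      conn = openstack.connect(cloud="default")        # assumed clouds.yaml entry

      flavor = conn.compute.find_flavor("m1.small")    # placeholder names
      image = conn.compute.find_image("cirros")
      network = conn.network.find_network("private")

      servers = [
          conn.compute.create_server(
              name=f"scale-test-{i}",
              flavor_id=flavor.id,
              image_id=image.id,
              networks=[{"uuid": network.id}],
          )
          for i in range(100)
      ]

      for server in servers:
          try:
              conn.compute.wait_for_server(server, status="ACTIVE", wait=600)
          except exceptions.ResourceFailure:
              # instances hitting VirtualInterfaceCreateException land in ERROR
              print(f"{server.name} went to ERROR")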

      The error:
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Instanc
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Traceback (most recent call last):
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7376, in _create_guest_with_network
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] guest = self._create_guest(
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib64/python3.9/contextlib.py", line 126, in __exit__
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] next(self.gen)
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 481, in wait_for_instance_event
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] actual_event = event.wait()
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/eventlet/event.py", line 125, in wait
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] result = hub.switch()
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/eventlet/hubs/hub.py", line 313, in switch
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] return self.greenlet.switch()
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] eventlet.timeout.Timeout: 300 seconds
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1]
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] During handling of the above exception, another exception occurred:
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1]
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Traceback (most recent call last):
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 2754, in _build_resources
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] yield resources
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 2512, in _build_and_run_instance
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] self.driver.spawn(context, instance, image_meta,
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 4284, in spawn
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] self._create_guest_with_network(
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7402, in _create_guest_with_network
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] raise exception.VirtualInterfaceCreateException()
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1]
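
      For context, that Timeout comes from nova waiting for neutron's "network-vif-plugged" external event: nova blocks on an eventlet Event under vif_plugging_timeout (default 300s), and with vif_plugging_is_fatal=True (the default) the expiry is converted into VirtualInterfaceCreateException. A minimal sketch of that pattern (not nova's actual code):

      # Simplified wait-for-event pattern behind the traceback above.
      import eventlet
      from eventlet import event

      vif_plugged = event.Event()  # would be .send()-ed by the handler for
                                   # neutron's network-vif-plugged notification

      try:
          with eventlet.Timeout(300):   # vif_plugging_timeout default
              vif_plugged.wait()        # the hub.switch() in the traceback
      except eventlet.Timeout:
          # nova maps this to VirtualInterfaceCreateException when
          # vif_plugging_is_fatal is true (the default)
          raise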

      The libvirt part:
      From /var/log/containers/libvirt/virtqemud.log.2.gz
      We see the port being created:
      2025-02-25 13:42:20.541+0000: 6505: info : virNetDevTapCreate:258 : created device: 'tapf93216b2-df'
      2025-02-25 13:42:20.541+0000: 6505: debug : virNetDevSetMACInternal:282 : SIOCSIFHWADDR tapf93216b2-df MAC=fe:16:3e:50:b1:8c - Success
      2025-02-25 13:42:20.541+0000: 6505: debug : virCommandRunAsync:2634 : About to run LC_ALL=C tc qdisc add dev tapf93216b2-df root handle 0: noqueue

      But starting here it looks stuck in a loop:
      2025-02-25 13:42:25.527+0000: 6505: debug : virNodeDeviceLookupByName:193 : conn=0x7f8fd4013390, name=net_tapf93216b2_df_fe_16_3e_50_b1_8c
      2025-02-25 13:43:26.487+0000: 6510: debug : virNodeDeviceLookupByName:193 : conn=0x7f8fd4013390, name=net_tapf93216b2_df_fe_16_3e_50_b1_8c
      2025-02-25 13:44:28.514+0000: 6505: debug : virNodeDeviceLookupByName:193 : conn=0x7f8fd4013390, name=net_tapf93216b2_df_fe_16_3e_50_b1_8c
      2025-02-25 13:45:30.482+0000: 6508: debug : virNodeDeviceLookupByName:193 : conn=0x7f8fd4013390, name=net_tapf93216b2_df_fe_16_3e_50_b1_8c
      2025-02-25 13:46:32.474+0000: 6506: debug : virNodeDeviceLookupByName:193 : conn=0x7f8fd4013390, name=net_tapf93216b2_df_fe_16_3e_50_b1_8c
      2025-02-25 13:47:21.185+0000: 6511: info : virNetDevTapDelete:311 : delete device: 'tapf93216b2-df'
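
      Those repeated virNodeDeviceLookupByName calls look like something polling libvirt's node-device table for the tap device roughly once a minute, until nova gives up and the tap is deleted. A hedged diagnostic sketch using the libvirt-python bindings to run the same lookup by hand (assuming access to the local libvirt socket):

      # Repeat the lookup seen in virtqemud.log and report the result.
      import libvirt

      conn = libvirt.open("qemu:///system")
      name = "net_tapf93216b2_df_fe_16_3e_50_b1_8c"
      try:
          dev = conn.nodeDeviceLookupByName(name)
          print(f"found {dev.name()} (parent: {dev.parent()})")
      except libvirt.libvirtError as err:
          print(f"lookup failed: {err}")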

      RCA for why one instance couldn't be spawned: e93fffbf-57cb-437b-ad35-2d286e6687d1

      From nova-compute.log:
      Nova starts building the networks for the instance and successfully updates the port:
      2025-02-25 13:42:18.928 2 DEBUG nova.compute.manager [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Start building networks asynchronously for instance. _build_resources /usr/lib/python3.9/site-packages/nova/compute/manager.py:2678
      2025-02-25 13:42:18.966 2 DEBUG nova.compute.manager [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Allocating IP information in the background. _allocate_network_async /usr/lib/python3.9/site-packages/nova/compute/manager.py:1844
      2025-02-25 13:42:19.970 2 DEBUG nova.network.neutron [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Successfully updated port: f93216b2-dfd0-41f4-8cb7-4359727bbfb7 _update_port /usr/lib/python3.9/site-packages/nova/network/neutron.py:587
      The port f93216b2-dfd0-41f4-8cb7-4359727bbfb7 itself was created moments earlier and converted to an os-vif object:
      2025-02-25 13:42:19.496 2 DEBUG nova.network.neutron [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Successfully created port: f93216b2-dfd0-41f4-8cb7-4359727bbfb7 _create_port_minimal /usr/lib/python3.9/site-packages/nova/network/neutron.py:549
      2025-02-25 13:42:20.317 2 DEBUG nova.network.os_vif_util [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] Converted object VIFOpenVSwitch(active=False,address=fa:16:3e:50:b1:8c,bridge_name='br-int',has_traffic_filtering=True,id=f93216b2-dfd0-41f4-8cb7-4359727bbfb7,network=Network(843ddda0-f77a-49f5-be44-f6770bcbcc9b),plugin='ovs',port_profile=VIFPortProfileOpenVSwitch,preserve_on_delete=False,vif_name='tapf93216b2-df') nova_to_osvif_vif /usr/lib/python3.9/site-packages/nova/network/os_vif_util.py:560

      2025-02-25 13:42:20.491 2 DEBUG nova.virt.libvirt.driver [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] No VIF found with MAC fa:16:3e:50:b1:8c, not building metadata _build_interface_metadata /usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py:11629

      The instance gets spawned:
      2025-02-25 13:42:20.974 2 INFO nova.compute.manager [req-2e3bcb43-36cf-445d-a28d-e15328f0bddb - - - - -] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] VM Started (Lifecycle Event)
      2025-02-25 13:42:20.993 2 DEBUG nova.compute.manager [req-2e3bcb43-36cf-445d-a28d-e15328f0bddb - - - - -] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Checking state _get_power_state /usr/lib/python3.9/site-packages/nova/compute/manager.py:1654
      2025-02-25 13:42:20.996 2 DEBUG nova.virt.driver [req-2e3bcb43-36cf-445d-a28d-e15328f0bddb - - - - -] Emitting event <LifecycleEvent: 1740490940.9736013, e93fffbf-57cb-437b-ad35-2d286e6687d1 => Paused> emit_event /usr/lib/python3.9/site-packages/nova/virt/driver.py:1612
      2025-02-25 13:42:20.996 2 INFO nova.compute.manager [req-2e3bcb43-36cf-445d-a28d-e15328f0bddb - - - - -] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] VM Paused (Lifecycle Event)
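
      The "VM Paused" event is expected at this stage: the guest is launched paused while nova waits for neutron's network-vif-plugged event, and it is only resumed once the event arrives. A rough sketch of the underlying libvirt calls (a simplification, not nova's exact code path):

      # Simplified paused-launch sequence (hypothetical guest XML path).
      import libvirt

      conn = libvirt.open("qemu:///system")
      domain_xml = open("/tmp/guest.xml").read()   # placeholder
      dom = conn.defineXML(domain_xml)
      dom.createWithFlags(libvirt.VIR_DOMAIN_START_PAUSED)
      # ... wait for network-vif-plugged ...
      dom.resume()   # never reached for this instance; the wait timed out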

      Until it gets destroyed 5 minutes later, when the spawn fails after the 300-second timeout:
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Instance failed to spawn: nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed

      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
      2025-02-25 13:47:21.303 2 ERROR nova.compute.manager [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1]
      2025-02-25 13:47:21.305 2 INFO nova.compute.manager [req-ad87de7f-0e0d-4361-a187-dc4ac370b41a 68a9841ac07d4f388a897d6aa8fb6ecc 748a8eb9d8c946cfa99599dd71951ebb - default default] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Terminating instance
      2025-02-25 13:47:21.308 2 INFO nova.virt.libvirt.driver [-] [instance: e93fffbf-57cb-437b-ad35-2d286e6687d1] Instance destroyed successfully.

      From ovs-vswitchd.log we only see the following for this port:
      2025-02-25T13:42:20.340Z|47959|bridge|WARN|could not open network device tapf93216b2-df (No such device)
      2025-02-25T13:42:20.542Z|47960|bridge|INFO|bridge br-int: added interface tapf93216b2-df on port 1940

      From ovn-metadata-agent.log:
      We see the binding of the port to the compute node:
      2025-02-25 13:42:20.552 5666 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: PortBindingUpdatedEvent(events=('update',), table='Port_Binding', conditions=None, old_conditions=None) to row=Port_Binding(mac=['fa:16:3e:50:b1:8c 10.151.227.136 2001:1b70:82b9:33b::263'], port_security=['fa:16:3e:50:b1:8c 10.151.227.136 2001:1b70:82b9:33b::263'], type=, nat_addresses=[], virtual_parent=[], up=[False], options={'requested-chassis': 'cloud20c-compute-std-18.localdomain'}, parent_port=[], requested_additional_chassis=[], ha_chassis_group=[], external_ids={'neutron:cidrs': '10.151.227.136/25 2001:1b70:82b9:33b::263/64', 'neutron:device_id': 'e93fffbf-57cb-437b-ad35-2d286e6687d1', 'neutron:device_owner': 'compute:nova', 'neutron:network_name': 'neutron-843ddda0-f77a-49f5-be44-f6770bcbcc9b', 'neutron:port_capabilities': '', 'neutron:port_name': '', 'neutron:project_id': '748a8eb9d8c946cfa99599dd71951ebb', 'neutron:revision_number': '2', 'neutron:security_group_ids': '3cc02867-c299-4491-b5f4-135ab6c347db', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, additional_chassis=[], tag=[], additional_encap=[], encap=[], mirror_rules=[], datapath=99bb1488-a347-4855-a779-d42d405e3534, chassis=[<ovs.db.idl.Row object at 0x7f961f4f5b80>], tunnel_key=921, gateway_chassis=[], requested_chassis=[<ovs.db.idl.Row object at 0x7f961f4f5b80>], logical_port=f93216b2-dfd0-41f4-8cb7-4359727bbfb7) old=Port_Binding(chassis=[]) matches /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/event.py:43

      2025-02-25 13:42:20.553 5666 INFO neutron.agent.ovn.metadata.agent [-] Port f93216b2-dfd0-41f4-8cb7-4359727bbfb7 in datapath 843ddda0-f77a-49f5-be44-f6770bcbcc9b bound to our chassis

      After the failure, the binding is removed:
      2025-02-25 13:47:21.033 5666 DEBUG ovsdbapp.backend.ovs_idl.event [-] Matched UPDATE: PortBindingUpdatedEvent(events=('update',), table='Port_Binding', conditions=None, old_conditions=None) to row=Port_Binding(mac=['fa:16:3e:50:b1:8c 10.151.227.136 2001:1b70:82b9:33b::263'], port_security=['fa:16:3e:50:b1:8c 10.151.227.136 2001:1b70:82b9:33b::263'], type=, nat_addresses=[], virtual_parent=[], up=[False], options={'requested-chassis': 'cloud20c-compute-std-18.localdomain'}, parent_port=[], requested_additional_chassis=[], ha_chassis_group=[], external_ids={'neutron:cidrs': '10.151.227.136/25 2001:1b70:82b9:33b::263/64', 'neutron:device_id': 'e93fffbf-57cb-437b-ad35-2d286e6687d1', 'neutron:device_owner': 'compute:nova', 'neutron:network_name': 'neutron-843ddda0-f77a-49f5-be44-f6770bcbcc9b', 'neutron:port_capabilities': '', 'neutron:port_name': '', 'neutron:project_id': '748a8eb9d8c946cfa99599dd71951ebb', 'neutron:revision_number': '3', 'neutron:security_group_ids': '3cc02867-c299-4491-b5f4-135ab6c347db', 'neutron:subnet_pool_addr_scope4': '', 'neutron:subnet_pool_addr_scope6': '', 'neutron:vnic_type': 'normal'}, additional_chassis=[], tag=[], additional_encap=[], encap=[], mirror_rules=[], datapath=99bb1488-a347-4855-a779-d42d405e3534, chassis=[], tunnel_key=921, gateway_chassis=[], requested_chassis=[<ovs.db.idl.Row object at 0x7f961f4f5b80>], logical_port=f93216b2-dfd0-41f4-8cb7-4359727bbfb7) old=Port_Binding(up=[True], chassis=[<ovs.db.idl.Row object at 0x7f961f4f5b80>]) matches /usr/lib/python3.9/site-packages/ovsdbapp/backend/ovs_idl/event.py:43

      2025-02-25 13:47:21.034 5666 INFO neutron.agent.ovn.metadata.agent [-] Port f93216b2-dfd0-41f4-8cb7-4359727bbfb7 in datapath 843ddda0-f77a-49f5-be44-f6770bcbcc9b unbound from our chassis
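
      Note that the unbind at 13:47:21.033 lands right before nova's 13:47:21.303 failure, and that old=Port_Binding(up=[True], ...) shows the binding had gone up at some point. To inspect the Port_Binding row by hand on a live system, a hedged sketch driving ovn-sbctl from Python (assuming ovn-sbctl is reachable where this runs, e.g. inside the ovn_controller container):

      # Query the OVN southbound DB for this port's Port_Binding row.
      import json
      import subprocess

      out = subprocess.run(
          ["ovn-sbctl", "--format=json", "find", "Port_Binding",
           "logical_port=f93216b2-dfd0-41f4-8cb7-4359727bbfb7"],
          check=True, capture_output=True, text=True,
      ).stdout
      table = json.loads(out)
      print(table["headings"])
      for row in table["data"]:
          print(row)  # check the 'up' and 'chassis' columns over time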

      We have sosreports from the compute nodes with all logs in debug mode.
