Red Hat OpenStack Services on OpenShift
OSPRH-17439

instanceha fails on compute with a lot of guests (> 70)


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Undefined
    • Component: infra-operator
    • Severity: Important

       

      When triggering a kernel crash via sysrq (echo c > /proc/sysrq-trigger), or even when one happens naturally because of a fencing issue, evacuation fails with the following error:

      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [req-328023ee-d50f-40b1-bb1b-0469ba43d018 e10cabe97fc647e8b33057e580b4632c 8c5bc4b8030e4e1eb6c401849059324b - default default] [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8] Setting instance vm_state to ERROR: nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8] Traceback (most recent call last):
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7376, in _create_guest_with_network
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]     guest = self._create_guest(
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]   File "/usr/lib64/python3.9/contextlib.py", line 126, in __exit__
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]     next(self.gen)
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 481, in wait_for_instance_event
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]     actual_event = event.wait()
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]   File "/usr/lib/python3.9/site-packages/eventlet/event.py", line 125, in wait
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]     result = hub.switch()
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]   File "/usr/lib/python3.9/site-packages/eventlet/hubs/hub.py", line 313, in switch
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]     return self.greenlet.switch()
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8] eventlet.timeout.Timeout: 1200 seconds
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8] 
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8] During handling of the above exception, another exception occurred:
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8] 
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8] Traceback (most recent call last):
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 10391, in _error_out_instance_on_exception
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]     yield
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 3548, in rebuild_instance
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]     self._do_rebuild_instance_with_claim(
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 3630, in _do_rebuild_instance_with_claim
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]     self._do_rebuild_instance(
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 3797, in _do_rebuild_instance
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]     self._rebuild_default_impl(**kwargs)
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]   File "/usr/lib/python3.9/site-packages/nova/compute/manager.py", line 3434, in _rebuild_default_impl
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]     self.driver.spawn(context, instance, image_meta, injected_files,
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 4284, in spawn
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]     self._create_guest_with_network(
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]   File "/usr/lib/python3.9/site-packages/nova/virt/libvirt/driver.py", line 7402, in _create_guest_with_network
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8]     raise exception.VirtualInterfaceCreateException()
      2025-05-29 15:32:02.208 2 ERROR nova.compute.manager [instance: af87a1f9-b740-486f-a44e-10b9277cc7d8] nova.exception.VirtualInterfaceCreateException: Virtual Interface creation failed
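
      The "eventlet.timeout.Timeout: 1200 seconds" in the traceback is nova-compute giving up on waiting for the network-vif-plugged event from Neutron during the rebuild, which is what surfaces as VirtualInterfaceCreateException. That wait is governed by the [DEFAULT] vif_plugging_timeout and vif_plugging_is_fatal options in nova.conf; the snippet below is only an illustrative sketch of that relationship (the 1200-second value suggests the timeout was raised from the upstream default of 300 seconds in this environment):

      [DEFAULT]
      # Seconds nova-compute waits for Neutron's network-vif-plugged
      # notification before aborting the guest spawn/rebuild.
      vif_plugging_timeout = 1200
      # When true (the default), a missed event fails the operation with
      # VirtualInterfaceCreateException instead of proceeding anyway.
      vif_plugging_is_fatal = true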

      We didn't see any errors in Neutron or OVN besides the Nova error above. When looking at the logs of the fenced and destination computes, at some point we see log entries about the instance being migrated away. Is it possible ovn-controller is getting confused by the original compute host coming back? The customer is not able to reproduce this issue with manual live migration or evacuation; it only happens when instanceha is evacuating the guests from the failed host.
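
      For comparison, a minimal sketch of the two scenarios being contrasted, assuming SSH access to the compute node; the host name and server ID are placeholders, and exact client syntax depends on the openstackclient version (older clients use the legacy nova CLI for evacuation):

      # Scenario 1 (failing): hard-crash the compute kernel so that
      # instanceha fences the host and mass-evacuates all guests (> 70).
      ssh compute-0 'echo c | sudo tee /proc/sysrq-trigger'

      # Scenario 2 (reported as working): manually evacuate a single
      # instance once the crashed host's compute service is seen as down
      # (it can be forced down to avoid waiting for the service timeout).
      openstack compute service set --down compute-0 nova-compute
      openstack server evacuate <server-id>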

              Assignee: Unassigned
              Reporter: Dave Hill (rhn-support-dhill)
              Team: rhos-dfg-pidone
              Votes: 0
              Watchers: 6