Type: Task
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: 2026.1 (G)
Component/s: openstack-nova
Labels:
None

Story Points:
3
Epic Link:
[Gazpacho]Nova Eventlet removal
Blocked:
False
Blocked Reason:
None
Ready:
False
Color Status:
Not Selected
AssignedTeam:
rhos-workloads-compute-quasar
Intelligence Requested:
Market:

Sprint:
Sprint 2 Quasar, Sprint 3 Quasar
sprint_count:
2

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

During integration testing with the VMware virt driver, we found a possible deadlock in nova-compute.

Quoting from the upstream review:
https://review.opendev.org/c/openstack/nova/+/965467/44#message-62f7ef828c38c33f322f72737b4016ace9b9f242

I found a bug while testing with 10 VMs per compute in parallel (and probably reproduced the oslo.vmware problem).

The following scenario leads to a high-level deadlock:

Switch to threading mode; the default executor pool size is 10.

Boot 10 VMs in parallel.

The compute service receives 10 RPC requests for build_and_run_instance.

Because of the logic in [1], the compute service moves those requests to the default executor, which fills the default pool.

The build_and_run_instance tasks progress and each submits _allocate_network_async [2] to the same default executor, then a bit later waits for it to finish. But the executor is already full with the parent tasks, so the child tasks never get a worker. The result is a deadlock between the 10 parallel build_and_run_instance tasks and the 10 parallel _allocate_network_async tasks.

[1] https://github.com/openstack/nova/blob/59a7093915298973c72b6d1749a6acd27e0045a9/nova/compute/manager.py#L2452-L2460

[2] https://github.com/openstack/nova/blob/59a7093915298973c72b6d1749a6acd27e0045a9/nova/network/model.py#L580

Relevant IRC discussion: https://meetings.opendev.org/irclogs/%23openstack-nova/%23openstack-nova.2026-01-30.log.html#openstack-nova.2026-01-30.log.html#t2026-01-30T15:02:52

I will do the following:

Audit all the spawn calls on the compute side to see how many similar cases exist.

Move the build_and_run_instance tasks to a dedicated executor that enforces the limit on parallel builds, which is currently implemented by a semaphore.

Look into ways to detect when a task running in an executor tries to submit a new task to the same executor.
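One possible shape for the last item is sketched below. This is an illustration under assumptions, not the approach chosen upstream: GuardedExecutor, SelfSubmitError and try_nested_submit are hypothetical names, and the check relies only on the standard ThreadPoolExecutor initializer hook plus a thread-local flag to recognise the executor's own worker threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SelfSubmitError(RuntimeError):
    """Raised when a task submits work to the executor it is running in."""


class GuardedExecutor(ThreadPoolExecutor):
    """Hypothetical executor that rejects submissions from its own workers."""

    def __init__(self, max_workers):
        # Per-executor thread-local, so a worker of one pool may still
        # submit to a different pool.
        self._local = threading.local()
        super().__init__(
            max_workers=max_workers, initializer=self._mark_worker
        )

    def _mark_worker(self):
        # Runs once in each worker thread when that thread starts.
        self._local.is_worker = True

    def submit(self, fn, /, *args, **kwargs):
        if getattr(self._local, "is_worker", False):
            raise SelfSubmitError(
                "task tried to submit to the executor it is running in"
            )
        return super().submit(fn, *args, **kwargs)


def try_nested_submit(pool):
    # Runs inside a worker and attempts the problematic nested submit.
    try:
        pool.submit(int)
        return "allowed"
    except SelfSubmitError:
        return "rejected"


if __name__ == "__main__":
    pool = GuardedExecutor(max_workers=2)
    other = GuardedExecutor(max_workers=2)
    print(pool.submit(try_nested_submit, pool).result())   # rejected
    print(pool.submit(try_nested_submit, other).result())  # allowed
    pool.shutdown()
    other.shutdown()
```

Failing fast at submit time turns a silent hang into an immediate error at the call site, which would also help the audit of spawn calls. Whether to raise, log, or only warn is a design choice left open here.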

is depended on by

OSPRH-19525 Make nova-compute run in native threading mode

Backlog

Assignee: Balazs Gibizer

Reporter: Balazs Gibizer

Contributors: Ghanshyam Maan, Sylvain Bauza

Votes: 0

Watchers: 1

Created: 2026/02/03 9:43 AM

Updated: 2026/02/10 9:54 AM
