Bug
Resolution: Unresolved
rhos-18.0.11
Important
To Reproduce
Steps to reproduce the behavior:
- Set up PCI in Placement with OTU and with the NVMe cleanup agent
- Inject a fault that prevents nova's libvirt driver from actually spawning the instance (e.g. a missing iommu kernel parameter on the host)
- Boot a VM that allocates an OTU device
- Observe that the boot fails during the spawn
- Observe that the OTU inventory is de-allocated but remains reserved, as expected
- Observe that the NVMe cleanup is not triggered
- Delete the VM in ERROR
- Observe that the OTU inventory is still reserved and the NVMe cleanup is not triggered
Expected behavior
- When the VM fails to boot on the compute and is either re-scheduled to another host or put into ERROR because no other host is available, the NVMe agent is triggered, cleans the device, and unreserves the OTU resource in Placement.
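One way to read this expectation is that the cleanup agent should treat a failed spawn as a trigger, in addition to a delete. The sketch below is illustrative only: the constant and function names are hypothetical, and the set of event types the real agent subscribes to is an assumption. instance.create.error is the versioned notification visible in the Terminal-2 logs of this report; instance.delete.end is assumed here to be the existing delete trigger.

```python
# Hypothetical trigger filter for a cleanup agent; not the agent's real code.
CLEANUP_TRIGGER_EVENTS = frozenset({
    "instance.delete.end",    # assumed existing trigger: instance deleted
    "instance.create.error",  # failed spawn, as seen in this report
})


def should_trigger_cleanup(notification: dict) -> bool:
    """Return True if the notification's event type should start a cleanup."""
    return notification.get("event_type") in CLEANUP_TRIGGER_EVENTS


print(should_trigger_cleanup({"event_type": "instance.create.error"}))
print(should_trigger_cleanup({"event_type": "instance.create.start"}))
```

With a filter like this, the instance.create.error notification emitted after the failed spawn would be enough to start the cleanup, without waiting for a later delete.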
Device Info (please complete the following information):
- Devstack with openstack master OR 18.0 latest with install_yamls
- IGB emulated SRIOV PFs in virtual computes
Bug impact
- The NVMe device is not cleaned and remains reserved in Placement. No data leak occurs, but the resource is unavailable for future scheduling.
Known workaround
- Restart the NVMe cleanup agent. The restart triggers a re-scan of reserved but unallocated devices, and the NVMe agent properly cleans and unreserves them.
- The --periodic <minutes> flag can be added to the NVMe agent command line to make it periodically re-scan the unused but reserved devices.
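Both workarounds rely on the same condition: an inventory that is reserved but has no allocation. A minimal sketch of that check follows, assuming the inventory rows are available as dictionaries with the same keys as the columns of `openstack resource provider inventory list`. The function name is hypothetical and this is not the agent's actual implementation.

```python
from typing import Iterable, List


def reserved_but_unallocated(inventories: Iterable[dict]) -> List[str]:
    """Return the resource classes that are reserved but not in use."""
    return [
        inv["resource_class"]
        for inv in inventories
        if inv["reserved"] > 0 and inv["used"] == 0
    ]


# Inventory state from this report after the failed boot.
inventories = [
    {"resource_class": "CUSTOM_PCI_8086_10C9", "reserved": 1, "total": 1, "used": 0},
]
print(reserved_but_unallocated(inventories))
```

A device that is still allocated to a running VM has used > 0 and is skipped, so only leftover reservations are selected for cleanup.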
Additional context
Terminal-1 commands
stack@aio:~$ openstack resource provider inventory list 0bba74f6-a1a8-49dc-a3e4-7944d41b20b5
+----------------------+------------------+----------+------------+----------+-----------+-------+------+
| resource_class       | allocation_ratio | min_unit | max_unit   | reserved | step_size | total | used |
+----------------------+------------------+----------+------------+----------+-----------+-------+------+
| CUSTOM_PCI_8086_10C9 |              1.0 |        1 | 2147483647 |        0 |         1 |     1 |    0 |
+----------------------+------------------+----------+------------+----------+-----------+-------+------+
stack@aio:~$
stack@aio:~$ cd /opt/stack/nov
-bash: cd: /opt/stack/nov: No such file or directory
stack@aio:~$ cd /opt/stack/nova
stack@aio:/opt/stack/nova$ vim nova/virt/libvirt/driver.py
stack@aio:/opt/stack/nova$ sudo systemctl restart devstack@n-cpu
stack@aio:/opt/stack/nova$
stack@aio:/opt/stack/nova$
stack@aio:/opt/stack/nova$ openstack server create --image cirros-0.6.3-x86_64-disk --flavor m1.pf1 --nic none vm1 --wait
Error creating server: vm1
stack@aio:/opt/stack/nova$ openstack resource provider inventory list 0bba74f6-a1a8-49dc-a3e4-7944d41b20b5
+----------------------+------------------+----------+----------+----------+-----------+-------+------+
| resource_class       | allocation_ratio | min_unit | max_unit | reserved | step_size | total | used |
+----------------------+------------------+----------+----------+----------+-----------+-------+------+
| CUSTOM_PCI_8086_10C9 |              1.0 |        1 |        1 |        1 |         1 |     1 |    0 |
+----------------------+------------------+----------+----------+----------+-----------+-------+------+
stack@aio:/opt/stack/nova$ openstack server show vm1
+-------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| Field                               | Value                                                                                                                                                 |
+-------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
| OS-DCF:diskConfig                   | MANUAL                                                                                                                                                |
| OS-EXT-AZ:availability_zone         |                                                                                                                                                       |
| OS-EXT-SRV-ATTR:host                | None                                                                                                                                                  |
| OS-EXT-SRV-ATTR:hostname            | vm1                                                                                                                                                   |
| OS-EXT-SRV-ATTR:hypervisor_hostname | None                                                                                                                                                  |
| OS-EXT-SRV-ATTR:instance_name       | instance-00000003                                                                                                                                     |
| OS-EXT-SRV-ATTR:kernel_id           |                                                                                                                                                       |
| OS-EXT-SRV-ATTR:launch_index        | 0                                                                                                                                                     |
| OS-EXT-SRV-ATTR:ramdisk_id          |                                                                                                                                                       |
| OS-EXT-SRV-ATTR:reservation_id      | r-ahjgp7v9                                                                                                                                            |
| OS-EXT-SRV-ATTR:root_device_name    | /dev/vda                                                                                                                                              |
| OS-EXT-SRV-ATTR:user_data           | None                                                                                                                                                  |
| OS-EXT-STS:power_state              | NOSTATE                                                                                                                                               |
| OS-EXT-STS:task_state               | None                                                                                                                                                  |
| OS-EXT-STS:vm_state                 | error                                                                                                                                                 |
| OS-SRV-USG:launched_at              | None                                                                                                                                                  |
| OS-SRV-USG:terminated_at            | None                                                                                                                                                  |
| accessIPv4                          |                                                                                                                                                       |
| accessIPv6                          |                                                                                                                                                       |
| addresses                           |                                                                                                                                                       |
| config_drive                        |                                                                                                                                                       |
| created                             | 2025-10-06T08:57:39Z                                                                                                                                  |
| description                         | None                                                                                                                                                  |
| flavor                              | description=, disk='1', ephemeral='0', extra_specs.pci_passthrough:alias='nic-pf:1', id='m1.pf1', is_disabled=, is_public='True', location=,          |
|                                     | name='m1.pf1', original_name='m1.pf1', ram='512', rxtx_factor=, swap='0', vcpus='1'                                                                   |
| hostId                              |                                                                                                                                                       |
| host_status                         |                                                                                                                                                       |
| id                                  | 8aa5ec38-979d-424d-b9ca-dee430b3b9b8                                                                                                                  |
| image                               | cirros-0.6.3-x86_64-disk (ff396027-1969-4769-9029-f34fdc493528)                                                                                       |
| key_name                            | None                                                                                                                                                  |
| locked                              | False                                                                                                                                                 |
| locked_reason                       | None                                                                                                                                                  |
| name                                | vm1                                                                                                                                                   |
| pinned_availability_zone            | None                                                                                                                                                  |
| progress                            | None                                                                                                                                                  |
| project_id                          | 651770f7e5a140f498fc53d9736c30fa                                                                                                                      |
| properties                          |                                                                                                                                                       |
| scheduler_hints                     |                                                                                                                                                       |
| server_groups                       | None                                                                                                                                                  |
| status                              | ERROR                                                                                                                                                 |
| tags                                |                                                                                                                                                       |
| trusted_image_certificates          | None                                                                                                                                                  |
| updated                             | 2025-10-06T08:57:54Z                                                                                                                                  |
| user_id                             | e1849f10596740728659f2a2c3bcbca7                                                                                                                      |
| volumes_attached                    |                                                                                                                                                       |
+-------------------------------------+-------------------------------------------------------------------------------------------------------------------------------------------------------+
stack@aio:/opt/stack/nova$ openstack server delete vm1
stack@aio:/opt/stack/nova$ openstack server list
stack@aio:/opt/stack/nova$ openstack resource provider inventory list 0bba74f6-a1a8-49dc-a3e4-7944d41b20b5
+----------------------+------------------+----------+----------+----------+-----------+-------+------+
| resource_class       | allocation_ratio | min_unit | max_unit | reserved | step_size | total | used |
+----------------------+------------------+----------+----------+----------+-----------+-------+------+
| CUSTOM_PCI_8086_10C9 |              1.0 |        1 |        1 |        1 |         1 |     1 |    0 |
+----------------------+------------------+----------+----------+----------+-----------+-------+------+
Terminal-2 logs
stack@aio:~$ cat /tmp/terminal2.log
stack@aio:~$ sudo journalctl -u devstack@n-cpu --follow | egrep -e "ERROR|notification.instance"
Oct 06 08:57:41 aio nova-compute[104760]: INFO oslo.messaging.notification.instance.create.start [None req-a3913e53-b3a8-473e-8e76-7c9c2200e678 admin admin] {"message_id": "37c0704b-c3bb-467b-92cb-5ddb7382064c", "publisher_id": "nova-compute:aio", "event_type": "instance.create.start", "priority": "INFO", "payload": {"nova_object.name": "InstanceCreatePayload", "nova_object.namespace": "nova", "nova_object.version": "1.13", "nova_object.data": {"keypairs": [], "tags": [], "trusted_image_certificates": null, "instance_name": "instance-00000003", "fault": null, "request_id": "req-a3913e53-b3a8-473e-8e76-7c9c2200e678", "uuid": "8aa5ec38-979d-424d-b9ca-dee430b3b9b8", "user_id": "e1849f10596740728659f2a2c3bcbca7", "tenant_id": "651770f7e5a140f498fc53d9736c30fa", "reservation_id": "r-ahjgp7v9", "display_name": "vm1", "display_description": null, "host_name": "vm1", "host": null, "node": null, "os_type": null, "architecture": null, "availability_zone": "nova", "flavor": {"nova_object.name": "FlavorPayload", "nova_object.namespace": "nova", "nova_object.version": "1.4", "nova_object.data": {"flavorid": "a97a9fb7-b2a1-4032-97c6-9cc749bd7f54", "memory_mb": 512, "vcpus": 1, "root_gb": 1, "ephemeral_gb": 0, "name": "m1.pf1", "swap": 0, "rxtx_factor": 1.0, "vcpu_weight": 0, "disabled": false, "is_public": true, "extra_specs": {"pci_passthrough:alias": "nic-pf:1"}, "projects": null, "description": null}}, "image_uuid": "ff396027-1969-4769-9029-f34fdc493528", "key_name": null, "kernel_id": "", "ramdisk_id": "", "created_at": "2025-10-06T08:57:39Z", "launched_at": null, "terminated_at": null, "deleted_at": null, "updated_at": null, "state": "building", "power_state": "pending", "task_state": null, "progress": 0, "ip_addresses": [], "block_devices": [], "metadata": {}, "locked": false, "auto_disk_config": "MANUAL", "action_initiator_user": "e1849f10596740728659f2a2c3bcbca7", "action_initiator_project": "651770f7e5a140f498fc53d9736c30fa", "locked_reason": null, "shares": null}}, "timestamp": "2025-10-06 08:57:41.432866"}
Oct 06 08:57:47 aio nova-compute[104760]: ERROR nova.compute.manager [None req-a3913e53-b3a8-473e-8e76-7c9c2200e678 admin admin] [instance: 8aa5ec38-979d-424d-b9ca-dee430b3b9b8] Instance failed to spawn: ValueError: !!!
Oct 06 08:57:47 aio nova-compute[104760]: ERROR nova.compute.manager [instance: 8aa5ec38-979d-424d-b9ca-dee430b3b9b8] Traceback (most recent call last):
Oct 06 08:57:47 aio nova-compute[104760]: ERROR nova.compute.manager [instance: 8aa5ec38-979d-424d-b9ca-dee430b3b9b8]   File "/opt/stack/nova/nova/compute/manager.py", line 2934, in _build_resources
Oct 06 08:57:47 aio nova-compute[104760]: ERROR nova.compute.manager [instance: 8aa5ec38-979d-424d-b9ca-dee430b3b9b8]     yield resources
Oct 06 08:57:47 aio nova-compute[104760]: ERROR nova.compute.manager [instance: 8aa5ec38-979d-424d-b9ca-dee430b3b9b8]   File "/opt/stack/nova/nova/compute/manager.py", line 2681, in _build_and_run_instance
Oct 06 08:57:47 aio nova-compute[104760]: ERROR nova.compute.manager [instance: 8aa5ec38-979d-424d-b9ca-dee430b3b9b8]     self.driver.spawn(context, instance, image_meta,
Oct 06 08:57:47 aio nova-compute[104760]: ERROR nova.compute.manager [instance: 8aa5ec38-979d-424d-b9ca-dee430b3b9b8]   File "/opt/stack/nova/nova/virt/libvirt/driver.py", line 4770, in spawn
Oct 06 08:57:47 aio nova-compute[104760]: ERROR nova.compute.manager [instance: 8aa5ec38-979d-424d-b9ca-dee430b3b9b8]     raise ValueError("!!!")
Oct 06 08:57:47 aio nova-compute[104760]: ERROR nova.compute.manager [instance: 8aa5ec38-979d-424d-b9ca-dee430b3b9b8] ValueError: !!!
Oct 06 08:57:47 aio nova-compute[104760]: ERROR nova.compute.manager [instance: 8aa5ec38-979d-424d-b9ca-dee430b3b9b8]
Oct 06 08:57:51 aio nova-compute[104760]: ERROR oslo.messaging.notification.compute.instance.create.error [None req-a3913e53-b3a8-473e-8e76-7c9c2200e678 admin admin] {"message_id": "22f90434-f5f2-4c14-a66d-b7b39af06627", "publisher_id": "compute.aio", "event_type": "compute.instance.create.error", "priority": "ERROR", "payload": {"tenant_id": "651770f7e5a140f498fc53d9736c30fa", "user_id": "e1849f10596740728659f2a2c3bcbca7", "instance_id": "8aa5ec38-979d-424d-b9ca-dee430b3b9b8", "display_name": "vm1", "reservation_id": "r-ahjgp7v9", "hostname": "vm1", "instance_type": "m1.pf1", "instance_type_id": 14, "instance_flavor_id": "a97a9fb7-b2a1-4032-97c6-9cc749bd7f54", "architecture": null, "memory_mb": 512, "disk_gb": 1, "vcpus": 1, "root_gb": 1, "ephemeral_gb": 0, "host": null, "node": null, "availability_zone": "nova", "cell_name": "", "created_at": "2025-10-06 08:57:39+00:00", "terminated_at": "", "deleted_at": "", "launched_at": "", "image_ref_url": "http://192.168.121.72/image/images/ff396027-1969-4769-9029-f34fdc493528", "os_type": null, "kernel_id": "", "ramdisk_id": "", "state": "building", "state_description": "spawning", "progress": "", "access_ip_v4": null, "access_ip_v6": null, "image_meta": {"hw_rng_model": "virtio", "owner_specified.openstack.md5": "", "owner_specified.openstack.object": "images/cirros-0.6.3-x86_64-disk", "owner_specified.openstack.sha256": "", "min_ram": "0", "min_disk": "1", "disk_format": "qcow2", "container_format": "bare", "base_image_ref": "ff396027-1969-4769-9029-f34fdc493528"}, "metadata": {}, "exception": "ValueError('!!!')", "message": "ValueError", "code": 500}, "timestamp": "2025-10-06 08:57:51.631409"}
Oct 06 08:57:51 aio nova-compute[104760]: ERROR oslo.messaging.notification.instance.create.error [None req-a3913e53-b3a8-473e-8e76-7c9c2200e678 admin admin] {"message_id": "83c6dfaa-fe63-47a3-9ce1-3353b23d0a13", "publisher_id": "nova-compute:aio", "event_type": "instance.create.error", "priority": "ERROR", "payload": {"nova_object.name": "InstanceCreatePayload", "nova_object.namespace": "nova", "nova_object.version": "1.13", "nova_object.data": {"keypairs": [], "tags": [], "trusted_image_certificates": null, "instance_name": "instance-00000003", "fault": {"nova_object.name": "ExceptionPayload", "nova_object.namespace": "nova", "nova_object.version": "1.1", "nova_object.data": {"module_name": "nova.virt.libvirt.driver", "function_name": "spawn", "exception": "ValueError", "exception_message": "!!!", "traceback": "Traceback (most recent call last):\n, File \"/opt/stack/nova/nova/compute/manager.py\", line 2681, in _build_and_run_instance\n self.driver.spawn(context, instance, image_meta,\n, File \"/opt/stack/nova/nova/virt/libvirt/driver.py\", line 4770, in spawn\n raise ValueError(\"!!!\")\n,ValueError: !!!\n"}}, "request_id": "req-a3913e53-b3a8-473e-8e76-7c9c2200e678", "uuid": "8aa5ec38-979d-424d-b9ca-dee430b3b9b8", "user_id": "e1849f10596740728659f2a2c3bcbca7", "tenant_id": "651770f7e5a140f498fc53d9736c30fa", "reservation_id": "r-ahjgp7v9", "display_name": "vm1", "display_description": null, "host_name": "vm1", "host": null, "node": null, "os_type": null, "architecture": null, "availability_zone": "nova", "flavor": {"nova_object.name": "FlavorPayload", "nova_object.namespace": "nova", "nova_object.version": "1.4", "nova_object.data": {"flavorid": "a97a9fb7-b2a1-4032-97c6-9cc749bd7f54", "memory_mb": 512, "vcpus": 1, "root_gb": 1, "ephemeral_gb": 0, "name": "m1.pf1", "swap": 0, "rxtx_factor": 1.0, "vcpu_weight": 0, "disabled": false, "is_public": true, "extra_specs": {"pci_passthrough:alias": "nic-pf:1"}, "projects": null, "description": null}}, "image_uuid": "ff396027-1969-4769-9029-f34fdc493528", "key_name": null, "kernel_id": "", "ramdisk_id": "", "created_at": "2025-10-06T08:57:39Z", "launched_at": null, "terminated_at": null, "deleted_at": null, "updated_at": "2025-10-06T08:57:49Z", "state": "building", "power_state": "pending", "task_state": "spawning", "progress": 0, "ip_addresses": [], "block_devices": [], "metadata": {}, "locked": false, "auto_disk_config": "MANUAL", "action_initiator_user": "e1849f10596740728659f2a2c3bcbca7", "action_initiator_project": "651770f7e5a140f498fc53d9736c30fa", "locked_reason": null, "shares": null}}, "timestamp": "2025-10-06 08:57:51.676154"}
^C
Additional reproduction scenario
There is another way to lose an instance.delete trigger.
1. Boot a VM successfully on a compute.
2. Stop the nova-compute service and wait until nova considers the compute down.
3. Delete the VM. Nova will do an API-only local delete of the instance, and nova-api will send the instance.delete notification.
4. Start the nova-compute service. It will detect that the VM was deleted while nova-compute was down and will delete the VM from the hypervisor. However, in this case nova-compute does not send an instance.delete notification because nova-api already sent one, so the cleanup agent has no trigger.
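A notification consumer could tell the two delete paths apart by looking at who published the notification. The compute-side notifications in the Terminal-2 logs carry publisher_id "nova-compute:aio". The sketch below assumes that an API-side local delete is published with the analogous "nova-api:<host>" form and an event type starting with "instance.delete". Both points are assumptions, and the helper names are hypothetical rather than part of the cleanup agent.

```python
from typing import Tuple


def split_publisher(publisher_id: str) -> Tuple[str, str]:
    """Split a "<binary>:<host>" publisher_id into (binary, host)."""
    binary, _, host = publisher_id.partition(":")
    return binary, host


def is_api_local_delete(notification: dict) -> bool:
    """True for a delete notification published by nova-api (assumed form)."""
    binary, _host = split_publisher(notification.get("publisher_id", ""))
    event_type = notification.get("event_type", "")
    return binary == "nova-api" and event_type.startswith("instance.delete")


compute_error = {"publisher_id": "nova-compute:aio", "event_type": "instance.create.error"}
api_delete = {"publisher_id": "nova-api:aio", "event_type": "instance.delete.end"}
print(is_api_local_delete(compute_error), is_api_local_delete(api_delete))
```

An agent that recognizes this case could, for example, defer the cleanup and re-scan later, since no second notification will arrive once nova-compute removes the VM from the hypervisor.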