Type: Bug
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: rhos-18.0.10 FR 3
Component/s: openstack-nova
Labels:
None

Blocked:
False
Blocked Reason:

None
Ready:
False
Docs Approval:
?
AssignedTeam:
rhos-workloads-compute
Regression:
None
Intelligence Requested:
Market:
PX Impact Score:

Severity:
Moderate

SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

The Nova compute agent incorrectly identifies a GPU passthrough device as dev_type='type-PF' in the resource tracker's final view, even when the configuration explicitly sets it to type-PCI using [pci]alias.

This occurs because the libvirt driver's hardware detection logic identifies the device as a Physical Function (PF) if it is SR-IOV-capable. This hardware detection takes precedence over the operator's configuration in nova.conf. The presence of the <capability type='virt_functions'> element in the device's XML description from libvirt is what triggers this behavior.
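The detection rule described above can be sketched in a few lines of Python. This is a simplified illustration and not Nova's actual driver code: the function name classify_dev_type and both sample XML documents are hypothetical, and the phys_function branch is included on the assumption that a VF is recognized by its <capability type='phys_function'> element.

```python
import xml.etree.ElementTree as ET

# Hypothetical node device XML for an SR-IOV-capable GPU (trimmed).
SRIOV_GPU_XML = """
<device>
  <name>pci_0000_3b_00_0</name>
  <capability type='pci'>
    <product id='0x20f1'/>
    <vendor id='0x10de'/>
    <capability type='virt_functions' maxCount='16'/>
  </capability>
</device>
"""

# Hypothetical node device XML for the same GPU without SR-IOV exposed.
PLAIN_GPU_XML = """
<device>
  <name>pci_0000_3b_00_0</name>
  <capability type='pci'>
    <product id='0x20f1'/>
    <vendor id='0x10de'/>
  </capability>
</device>
"""


def classify_dev_type(nodedev_xml: str) -> str:
    """Classify a PCI device from its libvirt node device XML.

    The operator's [pci]alias setting is not consulted here, which
    mirrors the behavior reported in this bug.
    """
    root = ET.fromstring(nodedev_xml)
    pci_cap = root.find("capability[@type='pci']")
    if pci_cap is None:
        raise ValueError("not a PCI device")
    for cap in pci_cap.findall("capability"):
        cap_type = cap.get("type")
        if cap_type == "virt_functions":
            # The mere presence of this element marks the device as a PF.
            return "type-PF"
        if cap_type == "phys_function":
            return "type-VF"
    return "type-PCI"


print(classify_dev_type(SRIOV_GPU_XML))
print(classify_dev_type(PLAIN_GPU_XML))
```

In this sketch the only input is the hardware description, so no value in nova.conf can change the result.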

To Reproduce

Steps to reproduce the behavior:

1. Configure a compute node with a GPU that supports SR-IOV (e.g., an NVIDIA L4 or similar).
2. In nova.conf on the compute node, configure PCI passthrough for this device using device_spec and alias. Explicitly set device_type to type-PCI.
3. Restart the nova-compute service.
4. Check nova-compute.log. The configuration is loaded correctly, but the final resource view reported by the resource tracker shows dev_type='type-PF'.

Expected behavior

The Nova compute agent should honor the device_type specified in the [pci]alias configuration, using it to override the hardware-detected device type. If a device is configured as type-PCI, the resource tracker should report it as such, regardless of its underlying SR-IOV capabilities.
Alternatively, the documentation should at least explain how this configuration value interacts with the autodetection logic: when the configured value must match the detected one, and when it may differ. The upstream documentation currently states:

"device_type
Type of PCI device. Valid values are: type-PCI, type-PF and type-VF. Note that "device_type": "type-PF" must be specified if you wish to passthrough a device that supports SR-IOV in its entirety."

That section could instead explain that users do not have a free choice between type-PCI and type-PF, because the value must match what libvirt and the Nova compute agent autodetect from the hardware. The same clarification applies to the downstream documentation.
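The constraint described here, that the configured device_type has to agree with the autodetected one, could be checked before restarting the service. The following is a minimal sketch under stated assumptions: check_alias is a hypothetical helper, not part of Nova, and the detected type is passed in by the caller rather than queried from libvirt.

```python
import json

# Valid values as listed in the upstream documentation quoted above.
VALID_DEVICE_TYPES = {"type-PCI", "type-PF", "type-VF"}


def check_alias(alias_json: str, detected_dev_type: str) -> list[str]:
    """Return human-readable problems found in one [pci]alias value."""
    alias = json.loads(alias_json)
    problems = []
    configured = alias.get("device_type")
    if configured is None:
        # No explicit type configured; nothing to compare in this sketch.
        return problems
    if configured not in VALID_DEVICE_TYPES:
        problems.append(f"unknown device_type {configured!r}")
    elif configured != detected_dev_type:
        problems.append(
            f"alias {alias.get('name')!r} requests {configured} "
            f"but the device is detected as {detected_dev_type}"
        )
    return problems


# Alias value taken from this report.
ALIAS = (
    '{ "vendor_id":"10de", "product_id":"20f1", '
    '"device_type":"type-PCI", "name":"nvidia_a2" }'
)

for problem in check_alias(ALIAS, "type-PF"):
    print(problem)
```

With the alias from this report and a device detected as type-PF, the helper reports one mismatch; with a detected type of type-PCI it reports none.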

Bug impact

The incorrect dev_type prevents the Nova scheduler from correctly matching the device for instances that require a type-PCI device. This is particularly problematic for use cases like full GPU passthrough with non-GRID NVIDIA drivers, which cannot be loaded in the guest if the device is presented in SR-IOV mode.
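The matching failure can be illustrated with a deliberately simplified model in which a request matches a device pool only when every requested key equals the pool's value. The function pool_matches and both dictionaries are hypothetical and only approximate how an alias-derived request is compared with the pool reported by the resource tracker.

```python
def pool_matches(request_spec: dict, pool_tags: dict) -> bool:
    """Return True when every requested key has the same value in the pool."""
    return all(pool_tags.get(key) == value for key, value in request_spec.items())


# What the alias in this report asks for.
alias_request = {"vendor_id": "10de", "product_id": "20f1", "dev_type": "type-PCI"}

# What the resource tracker reports for the SR-IOV-capable GPU.
reported_pool = {"vendor_id": "10de", "product_id": "20f1", "dev_type": "type-PF"}

print(pool_matches(alias_request, reported_pool))
```

Vendor and product IDs agree, so the dev_type mismatch alone is what excludes the device in this model.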

Known workaround

Disable SR-IOV for the affected PCI devices in the BIOS. Kernel arguments such as pci=nosriov do not resolve the problem.

Additional context

- Configuration:

[pci]
alias = { "vendor_id":"10de", "product_id":"20f1", "device_type":"type-PCI", "name":"nvidia_a2" }

- Logs:
  The nova-compute.log will show the device being reported incorrectly in the final resource view:

DEBUG nova.compute.resource_tracker [...] Final resource view: ... pci_stats=[PciDevicePool(...,tags={...,dev_type='type-PF',...})]
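To confirm the symptom on a node, the reported device types can be pulled out of the log with a short script. This is a sketch that assumes each "Final resource view" record sits on a single line and that dev_type appears in the form shown in this report; dev_types_in_final_view is a hypothetical helper and the sample line is the abbreviated record from above.

```python
import re

# Matches dev_type='type-PF', dev_type='type-PCI', dev_type='type-VF', etc.
DEV_TYPE_RE = re.compile(r"dev_type='(type-[A-Za-z]+)'")


def dev_types_in_final_view(log_lines):
    """Collect dev_type values from 'Final resource view' log records."""
    found = []
    for line in log_lines:
        if "Final resource view" in line:
            found.extend(DEV_TYPE_RE.findall(line))
    return found


SAMPLE_LOG = [
    "DEBUG nova.compute.resource_tracker [...] Final resource view: ... "
    "pci_stats=[PciDevicePool(...,tags={...,dev_type='type-PF',...})]",
]

print(dev_types_in_final_view(SAMPLE_LOG))
```

On a real node the same function could be fed the lines of nova-compute.log, for example by iterating over the open file.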

Assignee: Unassigned

Reporter: Bohdan Dobrelia

Team: rhos-workloads-compute

Votes: 0

Watchers: 2

Created: 2025/10/14 8:38 AM

Updated: 2025/10/14 11:18 AM
