What were you trying to do that didn't work?
Pass a GPU device through to the guest with the hostdev acpi nodeset set to a range that includes a NUMA node with vCPUs. The VM fails to boot and the guest kernel crashes. This is incorrect usage, so a failure is expected, but a guest kernel crash during boot is a poor user experience.
Please provide the package NVR for which the bug is seen:
libvirt-daemon-common-11.10.0-3.el10_GH_20260113.48238c458e.aarch64
kernel-6.12.0-181.test.el10.aarch64+64k
qemu-kvm-10.1.0-10.el10.augere202601131619.aarch64
Note:
To test GPU passthrough, I have to use the scratch builds listed above.
How reproducible is this bug?:
100%
Steps to reproduce
- Prepare GPU test environment
lspci (on host)
0009:01:00.0 3D controller: NVIDIA Corporation GH100 [GH200 120GB / 480GB] (rev a1)
virsh nodedev-detach pci_0009_01_00_0
modprobe -r nouveau
modprobe nvgrace-gpu-vfio-pci
grep PCI_ID /sys/bus/pci/devices/*/uevent
/sys/bus/pci/devices/0009:01:00.0/uevent:PCI_ID=10DE:2342
echo "10de 2342" | sudo tee -a /sys/bus/pci/drivers/nvgrace_gpu_vfio_pci/new_id
- Configure the VM XML with the snippet below
<cpu mode='host-passthrough' check='none'>
  <numa>
    <cell id='0' cpus='0-31' memory='33554432' unit='KiB'/>
    <cell id='1' memory='0' unit='KiB'/>
    <cell id='2' memory='0' unit='KiB'/>
    <cell id='3' memory='0' unit='KiB'/>
    <cell id='4' memory='0' unit='KiB'/>
    <cell id='5' memory='0' unit='KiB'/>
    <cell id='6' memory='0' unit='KiB'/>
    <cell id='7' memory='0' unit='KiB'/>
    <cell id='8' memory='0' unit='KiB'/>
    <cell id='9' memory='16000000' unit='KiB'/>
  </numa>
</cpu>
<devices>
  <controller type='pci' index='15' model='pcie-expander-bus'>
    <model name='pxb-pcie'/>
    <target busNr='254'/>
    <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
  </controller>
  <controller type='pci' index='16' model='pcie-root-port'>
    <model name='pcie-root-port'/>
    <target chassis='16' port='0x0'/>
    <address type='pci' domain='0x0000' bus='0x0f' slot='0x00' function='0x0'/>
  </controller>
  <hostdev mode='subsystem' type='pci' managed='no'>
    <driver iommufd='yes'/>
    <source>
      <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
    </source>
    <acpi nodeset='0-9'/>
    <address type='pci' domain='0x0000' bus='0x10' slot='0x00' function='0x0'/>
  </hostdev>
  <iommu model='smmuv3'>
    <driver pciBus='15' accel='on' ats='on' ril='off' pasid='on' oas='48'/>
  </iommu>
</devices>
- Start the VM
# virsh start vm1
Expected results
There should be validation that detects this incorrect configuration, where the acpi nodeset of a GPU hostdev includes a NUMA node with vCPUs. An error should be reported instead of letting the guest continue to boot and crash.
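To illustrate the kind of check being requested, here is a minimal Python sketch that runs outside libvirt. It is not libvirt code and not a proposed patch. The helper names `parse_nodeset` and `find_nodeset_conflicts` are hypothetical. It assumes a complete domain XML (for example from `virsh dumpxml`), treats any NUMA cell with a `cpus` attribute as a node with vCPUs, and handles only simple nodeset strings such as `0-9` or `1,3-5`.

```python
import xml.etree.ElementTree as ET


def parse_nodeset(spec):
    """Expand a simple nodeset string such as '0-2,5' into a set of ints.

    Assumption: only single ids and plain ranges are handled; exclusion
    syntax such as '^3' is deliberately not supported in this sketch.
    """
    nodes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            nodes.update(range(int(low), int(high) + 1))
        else:
            nodes.add(int(part))
    return nodes


def find_nodeset_conflicts(domain_xml):
    """Return, per hostdev, the sorted NUMA node ids that appear in its
    <acpi nodeset='...'/> and also belong to a cell that has vCPUs.

    Hostdevs without an acpi nodeset, or without any overlap, are skipped.
    """
    root = ET.fromstring(domain_xml)

    # Cells that carry a 'cpus' attribute are treated as nodes with vCPUs.
    # If a cell has no explicit id, fall back to its position in the list.
    cells_with_cpus = set()
    for index, cell in enumerate(root.findall("./cpu/numa/cell")):
        if cell.get("cpus"):
            cells_with_cpus.add(int(cell.get("id", index)))

    conflicts = []
    for hostdev in root.findall("./devices/hostdev"):
        acpi = hostdev.find("acpi")
        if acpi is None or not acpi.get("nodeset"):
            continue
        overlap = parse_nodeset(acpi.get("nodeset")) & cells_with_cpus
        if overlap:
            conflicts.append(sorted(overlap))
    return conflicts
```

With the configuration from this report wrapped in a `<domain>` element, cell 0 is the only cell with `cpus` and the hostdev nodeset is `0-9`, so the function would report node 0 as a conflict. A real check would more likely live in the management layer's own domain validation and report an error when the VM is defined or started.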
Actual results
The VM continues to boot and the guest kernel crashes.