RHEL-143349

Missing validation when setting acpi nodeset to a node with vcpus for GPU device


    • Bug
    • Resolution: Unresolved
    • Minor
    • rhel-10.2
    • libvirt
    • rhel-virt-core-libvirt-1
    • aarch64

      What were you trying to do that didn't work?

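      Pass a GPU device through to the guest with the hostdev acpi nodeset set to a range that includes a NUMA node with vCPUs. The VM fails to boot and the guest kernel crashes. This is a misconfiguration, so the failure itself is expected, but letting the guest crash instead of rejecting the configuration makes for a poor user experience.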

      Please provide the package NVR for which the bug is seen:

      libvirt-daemon-common-11.10.0-3.el10_GH_20260113.48238c458e.aarch64
      kernel-6.12.0-181.test.el10.aarch64+64k
      qemu-kvm-10.1.0-10.el10.augere202601131619.aarch64

      Note:
      GPU passthrough can currently only be tested with the scratch builds listed above.

      How reproducible is this bug?:

      100%

      Steps to reproduce

      1. Prepare the GPU test environment on the host

        # lspci
        0009:01:00.0 3D controller: NVIDIA Corporation GH100 [GH200 120GB / 480GB] (rev a1)
        # virsh nodedev-detach pci_0009_01_00_0
        # modprobe -r nouveau
        # modprobe nvgrace-gpu-vfio-pci
        # grep PCI_ID /sys/bus/pci/devices/*/uevent
        /sys/bus/pci/devices/0009:01:00.0/uevent:PCI_ID=10DE:2342
        # echo "10de 2342" | sudo tee -a /sys/bus/pci/drivers/nvgrace_gpu_vfio_pci/new_id

      2. Configure the VM XML with the snippet below, then start the VM
          <cpu mode='host-passthrough' check='none'>
            <numa>
              <cell id='0' cpus='0-31' memory='33554432' unit='KiB'/>
              <cell id='1' memory='0' unit='KiB'/>
              <cell id='2' memory='0' unit='KiB'/>
              <cell id='3' memory='0' unit='KiB'/>
              <cell id='4' memory='0' unit='KiB'/>
              <cell id='5' memory='0' unit='KiB'/>
              <cell id='6' memory='0' unit='KiB'/>
              <cell id='7' memory='0' unit='KiB'/>
              <cell id='8' memory='0' unit='KiB'/>
              <cell id='9' memory='16000000' unit='KiB'/>
            </numa>
          </cpu>
          <devices>
            <controller type='pci' index='15' model='pcie-expander-bus'>
              <model name='pxb-pcie'/>
              <target busNr='254'/>
              <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
            </controller>
            <controller type='pci' index='16' model='pcie-root-port'>
              <model name='pcie-root-port'/>
              <target chassis='16' port='0x0'/>
              <address type='pci' domain='0x0000' bus='0x0f' slot='0x00' function='0x0'/>
            </controller>
            <hostdev mode='subsystem' type='pci' managed='no'>
              <driver iommufd='yes'/>
              <source>
                <address domain='0x0009' bus='0x01' slot='0x00' function='0x0'/>
              </source>
              <acpi nodeset='0-9'/>
              <address type='pci' domain='0x0000' bus='0x10' slot='0x00' function='0x0'/>
            </hostdev>
            <iommu model='smmuv3'>
              <driver pciBus='15' accel='on' ats='on' ril='off' pasid='on' oas='48'/>
            </iommu>
          </devices>

       # virsh start vm1

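The new_id write in step 1 uses the vendor and device IDs from the PCI_ID line of the device's uevent file, lowercased and separated by a space. A minimal sketch of that conversion follows; the helper name `new_id_from_uevent` is hypothetical and is not part of libvirt or the kernel tooling.

```python
def new_id_from_uevent(uevent_text):
    """Turn a uevent line like 'PCI_ID=10DE:2342' into the '10de 2342' form
    that was written to the driver's new_id file in step 1."""
    for line in uevent_text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "PCI_ID":
            vendor, device = value.strip().split(":")
            return f"{vendor.lower()} {device.lower()}"
    raise ValueError("no PCI_ID entry found in uevent text")


if __name__ == "__main__":
    # Value taken from the grep output in step 1.
    print(new_id_from_uevent("PCI_ID=10DE:2342"))  # 10de 2342
```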

      Expected results

      libvirt should validate the configuration and reject an acpi nodeset that includes a NUMA node with vCPUs for a GPU device. It should report an error instead of letting the VM boot and crash.
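The requested check can be illustrated with a small pre-flight script run against the domain XML. This is only a sketch under stated assumptions, not libvirt's implementation: it treats a NUMA cell as "having vCPUs" when its `cpus` attribute is set, handles only the plain `N` and `N-M` comma-separated nodeset forms (no `^` exclusions), and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_nodeset(spec):
    """Expand a nodeset string such as '0-3,7' into a set of ints.
    Assumption: only plain numbers and ranges are handled."""
    nodes = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            nodes.update(range(int(low), int(high) + 1))
        else:
            nodes.add(int(part))
    return nodes


def find_acpi_nodeset_conflicts(domain_xml):
    """Return, per offending hostdev, the sorted node IDs that appear in its
    <acpi nodeset> and also belong to a NUMA cell with a cpus attribute."""
    root = ET.fromstring(domain_xml)
    cpu_nodes = set()
    for index, cell in enumerate(root.findall("./cpu/numa/cell")):
        if cell.get("cpus"):
            # Fall back to the cell's position when no id is given.
            cpu_nodes.add(int(cell.get("id", index)))
    conflicts = []
    for hostdev in root.findall("./devices/hostdev"):
        acpi = hostdev.find("acpi")
        if acpi is None or not acpi.get("nodeset"):
            continue
        overlap = parse_nodeset(acpi.get("nodeset")) & cpu_nodes
        if overlap:
            conflicts.append(sorted(overlap))
    return conflicts


# Reduced version of the reproducer: node 0 has vCPUs and is in the nodeset.
EXAMPLE = """<domain>
  <cpu mode='host-passthrough' check='none'>
    <numa>
      <cell id='0' cpus='0-31' memory='33554432' unit='KiB'/>
      <cell id='1' memory='0' unit='KiB'/>
    </numa>
  </cpu>
  <devices>
    <hostdev mode='subsystem' type='pci' managed='no'>
      <acpi nodeset='0-1'/>
    </hostdev>
  </devices>
</domain>"""

if __name__ == "__main__":
    print(find_acpi_nodeset_conflicts(EXAMPLE))  # [[0]]
```

With the nodeset changed to cover only cells without a `cpus` attribute, the function returns an empty list.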

      Actual results

      The VM continues to boot and the guest kernel crashes.

      Other info

              eauger Eric Auger
              rhn-support-dzheng Dan Zheng
              virt-maint virt-maint
              virt-bugs virt-bugs