What were you trying to do that didn't work?
When using the managed="yes" attribute for an SR-IOV Virtual Function (VF) in a libvirt VM's XML configuration,
VM teardown causes a complete and unrecoverable loss of the host's network connectivity.
Sometimes the failure occurs earlier in the workflow; it is unclear why.
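For reference, a minimal sketch of the interface element in question (the PCI address is illustrative, matching the VF used in the reproduction steps; the actual definition is in the attached avo.xml):

```xml
<!-- Sketch only: hostdev interface with managed='yes'.
     Address is the VF 0002:01:10.0 from this report; real values are in avo.xml. -->
<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0x0002' bus='0x01' slot='0x10' function='0x0'/>
  </source>
</interface>
```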
What is the impact of this issue to you?
After the VM is destroyed, the host machine's network connection is permanently lost, so the machine cannot be returned to Beaker.
It is not possible to SSH to the host from an outside network.
SSH still works, but only from a machine in the same subnet (e.g. 10.6.8.*); the host itself cannot reach anything outside.
Many Jenkins jobs are failing with the error "channel connection lost" - see e.g.:
https://libvirt-rhos-jenkins-product.hosted.upshift.rdu2.redhat.com/job/libvirt-RHEL-9.7-runtest-aarch64-function-viommu/44/console
This impacts both aarch64 and x86_64 tests (also *sriov and some virtual_network tests).
Please provide the package NVR for which the bug is seen:
libvirt:  libvirt-10.10.0-14.el9.aarch64
qemu-kvm: qemu-kvm-9.1.0-25.el9.aarch64
kernel:   kernel-5.14.0-604.el9.aarch64+64k
How reproducible is this bug?: 90%
For some reason it sometimes does not fail, and the machine below works almost without issue:
ampere-mtsnow-altramax-63.lab.eng.rdu2.redhat.com
I was able to perform these operations on ampere*63 manually many times (and also trigger the tests manually), but when I ran the job from Jenkins on that machine, the test failed again:
https://libvirt-rhos-jenkins-product.hosted.upshift.rdu2.redhat.com/job/libvirt-RHEL-9.7-runtest-aarch64-function-viommu/49/
Steps to Reproduce
- Enable SR-IOV on a supported network card (e.g., Intel I350) and create one or more VFs on the host.
echo 4 > /sys/devices/pci0002:00/0002:00:01.0/0002:01:00.0/sriov_numvfs
- Bind the VF to the vfio-pci driver:
sudo driverctl set-override 0002:01:10.0 vfio-pci
- Create a VM with an interface configured for PCI passthrough using type="hostdev" and managed="yes" on a VF. (see avo.xml attached)
- Start the VM using virsh start <vm_name>.
- On the host, check the driver in use for the VF with lspci -k.
- Destroy the VM using virsh destroy <vm_name>.
- Restore the original driver binding:
sudo driverctl unset-override 0002:01:10.0
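The steps above can be sketched as one shell session. The PF/VF addresses match this report's host and the VM name is an assumption taken from the attached avo.xml; both will differ on other machines:

```shell
#!/bin/sh
# Sketch of the reproduction steps from this report; requires an SR-IOV host.
set -e

PF=0002:01:00.0     # physical function (e.g. Intel I350)
VF=0002:01:10.0     # first virtual function
VM=avo              # assumed VM name, from the attached avo.xml

# 1. Create 4 VFs on the PF
echo 4 > /sys/bus/pci/devices/${PF}/sriov_numvfs

# 2. Bind the VF to vfio-pci
driverctl set-override ${VF} vfio-pci

# 3. Start the VM (its XML uses <interface type='hostdev' managed='yes'> on the VF)
virsh start ${VM}

# 4. Check which driver the VF is using while the guest runs
lspci -k -s ${VF}

# 5. Tear the VM down
virsh destroy ${VM}

# 6. Restore the original driver binding
driverctl unset-override ${VF}
```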
Observed Behavior
- After the VM is destroyed and the driver override is unset, the host machine's network connection is permanently lost.
- Sometimes this happens earlier, e.g. immediately after destroying the guest.
Expected Behavior
- The host machine's network connection should remain stable and functional after the VM is destroyed.
Bug description created with the help of Gemini.
An example VM definition (avo.xml) is attached.
Additional info:
There are no errors in dmesg or journalctl.
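For triage after teardown, the driver binding and link state can be inspected with the following (PCI addresses are the ones from this report's host):

```shell
# Post-teardown state checks; addresses match this report's reproducer.
lspci -k -s 0002:01:00.0            # which driver the PF is bound to
lspci -k -s 0002:01:10.0            # which driver the VF is bound to
ip -br link                          # host link state summary
journalctl -b --no-pager | tail -n 50   # most recent boot messages
```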