Bug
Resolution: Unresolved
Major
CNV v4.20.3
Quality / Stability / Reliability
Moderate
Description of problem:
While troubleshooting a latency issue, I noticed that we may be delaying the delivery of network packets to a guest. The guest is tuned with the highest possible performance options, yet the emulator threads are all pinned to the same pCPU.
Version-Release number of selected component (if applicable):
4.20.3
How reproducible:
Always
Steps to Reproduce:
1. Set up the VM with the following CPU configuration:
cpu:
  cores: 1
  dedicatedCpuPlacement: true
  isolateEmulatorThread: true
  numa:
    guestMappingPassthrough: {}
  realtime: {}
  sockets: 2
  threads: 1
2. Start it
3. Check vhost is pinned to the same CPU as virt-launcher and qemu-kvm
107 102374 102364 2 00:23 ? 00:00:00 /usr/bin/virt-launcher --qemu-timeout 346s --name test .....
107 102477 102364 75 00:23 ? 00:00:00 /usr/libexec/qemu-kvm -name guest=homelab_test, .....
root 102482 2 0 00:23 ? 00:00:00 [vhost-102477]
$ taskset -cp 102374
pid 102374's current affinity list: 6
$ taskset -cp 102477
pid 102477's current affinity list: 6
$ taskset -cp 102482
pid 102482's current affinity list: 6
Even with realtime enabled, no real-time scheduling policy or priority is set on virt-launcher or vhost:
$ chrt -p 102374
pid 102374's current scheduling policy: SCHED_OTHER
pid 102374's current scheduling priority: 0
$ chrt -p 102482
pid 102482's current scheduling policy: SCHED_OTHER
pid 102482's current scheduling priority: 0
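The taskset and chrt checks above can also be scripted. The following is a minimal sketch, assuming a Linux host and that the PIDs are passed on the command line; the helper names `shared_cpus` and `policy_name` are hypothetical and not part of any KubeVirt tooling.

```python
import os
import sys
from collections import defaultdict


def shared_cpus(affinities):
    """Return {cpu: sorted pids} for every CPU allowed for more than one PID.

    `affinities` maps a PID to the set of CPUs it may run on.
    """
    by_cpu = defaultdict(list)
    for pid, cpus in affinities.items():
        for cpu in cpus:
            by_cpu[cpu].append(pid)
    return {cpu: sorted(pids) for cpu, pids in by_cpu.items() if len(pids) > 1}


def policy_name(pid):
    """Return the scheduling policy of `pid` as a readable name (Linux only)."""
    names = {
        os.SCHED_OTHER: "SCHED_OTHER",
        os.SCHED_FIFO: "SCHED_FIFO",
        os.SCHED_RR: "SCHED_RR",
    }
    policy = os.sched_getscheduler(pid)
    # Fall back to the raw number for policies not listed above.
    return names.get(policy, str(policy))


if __name__ == "__main__":
    # Example: python3 check_pinning.py 102374 102477 102482
    pids = [int(arg) for arg in sys.argv[1:]]
    affinities = {pid: os.sched_getaffinity(pid) for pid in pids}
    for pid in pids:
        print(pid, sorted(affinities[pid]), policy_name(pid))
    for cpu, sharers in shared_cpus(affinities).items():
        print(f"CPU {cpu} is shared by PIDs {sharers}")
```

With the PIDs from this report, all three would be expected to show up as sharing CPU 6. Inspecting another user's process may require root.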
Actual results:
virt-launcher, qemu-kvm and vhost compete for the same CPU.
Expected results:
One CPU for each? Or at least preemption/priority?
Additional info:
PIDs will differ from above, but look at virt-launcher hoarding the CPU and delaying vhost-net:
virt-launcher-2221771 [012] d..2. 262238.193703: sched_stat_runtime: comm=virt-launcher pid=2221771 runtime=10840 [ns]
virt-launcher-2221771 [012] d..2. 262238.193704: sched_stat_wait: comm=virt-launcher pid=2221377 delay=37875 [ns]
virt-launcher-2221771 [012] d..2. 262238.193705: sched_switch: virt-launcher:2221771 [120] S ==> virt-launcher:2221377 [120]
virt-launcher-2221377 [012] d..2. 262238.193709: sched_stat_runtime: comm=virt-launcher pid=2221377 runtime=6451 [ns]
virt-launcher-2221377 [012] d..2. 262238.193711: sched_stat_wait: comm=vhost-2221997 pid=2222169 delay=10073994 [ns]
vhost-2221997-2222169 [012] d..1. 262238.193753: softirq_raise: vec=3 [action=NET_RX]
That is a 10073994 ns (about 10.07 ms) delay for vhost to run while it sat on the runqueue, potentially delaying the guest's networking operations (it raised NET_RX as soon as it ran).
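To pick out long runqueue waits like the one in this trace, the sched_stat_wait events can be parsed and converted to milliseconds. This is a rough sketch: the function name `long_waits` and the 1 ms threshold are assumptions chosen for illustration, and the regular expression only matches the event format shown in this report.

```python
import re

# Matches events such as:
#   sched_stat_wait: comm=vhost-2221997 pid=2222169 delay=10073994 [ns]
WAIT_RE = re.compile(r"sched_stat_wait: comm=(\S+) pid=(\d+) delay=(\d+) \[ns\]")

# Two of the sched_stat_wait events quoted in this report.
SAMPLE = (
    "virt-launcher-2221771 [012] d..2. 262238.193704: sched_stat_wait: "
    "comm=virt-launcher pid=2221377 delay=37875 [ns]\n"
    "virt-launcher-2221377 [012] d..2. 262238.193711: sched_stat_wait: "
    "comm=vhost-2221997 pid=2222169 delay=10073994 [ns]\n"
)


def long_waits(trace_text, threshold_ms=1.0):
    """Return (comm, pid, delay_ms) for waits of at least threshold_ms.

    The 1 ms default is an arbitrary cut-off, not a recommended value.
    """
    results = []
    for match in WAIT_RE.finditer(trace_text):
        comm = match.group(1)
        pid = int(match.group(2))
        delay_ms = int(match.group(3)) / 1_000_000  # nanoseconds to milliseconds
        if delay_ms >= threshold_ms:
            results.append((comm, pid, delay_ms))
    return results


if __name__ == "__main__":
    for comm, pid, delay_ms in long_waits(SAMPLE):
        print(f"{comm} (pid {pid}) waited {delay_ms:.3f} ms on the runqueue")
```

Run against the two sample events, only the vhost wait passes the threshold; the 37875 ns virt-launcher wait is filtered out.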