Story
Resolution: Unresolved
Normal
None
Important
1
rhel-virt-windows
ssg_virtualization
8
QE ack
False
False
Yes
Red Hat Enterprise Linux
Virtio-win 02/Apr- 15/Apr
Pass
None
Enhancement
Proposed
x86_64
Windows
None
Goal
Evaluate the feasibility of merging the virtio header and the packet data into the same memory block to reduce the number of DMA operations, use PCIe bandwidth more efficiently, and improve overall network throughput.
Current State
In the current netkvm receive (RX) logic, the virtio protocol header and the packet data are placed in two separate memory blocks, so each RX descriptor chain references at least two memory blocks. From the host's perspective, a hardware virtio-net implementation therefore needs two DMA operations to deliver a single received packet, which consumes more PCIe bandwidth.
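The descriptor-count argument can be illustrated with a small Python model. This is a simplified sketch, not netkvm code: `MemoryBlock` and `coalesce` are hypothetical names, the 12-byte header length assumes `struct virtio_net_hdr_mrg_rxbuf`, and it assumes one descriptor (and one DMA transfer) per physically separate block. It shows that a header placed directly in front of the data collapses into a single block, while a header in an unrelated block does not.

```python
from dataclasses import dataclass

# Assumption: size of struct virtio_net_hdr_mrg_rxbuf (virtio 1.0).
VIRTIO_NET_HDR_LEN = 12


@dataclass(frozen=True)
class MemoryBlock:
    """Hypothetical model of one physically contiguous guest memory block."""
    phys_addr: int
    length: int


def coalesce(chain):
    """Merge physically adjacent blocks of an RX chain into one block.

    Each returned block would need its own descriptor; under the simplifying
    assumption of this sketch, also its own DMA transfer.
    """
    merged = []
    for block in chain:
        if merged and merged[-1].phys_addr + merged[-1].length == block.phys_addr:
            last = merged[-1]
            merged[-1] = MemoryBlock(last.phys_addr, last.length + block.length)
        else:
            merged.append(block)
    return merged


# Current layout: header and data live in unrelated blocks -> 2 descriptors.
separate = [MemoryBlock(0x1000, VIRTIO_NET_HDR_LEN), MemoryBlock(0x8000, 1514)]
# Proposed layout: data starts right after the header -> 1 descriptor.
adjacent = [MemoryBlock(0x1000, VIRTIO_NET_HDR_LEN), MemoryBlock(0x100C, 1514)]

print(len(coalesce(separate)), len(coalesce(adjacent)))
```

Running it prints `2 1`: only the adjacent layout can be presented to the device as one descriptor.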
Upstream issue
netkvm: Enhancing Host Throughput by Combining Virtio Header and Data in a Single Memory Block for NetKVM #1078
- Code snippet mentioned: https://github.com/virtio-win/kvm-guest-drivers-windows/blob/master/NetKVM/Common/ParaNdis_RX.cpp#L184
Implementation
On the RX path, the driver now places the virtio header and the network packet data in the same memory block whenever such a merge is possible. Additionally, there is a new configuration parameter, "Init.SeparateRxTail", which is set to "Enable" by default.
It works as follows:
Typically, the driver needs RX buffers of 64K + 30 bytes.
- If "Init.SeparateRxTail" is enabled, the driver tries to allocate 64K buffers and allocates the 30-byte tail separately in a smaller memory block.
- If "Init.SeparateRxTail" is disabled, the driver tries to allocate a contiguous buffer of 64K + 30 bytes. The last 4K page of each buffer then holds only 30 bytes and the rest of it stays unused, so for a queue of 256 entries roughly 1 MB (256 × ~4 KB) of valuable memory cannot be used efficiently. In exchange, a hardware virtio-net implementation spends less time on RX DMA transactions, because the entire RX buffer is contiguous and described by a single virtio-net descriptor.
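The trade-off between the two settings can be checked with a short calculation. The sketch below is a hypothetical model, not driver code: the function names are illustrative, it assumes 4 KB pages, and it ignores any allocator overhead for the separately allocated 30-byte tail. It computes descriptors per buffer and the memory wasted across a queue.

```python
PAGE_SIZE = 4096          # assumption: 4 KB pages
DATA_LEN = 64 * 1024      # 64K main part of each RX buffer
TAIL_LEN = 30             # extra 30 bytes each RX buffer needs


def pages_for(nbytes):
    """Number of whole pages needed to hold nbytes (ceiling division)."""
    return -(-nbytes // PAGE_SIZE)


def rx_buffer_layout(separate_tail):
    """Return (descriptors per buffer, bytes allocated per buffer)."""
    if separate_tail:
        # 64K in whole pages plus a small separate block for the tail;
        # this sketch ignores allocator overhead for the small block.
        return 2, pages_for(DATA_LEN) * PAGE_SIZE + TAIL_LEN
    # One contiguous block: the 30-byte overflow occupies an extra page.
    return 1, pages_for(DATA_LEN + TAIL_LEN) * PAGE_SIZE


def queue_overhead(queue_size, separate_tail):
    """Return (total descriptors, total wasted bytes) for a whole RX queue."""
    descs, allocated = rx_buffer_layout(separate_tail)
    useful = DATA_LEN + TAIL_LEN
    return descs * queue_size, (allocated - useful) * queue_size


print(queue_overhead(256, separate_tail=True))   # (512, 0)
print(queue_overhead(256, separate_tail=False))  # (256, 1040896)
```

With the tail disabled, each buffer wastes 4066 bytes of its 17th page, about 1 MB for 256 entries, while the number of descriptors the device must process is halved.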
For the QEMU virtio-net device: in general, the change may slightly reduce RX CPU consumption, and "Init.SeparateRxTail" set to "Disable" may give an additional small improvement when receiving small packets. This still needs to be verified.
For hardware virtio-net devices: an AliCloud engineer reported a 5-10% improvement when receiving 1400-byte UDP packets.