Type: Story
Resolution: Unresolved
Priority: Normal
Fix Version/s: rhel-11.0
Affects Version/s: None
Component/s: virtio-win / virtio-win-prewhql
Labels: netkvm
Severity: Important
sprint_count: 1
AssignedTeam: rhel-virt-windows
Sub-System Group: ssg_virtualization
Story Points: 8
ACKs Check: QE ack
Target Version: rhel-9.6
Blocked: False
Ready: False
Blocked Reason: None
Product Documentation Required: Yes
Products: Red Hat Enterprise Linux
Sprint: Virtio-win 02/Apr - 15/Apr
Preliminary Testing: Pass
Test Coverage: None
Release Note Type: Enhancement
Release Note Text: Feature, enhancement: Reason: Result:
Release Note Status: Proposed
Experience:
Architecture: x86_64
OS: Windows
PX Impact Score:
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:
Planning: None

Goal

Evaluate the feasibility of merging the virtio header and data packet into the same memory block to reduce DMA operations, optimize PCIe bandwidth usage, and improve overall network throughput.

Current State

In the current receive (RX) logic of netkvm, the virtio protocol header and the packet data are placed in two separate memory blocks, so each receive buffer consists of at least two memory blocks. From the host's perspective, a hardware implementation of the network card therefore needs at least two DMA operations to transfer a single packet, which consumes more PCIe bandwidth.
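To make the block count concrete, here is a minimal sketch in Python (not driver code). It assumes a simplified model in which every contiguous memory block needs its own descriptor and costs the device one DMA operation. The `Block` class, the `descriptors_needed` helper, and the 12-byte header and 1514-byte frame sizes are illustrative assumptions, not values taken from the netkvm source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """One contiguous guest memory block (hypothetical model, not a driver type)."""
    name: str
    size: int


def descriptors_needed(blocks):
    """Simplified model: one descriptor (and one DMA operation) per block."""
    return len(blocks)


HEADER_SIZE = 12    # assumed virtio-net header size, for illustration only
DATA_SIZE = 1514    # assumed Ethernet frame size, for illustration only

# Current state: header and data live in separate blocks.
separate = [Block("virtio header", HEADER_SIZE), Block("packet data", DATA_SIZE)]

# Proposed state: header and data share one block.
merged = [Block("virtio header + packet data", HEADER_SIZE + DATA_SIZE)]

print("separate blocks:", descriptors_needed(separate))
print("merged block:   ", descriptors_needed(merged))
```

Under this model the merged layout halves the number of descriptors the device has to process per packet; real hardware behaviour may differ.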

Upstream issue

netkvm: Enhancing Host Throughput by Combining Virtio Header and Data in a Single Memory Block for NetKVM #1078

Code snippet mentioned: https://github.com/virtio-win/kvm-guest-drivers-windows/blob/master/NetKVM/Common/ParaNdis_RX.cpp#L184

Implementation

The driver prefers to merge the virtio header and the network packet data into one memory block on the RX path whenever such a merge is possible. In addition, there is a new configuration parameter, "Init.SeparateRxTail", which is set to "Enable" by default.

It works as follows:

Typically the driver needs input buffers of [64K + 30 bytes].

If "SeparateTail" is enabled, the driver tries to allocate buffers of 64K and allocates the 30-byte tail separately in a smaller memory block.
If "SeparateTail" is disabled, the driver tries to allocate a contiguous buffer of [64K + 30 bytes]. The tail spills into an extra 4K page that is otherwise almost entirely unused, so for a queue of 256 entries the system ends up with ~1M of valuable memory that cannot be used efficiently. In exchange, a hardware virtio-net implementation spends less time on RX DMA transactions, because the entire RX buffer is contiguous and is presented by a single virtio-net descriptor.
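The ~1M figure can be checked with a short calculation. The sketch below is in Python rather than driver code, and assumes allocations are rounded up to whole 4K pages; the function name `contiguous_layout` and the constant names are hypothetical.

```python
import math

PAGE_SIZE = 4096          # 4K page, as stated in the description
MAIN_SIZE = 64 * 1024     # 64K main part of the RX buffer
TAIL_SIZE = 30            # 30-byte tail
QUEUE_ENTRIES = 256       # queue size used in the description


def contiguous_layout(entries=QUEUE_ENTRIES):
    """Memory cost of one contiguous [64K + 30 bytes] buffer per queue entry.

    Assumption: each buffer is rounded up to a whole number of pages.
    """
    needed = MAIN_SIZE + TAIL_SIZE
    pages = math.ceil(needed / PAGE_SIZE)
    unused = pages * PAGE_SIZE - needed
    return {
        "pages_per_buffer": pages,
        "unused_per_buffer": unused,
        "unused_total": unused * entries,
    }


result = contiguous_layout()
print(result["pages_per_buffer"])                    # 17 pages instead of 16
print(result["unused_per_buffer"])                   # 4066 bytes unused in the last page
print(result["unused_total"])                        # 1040896 bytes for 256 entries
print(round(result["unused_total"] / 2**20, 2))      # about 0.99 MiB
```

With "SeparateTail" enabled, the 64K part fits exactly into 16 pages, so this per-buffer slack does not arise; the cost is the second memory block for the tail.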

For the QEMU virtio-net device: in general, the change may bring a very small reduction in RX CPU consumption, and SeparateTail=Disable may bring an additional very small improvement when receiving small packets. This still needs to be checked.

For hardware: an AliCloud engineer reported a 5-10% improvement when receiving UDP packets of 1400 bytes.

Assignee: Yuri Benditovich
Reporter: Wenkang Ji
Developer: Meirav Dean
QA Contact: Wenkang Ji
Votes: 0
Watchers: 7
Created: 2024/12/19 8:37 AM
Updated: 2025/11/22 10:18 AM
Stale Date: 2026/06/04
