How vhost-user IOMMU support works

The back-end receives the guest memory as multiple shared memory regions, each annotated with (A) the starting physical guest address, and (B) the starting “user address”, i.e. the virtual address in qemu’s (the front-end’s) address space.
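
For concreteness, the per-region description in the vhost-user memory messages carries roughly the following fields (a sketch of the layout in Rust; field names are illustrative, see the vhost-user specification for the authoritative definition):

    // Sketch of one vhost-user memory region description.
    #[repr(C)]
    struct MemoryRegionDescription {
        guest_phys_addr: u64, // (A) starting physical guest address
        memory_size: u64,     // size of the region in bytes
        userspace_addr: u64,  // (B) starting user address in the front-end
        mmap_offset: u64,     // offset into the shared memory file to mmap
    }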

Without an IOMMU, the user address is used only for vring addresses; apart from that, the back-end generally works with physical guest addresses, deriving its own virtual addresses from those guest addresses.

With an IOMMU, physical guest addresses become completely unused. Instead, the back-end only receives I/O virtual addresses (IOVAs), which are translated to user addresses (in-qemu virtual addresses) via the IOMMU.

For this, the back-end is supposed to have an IOTLB. The front-end, which has the IOMMU, can send updates and invalidations to it; but that will not suffice to translate everything, so the back-end needs to be prepared to declare IOTLB misses and request mappings from the front-end. This is done via the Backend Req mechanism (VHOST_USER_PROTOCOL_F_BACKEND_REQ). With F_REPLY, such requests can be forced to be synchronous, allowing synchronous IOTLB look-ups.
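
Both the front-end’s updates/invalidations and the back-end’s miss reports use the same message payload, mirroring struct vhost_iotlb_msg from the Linux kernel (the Rust rendering below is only a sketch):

    // Sketch of the vhost-user IOTLB message (cf. vhost_iotlb_msg).
    #[repr(C)]
    struct IotlbMsg {
        iova: u64,    // I/O virtual address
        size: u64,    // length of the mapping
        uaddr: u64,   // corresponding user address (front-end VA)
        perm: u8,     // access permissions (read-only/write-only/read-write)
        msg_type: u8, // UPDATE, INVALIDATE, MISS, or ACCESS_FAIL
    }

The back-end reports a miss with a MISS-type message on the back-end request channel; the front-end answers with an UPDATE for the affected range.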

Current state of rust-vmm guest memory handling

Basically everything (except for accessing the vrings) assumes that there are only two address types we have to deal with:

  1. Physical guest addresses,
  2. Virtual back-end addresses.

The latter is assumed to be convertible to valid pointers to the data.

Basically all translation is done through a GuestMemory implementation (GuestMemory is a trait): you pass it a physical guest address and a size, and you get back a slice referencing the data in the back-end’s virtual address space.
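
For example, with the mmap-backed implementation from vm-memory (assuming the backend-mmap feature; exact API details may vary between versions):

    use vm_memory::{Bytes, GuestAddress, GuestMemoryMmap};

    fn read_from_guest() -> Result<u32, Box<dyn std::error::Error>> {
        // One 64 KiB region starting at guest physical address 0.
        let mem =
            GuestMemoryMmap::<()>::from_ranges(&[(GuestAddress(0), 0x10000)])?;
        // The Bytes implementation resolves the guest physical address to
        // a back-end virtual address internally and reads through it.
        let value: u32 = mem.read_obj(GuestAddress(0x1000))?;
        Ok(value)
    }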

What we need with an IOMMU

The vm-virtio crates are built on top of GuestMemory, so any IOVA translation layer must implement that interface, or we’d need to make quite radical changes to those crates.

From a design perspective, there are two major downsides to implementing GuestMemory for an IOVA address space:

  1. Every access has to go through an additional translation step.
  2. The GuestMemory interface assumes the address space to be separable into distinct memory regions (e.g. via find_region()), which does not hold for an IOMMU-governed address space.

The former is a nuisance; the latter is quite bad, and it is as yet unclear how much of a problem it represents.

It seems likely that the separability is not relevant to GuestMemory users outside of the vm-memory crate. Instead, it may only be used by vm-memory itself to automatically implement the GuestMemory trait methods.

If so, either of two courses of action seems feasible:

  1. We implement GuestMemory on an IOMMU-aware object, but do not provide the separating methods, e.g. find_region(). Such methods would panic at runtime. We would not use the default GuestMemory method implementations, but reimplement them so they are IOMMU aware. This is quite ugly, but might provide the least invasive solution.
  2. We remove the separability from the GuestMemory trait, instead moving it to a new trait (e.g. PhysicalGuestMemory), on which GuestMemory is then auto-implemented. This would be a breaking change in the interface, but statically prove that the separability is not used outside of the vm-memory crate.

I assume the latter is better for immediate development and should be the method to pursue first. If the breaking change is not palatable upstream, it should be easy to transform into the first approach.
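
A rough sketch of what that split could look like, using simplified stand-in types (PhysicalGuestMemory is the hypothetical name from above; the real trait surface would be much larger):

    /// Simplified stand-ins for the vm-memory types, for illustration only.
    pub struct GuestAddress(pub u64);
    pub struct Region {
        pub guest_start: u64,
        pub len: u64,
        pub host_base: *mut u8,
    }

    /// Hypothetical home of the "separable" (region-based) interface.
    pub trait PhysicalGuestMemory {
        fn find_region(&self, addr: &GuestAddress) -> Option<&Region>;
    }

    /// What remains in GuestMemory: translate an address range to a host
    /// pointer, however the implementation sees fit (region look-up,
    /// IOTLB, ...).
    pub trait GuestMemory {
        fn translate(&self, addr: &GuestAddress, len: usize) -> Option<*mut u8>;
    }

    /// Blanket implementation: every separable memory object statically
    /// remains a GuestMemory, so existing users keep working.
    impl<M: PhysicalGuestMemory> GuestMemory for M {
        fn translate(&self, addr: &GuestAddress, len: usize) -> Option<*mut u8> {
            let region = self.find_region(addr)?;
            let offset = addr.0.checked_sub(region.guest_start)?;
            // Only translate accesses that fit entirely into one region.
            (offset + len as u64 <= region.len)
                .then(|| unsafe { region.host_base.add(offset as usize) })
        }
    }

An IOMMU-backed memory object would then implement GuestMemory directly, without implementing PhysicalGuestMemory.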

New IOTLB code

We need a trait definition for something that can translate IOVAs to user addresses, i.e. an IOMMU.
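
A minimal sketch of such a trait (all names here are placeholders, not an existing rust-vmm API; a String error stands in for a proper error type):

    /// Requested access permissions for a translation.
    pub enum Permissions {
        Read,
        Write,
        ReadWrite,
    }

    /// One translated range: a contiguous run of user addresses.
    pub struct Mapping {
        pub user_address: u64,
        pub length: u64,
    }

    /// Anything that can translate IOVAs into user addresses.
    pub trait Iommu {
        /// Translate `iova .. iova + length` for the given access mode.
        /// The IOVA range may be fragmented (e.g. page by page), so the
        /// result is a list of contiguous mappings; errors cover both
        /// permission violations and unresolvable misses.
        fn translate(
            &self,
            iova: u64,
            length: u64,
            access: Permissions,
        ) -> Result<Vec<Mapping>, String>;
    }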

We need an IOMMU-aware GuestMemory wrapper that translates every access through the IOMMU before forwarding it to the underlying memory object, which is indexed by user addresses.
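
Building on the hypothetical Iommu trait sketched above, the wrapper could look like this (IommuMemory and its methods are made-up names):

    /// Pairs a memory object indexed by user addresses with an IOMMU.
    pub struct IommuMemory<M, I: Iommu> {
        /// Underlying memory, addressed by user (front-end virtual)
        /// addresses.
        inner: M,
        /// Translation layer consulted on every access.
        iommu: I,
    }

    impl<M, I: Iommu> IommuMemory<M, I> {
        /// Every access translates the IOVA first; the GuestMemory trait
        /// methods would all be implemented on top of this step, then
        /// access `inner` at the resulting user addresses.
        pub fn translate(&self, iova: u64, length: u64, access: Permissions)
            -> Result<Vec<Mapping>, String>
        {
            self.iommu.translate(iova, length, access)
        }
    }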

We need an implementation of this IOMMU trait that satisfies the vhost-user model: an IOTLB that caches the mappings received from the front-end (updates and invalidations), and that requests missing mappings from the front-end when a look-up misses.

Note that this IOTLB would thus need to be aware of vhost-user concepts.
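
The cache itself could be as simple as an ordered map (a sketch; invalidation handling in particular is simplified):

    use std::collections::BTreeMap;

    /// Hypothetical vhost-user IOTLB: caches translations received from
    /// the front-end via IOTLB update messages.
    pub struct VhostUserIotlb {
        /// Keyed by first IOVA; values are (length, user address).
        entries: BTreeMap<u64, (u64, u64)>,
    }

    impl VhostUserIotlb {
        /// Handle an IOTLB update from the front-end.
        pub fn update(&mut self, iova: u64, length: u64, uaddr: u64) {
            self.entries.insert(iova, (length, uaddr));
        }

        /// Handle an IOTLB invalidation from the front-end.
        pub fn invalidate(&mut self, iova: u64, length: u64) {
            // Simplified: real code must split partially overlapping
            // entries instead of dropping them wholesale.
            self.entries.retain(|&start, &mut (len, _)| {
                start + len <= iova || iova + length <= start
            });
        }

        /// Look up a single IOVA. On a miss (None), the caller sends an
        /// IOTLB miss message over the back-end request channel, waits
        /// for the resulting update, and retries.
        pub fn lookup(&self, iova: u64) -> Option<u64> {
            let (&start, &(len, uaddr)) =
                self.entries.range(..=iova).next_back()?;
            (iova < start + len).then(|| uaddr + (iova - start))
        }
    }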

The back-end implementation would be tasked with connecting the IOTLB to the vhost-user back-end req FD as provided by set_backend_req_fd().
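
In outline (MyBackend and its fields are hypothetical; set_backend_req_fd() is the handler entry point the vhost crate invokes for the VHOST_USER_SET_BACKEND_REQ_FD message):

    use vhost::vhost_user::Backend;

    /// Hypothetical back-end state.
    struct MyBackend {
        iotlb: VhostUserIotlb, // from the sketch above
        backend_channel: Option<Backend>,
    }

    impl MyBackend {
        fn set_backend_req_fd(&mut self, backend: Backend) {
            // Keep the channel around; on an IOTLB miss, the back-end
            // sends its IOTLB miss message through it (the exact vhost
            // API for composing that message is not shown here).
            self.backend_channel = Some(backend);
        }
    }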

We can either implement these things across the existing rust-vmm crates, putting each piece into the crate where it fits best.

Alternatively, I believe we could also create a new independent crate that contains all of those, so none of the rust-vmm crates would need to be modified.

It is very much desirable to get these changes into the rust-vmm crates, though, so that is what we should pursue first.

Note on Backend Req multiplexing

The IOTLB implementation will require access to the back-end req channel (which is a Unix domain socket connection). This could interfere with concurrent use for other purposes, especially when relying on synchronous communication (F_REPLY IOTLB messages).

Luckily, vhost::vhost_user::Backend has internal mutability and can just be cloned to allow for multiple concurrent users, so it shouldn’t pose a problem.
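
That is, the IOTLB machinery can simply receive its own handle, reworking the earlier set_backend_req_fd() sketch (iotlb_channel is another hypothetical field):

    impl MyBackend {
        fn set_backend_req_fd(&mut self, backend: Backend) {
            // One clone for the IOTLB, one for everything else; both
            // share (and internally synchronize) the same connection.
            self.iotlb_channel = Some(backend.clone());
            self.backend_channel = Some(backend);
        }
    }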

Notes on IOTLB look-up synchronicity

An IOTLB miss will trigger a vhost-user back-end request, i.e.:

  1. The back-end sends an IOTLB miss message to the front-end,
  2. The front-end looks up the translation in its IOMMU,
  3. The front-end sends an IOTLB update message back to the back-end,
  4. The back-end retries the look-up.

This round trip is quite slow. A fully synchronous model, i.e. one where reading from guest memory is a blocking function call, will suffer considerably whenever it encounters an IOTLB miss.

An asynchronous model would be much better here: accessing guest memory would be allowed to yield, so other guest requests could be processed while awaiting the front-end’s answer. However, allowing asynchronous memory accesses would likely require a major redesign of many vhost components.
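
For illustration, an asynchronous access interface could look something like this (purely hypothetical; nothing comparable exists in these crates today):

    /// Hypothetical async guest-memory access: on an IOTLB miss the
    /// returned future stays pending until the front-end's IOTLB update
    /// arrives, so the executor can run other guest requests meanwhile.
    pub trait AsyncGuestMemory {
        async fn read(&self, iova: u64, buffer: &mut [u8])
            -> Result<(), String>;
    }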