The back-end receives the guest memory as multiple shared memory regions, each annotated with (A) the starting physical guest address, and (B) the starting “user address”, i.e. the virtual address in qemu’s (the front-end’s) address space.
Without an IOMMU, the user address is only used for vring addresses; apart from that, the back-end generally works with physical guest addresses, deriving its own virtual addresses from those guest addresses.
With an IOMMU, physical guest addresses become completely unused. Instead, the back-end only receives I/O virtual addresses (IOVAs), which are translated to user addresses (in-qemu virtual addresses) via the IOMMU.
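Purely as an illustration of the per-region annotation described above (a made-up struct for this document, not the actual vhost-user wire format):

```rust
/// Illustrative only: the information each shared memory region is annotated
/// with, as described above (field names are made up for this sketch).
struct MemoryRegionAnnotation {
    /// (A) starting physical guest address of the region
    guest_phys_addr: u64,
    /// (B) starting "user address", i.e. the virtual address in the
    /// front-end's (qemu's) address space
    user_addr: u64,
    /// size of the region in bytes
    size: u64,
}
```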
For this, the back-end is supposed to have an IOTLB. The front-end, which has the IOMMU, can send updates and invalidations to it; but that will not suffice to translate everything, so the back-end needs to be prepared to declare IOTLB misses and request mappings from the front-end. This is done via the Backend Req mechanism (`VHOST_USER_PROTOCOL_F_BACKEND_REQ`). With `F_REPLY`, such requests can be forced to be synchronous, allowing synchronous IOTLB look-ups.
Basically everything (except access to the vrings) assumes that there are only two address types we have to deal with: physical guest addresses, and virtual addresses in our own (the back-end's) address space. The latter are assumed to be convertible to valid pointers to the data.
Basically all translation is done through a `GuestMemory` implementation (which is a trait): you pass it a physical guest address and a size, and you get a slice to the data in the back-end virtual address space.
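As a minimal sketch of that interface, assuming vm-memory's mmap-backed `GuestMemoryMmap` (addresses and sizes here are arbitrary):

```rust
use vm_memory::{Bytes, GuestAddress, GuestMemory, GuestMemoryMmap};

fn main() {
    // One anonymous 64 KiB region starting at guest physical address 0; a real
    // back-end would construct this from the regions sent by the front-end.
    let mem = GuestMemoryMmap::<()>::from_ranges(&[(GuestAddress(0), 0x1_0000)])
        .expect("failed to create guest memory");

    // Pass a physical guest address and a size, get a slice into the
    // back-end's own virtual address space.
    let slice = mem.get_slice(GuestAddress(0x1000), 0x100).unwrap();
    assert_eq!(slice.len(), 0x100);

    // The Bytes impl builds typed reads/writes on top of the same translation.
    mem.write_obj(42u32, GuestAddress(0x1000)).unwrap();
    assert_eq!(mem.read_obj::<u32>(GuestAddress(0x1000)).unwrap(), 42);
}
```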
The vm-virtio crates are built on top of `GuestMemory`, so any IOVA translation layer must implement that interface, or we would need to make quite radical changes to those crates.
From a design perspective, there are two major downsides to implementing `GuestMemory` for an IOVA address space:

- All of these address types are interchangeable in practice (they are just `u64`, even though this is architecture-dependent), but if possible, it would be preferable to have a strict type separation between IOVAs, guest addresses, and user addresses. We won't be able to get any type separation if we have to implement `GuestMemory`, instead having to use the `GuestAddress` type for IOVAs (see the newtype sketch below).
- `GuestMemory` just has the wrong interface. It relies on being separable into several contiguous memory regions, which will no longer be true with an IOMMU: if `GuestAddress` is an IOVA, then the separate regions will not represent contiguous `GuestAddress` ranges (because they represent contiguous user address ranges).

The former is a nuisance, the latter is quite bad, and it is not yet clear how much of a problem it represents.
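To illustrate what kind of type separation is meant by the first point, hypothetical newtypes could look like this (the names are made up; today all of these roles are filled by `GuestAddress`, which just wraps a `u64`):

```rust
/// Hypothetical strict address types; currently, all of these end up as
/// vm-memory's GuestAddress (a plain u64 wrapper), so the compiler cannot
/// catch mixing them up.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Iova(u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct GuestPhysAddr(u64);

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct UserAddr(u64);
```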
It seems likely that the separability is not relevant to `GuestMemory` users outside of the vm-memory crate. Instead, it may only be used by vm-memory itself to automatically implement the `GuestMemory` trait methods.
If so, either of two courses of action seems feasible:

- Implement `GuestMemory` on an IOMMU-aware object, but do not provide the separating methods, e.g. `find_region()`. Such methods would panic at runtime. We would not use the default `GuestMemory` method implementations, but reimplement them so they are IOMMU-aware. This is quite ugly, but might provide the least invasive solution.
- Remove the separability from the `GuestMemory` trait, instead moving it to a new trait (e.g. `PhysicalGuestMemory`), on which `GuestMemory` is then auto-implemented. This would be a breaking change in the interface, but would statically prove that the separability is not used outside of the vm-memory crate. (A rough sketch of this option follows below.)

I assume the latter is better for immediate development, and should be the first method to pursue. If the breaking change is not palatable upstream, it should be easily transformable into the first.
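A rough sketch of the second option; apart from `find_region()`, the trait and method names here are assumptions and deliberately simplified stand-ins for the real vm-memory traits:

```rust
use vm_memory::{GuestAddress, GuestMemoryRegion, MemoryRegionAddress};

/// Hypothetical trait capturing only the "separable into regions" part that
/// would be split out of GuestMemory.
pub trait PhysicalGuestMemory {
    type R: GuestMemoryRegion;

    /// Find the region containing `addr`, if any.
    fn find_region(&self, addr: GuestAddress) -> Option<&Self::R>;
}

/// Simplified stand-in for the remaining (non-separability) part of
/// GuestMemory, auto-implemented for everything that is separable.
pub trait AccessibleMemory {
    /// Translate `addr` and return a host pointer to `count` bytes.
    fn get_host_address(&self, addr: GuestAddress, count: usize) -> Option<*mut u8>;
}

impl<T: PhysicalGuestMemory> AccessibleMemory for T {
    fn get_host_address(&self, addr: GuestAddress, count: usize) -> Option<*mut u8> {
        // Auto-implementation in terms of region separability. An IOMMU-aware
        // memory object would skip PhysicalGuestMemory and implement
        // AccessibleMemory (i.e. the remaining GuestMemory interface) directly.
        let region = self.find_region(addr)?;
        let offset = addr.0.checked_sub(region.start_addr().0)?;
        if offset.checked_add(count as u64)? > region.len() {
            return None;
        }
        region.get_host_address(MemoryRegionAddress(offset)).ok()
    }
}
```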
We need a trait definition for something that can translate IOVAs to user addresses, i.e. an IOMMU.

We need an IOMMU-aware `GuestMemory` wrapper such that:

- externally, it acts as a `GuestMemory` (`GuestAddress` values fed into it are assumed to be IOVAs),
- internally, it translates those IOVAs via the IOMMU trait and forwards the access to an inner object implementing the `GuestMemory` trait,
- that inner `GuestMemory` object allows access based on user addresses (i.e. `GuestAddress` values fed into it are assumed to be user addresses). This must be minded when constructing this inner object, i.e. its regions must be based on the user addresses provided by the front-end, not the guest addresses.

We need an implementation of this IOMMU trait satisfying the vhost-user model: an IOTLB that receives updates and invalidations from the front-end, and that requests mappings from the front-end when it encounters misses. Note that this IOTLB would thus need to be aware of vhost-user concepts. (A rough sketch of the trait and wrapper follows below.)
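A very rough sketch of the first two pieces (all names are assumptions; the inner memory is any `GuestMemory` object whose regions are constructed from user addresses):

```rust
use vm_memory::{GuestAddress, GuestMemory, GuestMemoryError};

/// Hypothetical trait for something that can translate IOVAs to user
/// addresses, i.e. an IOMMU. The vhost-user implementation would be backed
/// by an IOTLB plus miss requests to the front-end.
pub trait Iommu {
    /// Translate `len` bytes starting at `iova` into a user address, or fail
    /// (e.g. because the range is unmapped).
    fn translate(&self, iova: u64, len: u64) -> Option<GuestAddress>;
}

/// Hypothetical IOMMU-aware wrapper: externally, GuestAddress values are
/// IOVAs; internally, they are translated and forwarded to an inner
/// GuestMemory whose regions are based on user addresses.
pub struct IommuMemory<M: GuestMemory, I: Iommu> {
    inner: M,
    iommu: I,
}

impl<M: GuestMemory, I: Iommu> IommuMemory<M, I> {
    pub fn new(inner: M, iommu: I) -> Self {
        IommuMemory { inner, iommu }
    }

    /// Sketch of a single translated access; a full version would implement
    /// the whole GuestMemory (or successor) trait this way.
    pub fn get_host_address(
        &self,
        iova: GuestAddress,
        count: usize,
    ) -> Result<*mut u8, GuestMemoryError> {
        let user_addr = self
            .iommu
            .translate(iova.0, count as u64)
            .ok_or(GuestMemoryError::InvalidGuestAddress(iova))?;
        // The inner object interprets GuestAddress values as user addresses.
        if !self.inner.check_range(user_addr, count) {
            return Err(GuestMemoryError::InvalidGuestAddress(iova));
        }
        self.inner.get_host_address(user_addr)
    }
}
```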
The back-end implementation would be tasked with connecting the IOTLB to the vhost-user back-end req FD as provided by `set_backend_req_fd()`.
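A sketch of that wiring, assuming the `set_backend_req_fd()` hook hands us a `vhost::vhost_user::Backend`; the `VhostUserIommu` type and its fields are made up for illustration:

```rust
use std::sync::{Arc, Mutex};
use vhost::vhost_user::Backend;

/// Hypothetical vhost-user IOMMU/IOTLB object, shared between the device
/// implementation and the IOMMU-aware memory wrapper.
#[derive(Default)]
pub struct VhostUserIommu {
    /// Channel to the front-end for IOTLB miss requests; populated once the
    /// front-end has sent the back-end request FD.
    backend_req: Mutex<Option<Backend>>,
}

impl VhostUserIommu {
    /// Called from the back-end's set_backend_req_fd() implementation.
    pub fn set_backend_req(&self, backend: Backend) {
        *self.backend_req.lock().unwrap() = Some(backend);
    }
}

pub struct MyDeviceBackend {
    iommu: Arc<VhostUserIommu>,
    // ... device state ...
}

impl MyDeviceBackend {
    /// This is what the set_backend_req_fd() trait method would do (the rest
    /// of the back-end trait implementation is omitted here).
    pub fn set_backend_req_fd(&self, backend: Backend) {
        self.iommu.set_backend_req(backend);
    }
}
```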
We can either implement these things across the existing crates (e.g. the IOMMU-aware `GuestMemory` wrapper would then be provided by the vm-memory crate); alternatively, I believe we could also create a new independent crate that contains all of those, so none of the rust-vmm crates would need to be modified.
It is very much desirable to get these changes into the rust-vmm crates, though, so that is what we should pursue first.
The IOTLB implementation will require access to the back-end req channel (which is a Unix domain socket connection). This could interfere with concurrent use for other purposes, especially when relying on synchronous communication (`F_REPLY` IOTLB messages).

Luckily, `vhost::vhost_user::Backend` has internal mutability and can simply be cloned to allow for multiple concurrent users, so it shouldn't pose a problem.
An IOTLB miss will trigger a vhost-user back-end request, i.e. the back-end has to send a message to the front-end over the back-end request channel, wait for the front-end to reply with the requested mapping, and only then retry the translation.
This is quite slow. A fully synchronous model, i.e. where we have a synchronous function to read from guest memory, will suffer a lot in performance when encountering IOTLB misses.
An asynchronous model would be much better here: accessing guest memory would be allowed to yield, so another guest request could be processed while we await the front-end's answer. However, allowing async memory accesses would likely require a major redesign of a lot of vhost components.
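Just to illustrate what this would mean at the interface level, an async variant of the hypothetical translation trait might look as follows (this is not existing vhost or vm-memory API):

```rust
/// Hypothetical async IOMMU interface: on an IOTLB miss, translate() can
/// await the front-end's reply instead of blocking the worker thread, so
/// other guest requests can be processed in the meantime.
pub trait AsyncIommu {
    fn translate(
        &self,
        iova: u64,
        len: u64,
    ) -> impl std::future::Future<Output = Option<u64>> + Send;
}
```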