• Type: Story
    • Resolution: Unresolved
    • rhel-9.4
    • rhel-9.4
    • virtiofsd
    • sst_virtualization_storage
    • ssg_virtualization
    • Red Hat Virtualization
    • x86_64, aarch64

      We want to support dynamically using multiple memslots to expose virtio-mem device memory to the VM; using multiple memslots dynamically can drastically reduce memory overhead in the hypervisor (especially KVM) when a device currently exposes only comparatively little memory towards the VM compared to its possible maximum size.

      For QEMU, the feature is enabled using "dynamic-memslots=on". With "dynamic-memslots=off", the feature is disabled and we default to using a single large memslot statically.
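
      For illustration, a minimal sketch of enabling the feature on a virtio-mem device on the QEMU command line (the backend/device ids and sizes here are example values, not taken from this issue):

      qemu-system-x86_64 ... \
        -m 4G,maxmem=36G \
        -object memory-backend-ram,id=mem0,size=32G \
        -device virtio-mem-pci,id=vmem0,memdev=mem0,dynamic-memslots=on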

      In combination with vhost devices, this new feature can be problematic if the devices support fewer than 509 memslots. If such devices are created before the virtio-mem device in QEMU, virtio-mem falls back to the old handling of using a single memslot only. If such devices are created after the virtio-mem device (later on the cmdline, or via hotplug), QEMU will bail out.
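
      To sketch that ordering behavior (assuming a vhost-user-fs backend that supports fewer than 509 memslots; device ids and options are illustrative):

      # vhost-user-fs specified before virtio-mem: virtio-mem falls back to a single static memslot
      ... -device vhost-user-fs-pci,chardev=char0,tag=shared \
          -device virtio-mem-pci,id=vmem0,memdev=mem0,dynamic-memslots=on ...

      # vhost-user-fs specified (or hotplugged) after virtio-mem: QEMU bails out
      ... -device virtio-mem-pci,id=vmem0,memdev=mem0,dynamic-memslots=on \
          -device vhost-user-fs-pci,chardev=char0,tag=shared ...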

      For vhost support in the kernel, RHEL should always have 509 memslots configured via:

      $ cat /etc/modprobe.d/vhost.conf 
      # Increase default vhost memory map limit to match
      # KVM's memory slot limit
      options vhost max_mem_regions=509
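
      Once the vhost module is (re)loaded with that option, the effective limit can be sanity-checked via the module parameter in sysfs (assuming the parameter is exported there, as vhost's max_mem_regions normally is):

      $ cat /sys/module/vhost/parameters/max_mem_regions
      509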
      

      virtiofsd, which is based on rust-vmm's implementation of vhost-user devices, currently supports significantly fewer memslots (32) and needs support for more memslots.

      In particular, the reply to the VHOST_USER_GET_MAX_MEM_SLOTS (36) request must indicate support for >= 509 memslots.

      Initial support has been merged into QEMU and will be part of QEMU 8.2.0. One bugfix is still pending.

      Note: in the case of virtio-mem, many memslots (max 256) can target the same fd, just different regions inside that fd. To reduce the number of VMAs in the process, one could consider using a single large mmap() that covers the whole file and letting the individual memslots simply work on that mapping. The downside of that approach is that the whole file would be mmap'ed and accessible, even though only parts of it are actually mapped into the VM at any given time. Using mprotect() does not really make sense, because it would similarly create many VMAs. uffd might be usable to catch illegal access to regions not covered by a memslot, but that needs more thought.

            German Maglione
            David Hildenbrand
            German Maglione
            Tingting Mao
            Votes: 0
            Watchers: 18
