RHEL-121177

qemu-kvm uses more memory than expected after inflating balloon


    • Bug
    • Resolution: Unresolved
    • rhel-9.6
    • qemu-kvm
    • Low
    • rhel-virt-core
    • x86_64

      What were you trying to do that didn't work?

      A RHEL 9.6 VM running on a RHEL 9.6 host is configured with 100 GiB of memory and 48 GiB of currentMemory, so the balloon is set to reclaim 52 GiB:

       

        <memory unit='KiB'>104857600</memory>
        <currentMemory unit='KiB'>50331648</currentMemory>
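The 52 GiB balloon size follows directly from the two XML values. A minimal sketch of the arithmetic, using only the numbers quoted above (the variable names are illustrative):

```python
KIB_PER_GIB = 1024 * 1024

memory_kib = 104857600          # <memory unit='KiB'>
current_memory_kib = 50331648   # <currentMemory unit='KiB'>

# The balloon is asked to reclaim the difference between the two values.
balloon_kib = memory_kib - current_memory_kib

print(f"memory:        {memory_kib / KIB_PER_GIB:.0f} GiB")
print(f"currentMemory: {current_memory_kib / KIB_PER_GIB:.0f} GiB")
print(f"balloon:       {balloon_kib / KIB_PER_GIB:.0f} GiB")
```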

       

      However, the RSS usage is unexpectedly high, more than 77 GiB:

       

      # virsh dommemstat VM_NAME
      actual 50331648
      swap_in 44196
      swap_out 108796
      major_fault 13076
      minor_fault 822827789
      unused 5894164
      available 48054352
      usable 28802428
      last_update 1755634271
      disk_caches 21084584
      hugetlb_pgalloc 0
      hugetlb_pgfail 0
      rss 81404532    <---- 
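To make the gap easier to read, the sketch below parses a subset of the `virsh dommemstat` output shown above and converts the figures to GiB. It assumes the memory values are reported in KiB; the function name `parse_dommemstat` is hypothetical and not part of libvirt.

```python
KIB_PER_GIB = 1024 * 1024

# Subset of the dommemstat output from this report.
DOMMEMSTAT_OUTPUT = """\
actual 50331648
unused 5894164
available 48054352
usable 28802428
disk_caches 21084584
rss 81404532
"""

def parse_dommemstat(text):
    """Parse 'name value' lines into a dict of integers."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats

stats = parse_dommemstat(DOMMEMSTAT_OUTPUT)
excess_kib = stats["rss"] - stats["actual"]

print(f"actual: {stats['actual'] / KIB_PER_GIB:.2f} GiB")
print(f"rss:    {stats['rss'] / KIB_PER_GIB:.2f} GiB")
print(f"excess: {excess_kib / KIB_PER_GIB:.2f} GiB")
```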

       

      The guest is seeing the reduced memory and has plenty of free memory:

       

      # free -m
                    total        used        free      shared  buff/cache   available
      Mem:          46928       17780        5569         200       23577       27947
      Swap:          4095           1        4094
      
      # cat /proc/meminfo
      MemTotal:       48054352 kB
      MemFree:         5699984 kB
      MemAvailable:   28615440 kB
      Buffers:         2114020 kB
      Cached:         18977604 kB
      SwapCached:            0 kB
      Active:         19021676 kB
      Inactive:       19671040 kB
      Active(anon):      34548 kB
      Inactive(anon): 17772100 kB
      Active(file):   18987128 kB
      Inactive(file):  1898940 kB
      Unevictable:           0 kB
      Mlocked:               0 kB
      SwapTotal:       4194300 kB
      SwapFree:        4192744 kB
      Dirty:               552 kB
      Writeback:             0 kB
      AnonPages:      17433224 kB
      Mapped:           576212 kB
      Shmem:            205592 kB
      KReclaimable:    3052372 kB
      Slab:            3408724 kB
      SReclaimable:    3052372 kB
      SUnreclaim:       356352 kB
      KernelStack:       17952 kB
      PageTables:        74044 kB
      NFS_Unstable:          0 kB
      Bounce:                0 kB
      WritebackTmp:          0 kB
      CommitLimit:    28221476 kB
      Committed_AS:   24463328 kB
      VmallocTotal:   34359738367 kB
      VmallocUsed:      141664 kB
      VmallocChunk:          0 kB
      Percpu:             5088 kB
      HardwareCorrupted:     0 kB
      AnonHugePages:  13197312 kB
      ShmemHugePages:        0 kB
      ShmemPmdMapped:        0 kB
      FileHugePages:         0 kB
      FilePmdMapped:         0 kB
      HugePages_Total:       0
      HugePages_Free:        0
      HugePages_Rsvd:        0
      HugePages_Surp:        0
      Hugepagesize:       2048 kB
      Hugetlb:               0 kB
      DirectMap4k:      696172 kB
      DirectMap2M:    62218240 kB
      DirectMap1G:    44040192 kB
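As a rough cross-check, the guest's own accounting can be compared with the host-side RSS. The sketch below parses a few of the `/proc/meminfo` fields shown above and computes `MemTotal - MemFree`, which is only an approximation of the memory the guest currently has in use (page cache included). The helper name `parse_meminfo` is hypothetical, and the RSS figure is copied from the `virsh dommemstat` output in this report.

```python
KIB_PER_GIB = 1024 * 1024

# Selected fields from the guest's /proc/meminfo in this report.
MEMINFO_SAMPLE = """\
MemTotal:       48054352 kB
MemFree:         5699984 kB
MemAvailable:   28615440 kB
Buffers:         2114020 kB
Cached:         18977604 kB
AnonPages:      17433224 kB
"""

HOST_RSS_KIB = 81404532  # rss reported by virsh dommemstat on the host

def parse_meminfo(text):
    """Parse 'Name:   value kB' lines into a dict of integers (kB)."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue
        tokens = rest.split()
        if tokens and tokens[0].isdigit():
            fields[name.strip()] = int(tokens[0])
    return fields

meminfo = parse_meminfo(MEMINFO_SAMPLE)

# Memory the guest does not consider free (approximation).
guest_in_use_kib = meminfo["MemTotal"] - meminfo["MemFree"]
# Host RSS beyond everything the guest can currently see.
beyond_guest_total_kib = HOST_RSS_KIB - meminfo["MemTotal"]

print(f"guest in use:           {guest_in_use_kib / KIB_PER_GIB:.2f} GiB")
print(f"host RSS:               {HOST_RSS_KIB / KIB_PER_GIB:.2f} GiB")
print(f"RSS beyond guest total: {beyond_guest_total_kib / KIB_PER_GIB:.2f} GiB")
```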

       

      What is the impact of this issue to you?

      It's affecting a production hypervisor.

      Please provide the package NVR for which the bug is seen:

      kernel-5.14.0-570.24.1.el9_6.x86_64
      libvirt-daemon-10.10.0-7.3.el9_6.x86_64
      qemu-kvm-core-9.1.0-15.el9_6.4.x86_64

      How reproducible is this bug?:

      Easily reproducible in the customer's environment, but I couldn't reproduce it locally.

      Steps to reproduce

      1. RHEL 9.6 host and VM.
      2. Boot the VM with 100 GiB of memory.
      3. `virsh setmem VM_NAME 48g --live`
      4. Monitor memory usage with `virsh dommemstat VM_NAME`
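Step 4 can be automated. The following is a minimal sketch that polls `virsh dommemstat` and records the `actual` and `rss` values over time. It assumes `virsh` is on the PATH, that the domain is running, and that both fields are present in the output; the function names `read_dommemstat` and `watch_rss` are hypothetical.

```python
import subprocess
import time

def read_dommemstat(domain):
    """Run `virsh dommemstat DOMAIN` and return its stats as a dict of ints."""
    result = subprocess.run(
        ["virsh", "dommemstat", domain],
        capture_output=True, text=True, check=True,
    )
    stats = {}
    for line in result.stdout.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            stats[parts[0]] = int(parts[1])
    return stats

def watch_rss(domain, samples, interval, reader=read_dommemstat, sleep=time.sleep):
    """Collect (actual, rss) pairs, printing the rss/actual ratio per sample."""
    history = []
    for i in range(samples):
        stats = reader(domain)
        actual, rss = stats["actual"], stats["rss"]
        history.append((actual, rss))
        print(f"actual={actual} KiB rss={rss} KiB rss/actual={rss / actual:.2f}")
        if i + 1 < samples:
            sleep(interval)  # wait before taking the next sample
    return history
```

For example, `watch_rss("VM_NAME", samples=10, interval=5)` would take ten samples five seconds apart. The `reader` and `sleep` parameters exist only so the loop can be exercised without a live hypervisor.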

      Expected results

      RSS memory usage of the qemu-kvm process is reduced to 48 GiB plus some reasonable overhead.

      Actual results

      RSS is 81404532 KiB (about 77.6 GiB), roughly 62% above the configured 48 GiB of memory.

      Additional information:

      We have tried to use valgrind to identify a possible memory leak with this wrapper script:

      exec /usr/bin/valgrind --error-limit=no --trace-children=yes --track-origins=yes --leak-check=full --show-leak-kinds=definite --log-file=/tmp/valgrind_qemu-%p.log /usr/libexec/qemu-kvm "$@"

      However, under valgrind the guest hangs during early boot with many soft lockup messages.

      Full qemu command line:

      LC_ALL=C \
      PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
      HOME=/var/lib/libvirt/qemu/domain-4-VM-NAME \
      XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-4-VM-NAME/.local/share \
      XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-4-VM-NAME/.cache \
      XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-4-VM-NAME/.config \
      /usr/libexec/qemu-kvm \
      -name guest=VM-NAME,debug-threads=on \
      -S \
      -object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-4-VM-NAME/master-key.aes"}' \
      -machine pc-q35-rhel9.0.0,usb=off,dump-guest-core=off,memory-backend=pc.ram,hpet=off,acpi=on \
      -accel kvm \
      -cpu Skylake-Server-noTSX-IBRS,spec-ctrl=on,stibp=on,ssbd=on,pdpe1gb=on,md-clear=on,mds-no=off,taa-no=off,mpx=off,hypervisor=on,pku=on \
      -m size=104857600k \
      -object '{"qom-type":"memory-backend-ram","id":"pc.ram","size":107374182400}' \
      -overcommit mem-lock=off \
      -smp 8,sockets=8,cores=1,threads=1 \
      -uuid fcf4959a-23bb-4999-a7b6-cb4231ac32f4 \
      -no-user-config \
      -nodefaults \
      -chardev socket,id=charmonitor,fd=35,server=on,wait=off \
      -mon chardev=charmonitor,id=monitor,mode=control \
      -rtc base=utc,driftfix=slew \
      -global kvm-pit.lost_tick_policy=delay \
      -no-shutdown \
      -global ICH9-LPC.disable_s3=1 \
      -global ICH9-LPC.disable_s4=1 \
      -boot strict=on \
      -device '{"driver":"pcie-root-port","port":8,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x1"}' \
      -device '{"driver":"pcie-root-port","port":9,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x1.0x1"}' \
      -device '{"driver":"pcie-root-port","port":10,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x1.0x2"}' \
      -device '{"driver":"qemu-xhci","id":"usb","bus":"pci.2","addr":"0x0"}' \
      -device '{"driver":"virtio-scsi-pci","id":"scsi0","bus":"pci.1","addr":"0x0"}' \
      -blockdev '{"driver":"host_device","filename":"/dev/mapper/REDACTED","node-name":"libvirt-2-storage","read-only":false,"discard":"unmap","cache":{"direct":true,"no-flush":false}}' \
      -device '{"driver":"scsi-hd","bus":"scsi0.0","channel":0,"scsi-id":0,"lun":0,"device_id":"drive-scsi0-0-0-0","drive":"libvirt-2-storage","id":"scsi0-0-0-0","bootindex":1,"write-cache":"on"}' \
      -device '{"driver":"ide-cd","bus":"ide.0","id":"sata0-0-0"}' \
      -netdev '{"type":"tap","fd":"37","vhost":true,"vhostfd":"39","id":"hostnet0"}' \
      -device '{"driver":"virtio-net-pci","netdev":"hostnet0","id":"net0","mac":"52:54:00:00:00:01","bus":"pcie.0","addr":"0x3","romfile":""}' \
      -chardev pty,id=charserial0 \
      -device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \
      -device '{"driver":"usb-tablet","id":"input0","bus":"usb.0","port":"1"}' \
      -audiodev '{"id":"audio1","driver":"none"}' \
      -vnc '[::1]:127,audiodev=audio1' \
      -device '{"driver":"cirrus-vga","id":"video0","bus":"pcie.0","addr":"0x2"}' \
      -global ICH9-LPC.noreboot=off \
      -watchdog-action reset \
      -device '{"driver":"virtio-balloon-pci","id":"balloon0","bus":"pcie.0","addr":"0x6"}' \
      -object '{"qom-type":"rng-random","id":"objrng0","filename":"/dev/random"}' \
      -device '{"driver":"virtio-rng-pci","rng":"objrng0","id":"rng0","max-bytes":1024,"period":1000,"bus":"pcie.0","addr":"0x5"}' \
      -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
      -msg timestamp=on

      Attachments:
        1. smaps (828 kB)
        2. vmstat (4 kB)
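A dump of `/proc/PID/smaps` such as the attached one can be summarised to see which mappings hold the resident memory. The sketch below sums the `Rss:` field per mapping name. The sample text is invented for illustration and is not taken from the attachment, and `rss_by_mapping` is a hypothetical helper.

```python
import re
from collections import defaultdict

# Mapping header: "start-end perms offset dev inode [pathname]"
MAPPING_RE = re.compile(
    r"^[0-9a-f]+-[0-9a-f]+\s+\S+\s+[0-9a-f]+\s+\S+\s+\d+\s*(.*)$"
)
RSS_RE = re.compile(r"^Rss:\s+(\d+)\s+kB$")

def rss_by_mapping(smaps_text):
    """Return total Rss in kB per mapping name; unnamed maps are grouped."""
    totals = defaultdict(int)
    current = None
    for line in smaps_text.splitlines():
        header = MAPPING_RE.match(line)
        if header:
            current = header.group(1).strip() or "[anonymous]"
            continue
        rss = RSS_RE.match(line)
        if rss and current is not None:
            totals[current] += int(rss.group(1))
    return dict(totals)

# Invented sample data, only to show the input format.
SAMPLE_SMAPS = """\
00400000-00452000 r-xp 00000000 fd:00 173521 /usr/libexec/qemu-kvm
Size:                328 kB
Rss:                 200 kB
7f0000000000-7f0000200000 rw-p 00000000 00:00 0
Size:               2048 kB
Rss:                1024 kB
7f0000200000-7f0000400000 rw-p 00000000 00:00 0
Size:               2048 kB
Rss:                 512 kB
"""

summary = rss_by_mapping(SAMPLE_SMAPS)
for name, kib in sorted(summary.items(), key=lambda item: item[1], reverse=True):
    print(f"{kib:>10} kB  {name}")
```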

              virt-maint virt-maint
              rhn-support-jortialc Juan Orti
              virt-maint virt-maint
              virt-bugs virt-bugs