CNV-29431: [2212590] 1 vcpu realtime VM hangs with CNV

      Description of problem:

      Virtual Machines configured with realtime and hugepages cannot stay up for even one minute; they are repeatedly killed by the oom-killer because of the pod's cgroup memory limit.

      Version-Release number of selected component (if applicable):
      CNV 4.13
      OCP 4.13.1

      How reproducible:
      Always

      Steps to Reproduce:
      1. Set up a VM for low-latency work (hugepages and realtime are the key settings):

      $ oc get vm fedora-1 -o yaml | yq '.spec.template.spec.domain'
      cpu:
        cores: 1
        dedicatedCpuPlacement: true
        numa:
          guestMappingPassthrough: {}
        realtime: {}
        sockets: 1
        threads: 1
      devices:
        disks:
        - disk:
            bus: virtio
          name: rootdisk
        - disk:
            bus: virtio
          name: cloudinitdisk
        interfaces:
        - macAddress: "02:30:44:00:00:00"
          masquerade: {}
          model: virtio
          name: default
        networkInterfaceMultiqueue: true
        rng: {}
      features:
        acpi: {}
        smm:
          enabled: true
      firmware:
        bootloader:
          efi: {}
      machine:
        type: pc-q35-rhel9.2.0
      memory:
        hugepages:
          pageSize: 1Gi
      resources:
        limits:
          cpu: "1"
          memory: 22Gi
        requests:
          cpu: "1"
          memory: 22Gi

      2. Once the virt-launcher pod starts, because the VM uses hugepages, the guest RAM is requested as hugepages-1Gi and the pod's regular memory limit is set to only about 300M, which is all that qemu-kvm and the other launcher processes may consume (a sketch of this accounting follows the resource listing below):

      resources:
        limits:
          cpu: "1"
          devices.kubevirt.io/kvm: "1"
          devices.kubevirt.io/tun: "1"
          devices.kubevirt.io/vhost-net: "1"
          hugepages-1Gi: 22Gi
          memory: "299892737"
        requests:
          cpu: "1"
          devices.kubevirt.io/kvm: "1"
          devices.kubevirt.io/tun: "1"
          devices.kubevirt.io/vhost-net: "1"
          ephemeral-storage: 50M
          hugepages-1Gi: 22Gi
          memory: "299892737"
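      The ~300M figure is the infrastructure overhead that virt-controller computes for the launcher pod: with hugepages, the 22Gi of guest RAM is charged to the hugepages-1Gi resource, and only that computed overhead becomes the pod's regular memory limit. The sketch below is a minimal illustration of that accounting, not KubeVirt's actual virt-controller code; the constants and the per-term split are assumptions chosen only to land in the same ballpark:

      package main

      import (
          "fmt"

          "k8s.io/apimachinery/pkg/api/resource"
      )

      // podMemoryLimits sketches how the launcher pod limits are derived for a
      // hugepages-backed guest: the guest RAM goes entirely to hugepages-1Gi,
      // and only the estimated infrastructure overhead (virt-launcher, virtqemud,
      // virtlogd, QEMU static allocations, pagetables, per-vCPU buffers) stays
      // under the regular memory limit that the memcg enforces.
      func podMemoryLimits(guestMemory resource.Quantity, vcpus int64) (hugepages, memory resource.Quantity) {
          overhead := resource.MustParse("220Mi")                                         // assumed fixed per-pod overhead
          overhead.Add(*resource.NewQuantity(guestMemory.Value()/512, resource.BinarySI)) // pagetable overhead scaling with guest RAM
          overhead.Add(*resource.NewQuantity(vcpus*8*1024*1024, resource.BinarySI))       // assumed 8Mi per vCPU thread
          // Note: nothing in this calculation adds headroom when realtime is
          // enabled, which matches the behaviour reported in this bug.
          return guestMemory, overhead
      }

      func main() {
          hp, mem := podMemoryLimits(resource.MustParse("22Gi"), 1)
          fmt.Printf("hugepages-1Gi: %s, memory: %s\n", hp.String(), mem.String())
      }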

      3. That ~300M limit is not enough once realtime: {} is enabled, so the kernel keeps OOM-killing qemu-kvm inside the pod's cgroup (a small helper to watch the limit on the node follows the log excerpt):

      Jun 06 01:10:33 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 46380 (qemu-kvm) total-vm:24321492kB, anon-rss:243796kB, file-rss:21688kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
      Jun 06 01:10:51 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 46841 (qemu-kvm) total-vm:24321492kB, anon-rss:244272kB, file-rss:21524kB, shmem-rss:4kB, UID:107 pgtables:1308kB oom_score_adj:-997
      Jun 06 01:10:51 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 46732 (virt-launcher) total-vm:1395200kB, anon-rss:9840kB, file-rss:39656kB, shmem-rss:0kB, UID:107 pgtables:308kB oom_score_adj:-997
      Jun 06 01:11:42 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 47598 (qemu-kvm) total-vm:24321492kB, anon-rss:244544kB, file-rss:21472kB, shmem-rss:4kB, UID:107 pgtables:1312kB oom_score_adj:-997
      Jun 06 01:11:42 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 47475 (virt-launcher) total-vm:1395132kB, anon-rss:10012kB, file-rss:39908kB, shmem-rss:0kB, UID:107 pgtables:312kB oom_score_adj:-997
      Jun 06 01:12:36 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 48391 (qemu-kvm) total-vm:24321492kB, anon-rss:244052kB, file-rss:21528kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
      Jun 06 01:12:54 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 48867 (qemu-kvm) total-vm:24321492kB, anon-rss:244284kB, file-rss:21604kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
      Jun 06 01:12:54 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 48744 (virt-launcher) total-vm:1395132kB, anon-rss:9992kB, file-rss:39692kB, shmem-rss:0kB, UID:107 pgtables:320kB oom_score_adj:-997
      Jun 06 01:13:45 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 49658 (qemu-kvm) total-vm:24321492kB, anon-rss:244016kB, file-rss:21464kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
      Jun 06 01:13:45 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 49496 (virt-launcher) total-vm:1395204kB, anon-rss:10444kB, file-rss:39616kB, shmem-rss:0kB, UID:107 pgtables:312kB oom_score_adj:-997
      Jun 06 01:14:40 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 50436 (qemu-kvm) total-vm:24305052kB, anon-rss:244312kB, file-rss:21548kB, shmem-rss:4kB, UID:107 pgtables:1304kB oom_score_adj:-997
      Jun 06 01:14:57 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 50879 (qemu-kvm) total-vm:24305052kB, anon-rss:244108kB, file-rss:21536kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
      Jun 06 01:15:15 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 51374 (qemu-kvm) total-vm:24321492kB, anon-rss:244108kB, file-rss:21568kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
      Jun 06 01:15:15 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 51239 (virt-launcher) total-vm:1395204kB, anon-rss:10212kB, file-rss:39648kB, shmem-rss:0kB, UID:107 pgtables:316kB oom_score_adj:-997
      Jun 06 01:16:09 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 52120 (qemu-kvm) total-vm:24321492kB, anon-rss:244636kB, file-rss:21428kB, shmem-rss:4kB, UID:107 pgtables:1308kB oom_score_adj:-997
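      To watch the limit being enforced directly on the worker node, the pod's cgroup can be inspected while the VM boots. The helper below is a throwaway sketch; it assumes cgroup v2 and hard-codes the pod slice name taken from the log in step 4 below, so adjust it for the pod actually under test:

      package main

      import (
          "fmt"
          "os"
          "path/filepath"
          "strings"
          "time"
      )

      func main() {
          // Pod slice copied from the "Memory cgroup stats for ..." line below;
          // replace it with the slice of the pod being debugged.
          slice := "/sys/fs/cgroup/kubepods.slice/kubepods-pod204d7fbc_3a3b_4c8f_befe_0810d6f18231.slice"
          read := func(name string) string {
              b, err := os.ReadFile(filepath.Join(slice, name))
              if err != nil {
                  return "unavailable (" + err.Error() + ")"
              }
              return strings.TrimSpace(string(b))
          }
          // Print the enforced limit and the current charge once per second.
          for {
              fmt.Printf("memory.max=%s memory.current=%s\n", read("memory.max"), read("memory.current"))
              time.Sleep(time.Second)
          }
      }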

      4. The memory situation at that point is as follows (a quick tally is worked out after the dump):

      Jun 06 01:16:27 worker-4.toca.local kernel: Memory cgroup stats for /kubepods.slice/kubepods-pod204d7fbc_3a3b_4c8f_befe_0810d6f18231.slice:
      Jun 06 01:16:27 worker-4.toca.local kernel: anon 270708736
      file 65536
      kernel 28831744
      kernel_stack 802816
      pagetables 2125824
      percpu 100800
      sock 0
      vmalloc 23465984
      shmem 61440
      zswap 0
      zswapped 0
      file_mapped 16384
      file_dirty 0
      file_writeback 4096
      swapcached 0
      anon_thp 0
      file_thp 0
      shmem_thp 0
      inactive_anon 262545408
      active_anon 8224768
      inactive_file 4096
      active_file 0
      unevictable 0
      slab_reclaimable 787736
      slab_unreclaimable 1200816
      slab 1988552
      workingset_refault_anon 0
      workingset_refault_file 0
      workingset_activate_anon 0
      workingset_activate_file 0
      workingset_restore_anon 0
      workingset_restore_file 0
      workingset_nodereclaim 0
      pgscan 456
      pgsteal 454
      pgscan_kswapd 0
      pgscan_direct 456
      pgsteal_kswapd 0
      pgsteal_direct 454
      pgfault 90828
      pgmajfault 0
      pgrefill 5
      pgactivate 1987
      pgdeactivate 5
      pglazyfree 0
      pglazyfreed 0
      zswpin 0
      zswpout 0
      thp_fault_alloc 0
      thp_collapse_alloc 0
      Jun 06 01:16:27 worker-4.toca.local kernel: Tasks state (memory values in pages):
      Jun 06 01:16:27 worker-4.toca.local kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
      Jun 06 01:16:27 worker-4.toca.local kernel: [ 52443] 0 52443 2076 469 53248 0 -1000 conmon
      Jun 06 01:16:27 worker-4.toca.local kernel: [ 52455] 107 52455 274764 3386 159744 0 -997 virt-launcher-m
      Jun 06 01:16:27 worker-4.toca.local kernel: [ 52482] 107 52482 348800 12472 327680 0 -997 virt-launcher
      Jun 06 01:16:27 worker-4.toca.local kernel: [ 52487] 107 52487 174567 5928 217088 0 -997 virtqemud
      Jun 06 01:16:27 worker-4.toca.local kernel: [ 52488] 107 52488 26133 3930 106496 0 -997 virtlogd
      Jun 06 01:16:27 worker-4.toca.local kernel: [ 52588] 107 52588 6080373 66350 1339392 0 -997 qemu-kvm
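      Tallying the charged categories from the stats above shows the cgroup was already pinned against its limit before realtime needed anything extra (values copied verbatim from this report; memory.current is treated as roughly anon + file + kernel):

      package main

      import "fmt"

      func main() {
          const (
              anon   = 270708736 // "anon" from the cgroup stats above
              file   = 65536     // "file"
              kernel = 28831744  // "kernel" (slab, pagetables, percpu, vmalloc, kernel stacks)
              limit  = 299892737 // regular memory limit from the pod spec in step 2
          )
          used := anon + file + kernel
          fmt.Printf("charged %d bytes (%.1f MiB) of %d bytes (%.1f MiB), headroom %.2f MiB\n",
              used, float64(used)/(1<<20), limit, float64(limit)/(1<<20), float64(limit-used)/(1<<20))
      }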

      Actual results:

      • VMs won't stay up

      Expected results:

      • VM is up

      Additional info:

      • virt-controller calculates the exact same (non-hugepages) memory limit whether realtime is enabled or not.
      • realtime appears to need additional memory beyond that calculated overhead.
      • changing the VM memory limits makes no difference: the guest memory ends up under the 1Gi-hugepages limit, and qemu-kvm is still confined to the auto-calculated ~300M regular memory limit (the exact value depends on the VM config).
