
      Description of problem:

      Virtual Machines configured with realtime and hugepages can't even stay up for one minute: qemu-kvm is killed by the oom-killer due to the pod's cgroup memory limit.

      Version-Release number of selected component (if applicable):
      CNV 4.13
      OCP 4.13.1

      How reproducible:
      Always

      Steps to Reproduce:
      1. Set up a VM for low-latency work (mainly hugepages and realtime are required) and start it (a start sketch follows the spec below):

      $ oc get vm fedora-1 -o yaml | yq '.spec.template.spec.domain'
      cpu:
        cores: 1
        dedicatedCpuPlacement: true
        numa:
          guestMappingPassthrough: {}
        realtime: {}
        sockets: 1
        threads: 1
      devices:
        disks:
        - disk:
            bus: virtio
          name: rootdisk
        - disk:
            bus: virtio
          name: cloudinitdisk
        interfaces:
        - macAddress: "02:30:44:00:00:00"
          masquerade: {}
          model: virtio
          name: default
        networkInterfaceMultiqueue: true
        rng: {}
      features:
        acpi: {}
        smm:
          enabled: true
      firmware:
        bootloader:
          efi: {}
      machine:
        type: pc-q35-rhel9.2.0
      memory:
        hugepages:
          pageSize: 1Gi
      resources:
        limits:
          cpu: "1"
          memory: 22Gi
        requests:
          cpu: "1"
          memory: 22Gi
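
      To start the VM and follow its launcher pod, a minimal sketch (the kubevirt.io/domain pod label is an assumption about this CNV version; verify with oc get pod --show-labels):

      $ virtctl start fedora-1
      $ oc get pod -l kubevirt.io/domain=fedora-1 -w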

      2. Once the pod starts, because it is using hugepages, the guest's 22Gi goes into the hugepages-1Gi limit and qemu-kvm is left with a regular-memory limit of only ~300M for its overhead (a sketch for reading the values the node actually enforces follows the listing):

      resources:
        limits:
          cpu: "1"
          devices.kubevirt.io/kvm: "1"
          devices.kubevirt.io/tun: "1"
          devices.kubevirt.io/vhost-net: "1"
          hugepages-1Gi: 22Gi
          memory: "299892737"
        requests:
          cpu: "1"
          devices.kubevirt.io/kvm: "1"
          devices.kubevirt.io/tun: "1"
          devices.kubevirt.io/vhost-net: "1"
          ephemeral-storage: 50M
          hugepages-1Gi: 22Gi
          memory: "299892737"

      3. This is not enough when realtime: {} is enabled; qemu-kvm is repeatedly OOM-killed by the memory cgroup:

      Jun 06 01:10:33 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 46380 (qemu-kvm) total-vm:24321492kB, anon-rss:243796kB, file-rss:21688kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
      Jun 06 01:10:51 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 46841 (qemu-kvm) total-vm:24321492kB, anon-rss:244272kB, file-rss:21524kB, shmem-rss:4kB, UID:107 pgtables:1308kB oom_score_adj:-997
      Jun 06 01:10:51 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 46732 (virt-launcher) total-vm:1395200kB, anon-rss:9840kB, file-rss:39656kB, shmem-rss:0kB, UID:107 pgtables:308kB oom_score_adj:-997
      Jun 06 01:11:42 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 47598 (qemu-kvm) total-vm:24321492kB, anon-rss:244544kB, file-rss:21472kB, shmem-rss:4kB, UID:107 pgtables:1312kB oom_score_adj:-997
      Jun 06 01:11:42 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 47475 (virt-launcher) total-vm:1395132kB, anon-rss:10012kB, file-rss:39908kB, shmem-rss:0kB, UID:107 pgtables:312kB oom_score_adj:-997
      Jun 06 01:12:36 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 48391 (qemu-kvm) total-vm:24321492kB, anon-rss:244052kB, file-rss:21528kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
      Jun 06 01:12:54 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 48867 (qemu-kvm) total-vm:24321492kB, anon-rss:244284kB, file-rss:21604kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
      Jun 06 01:12:54 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 48744 (virt-launcher) total-vm:1395132kB, anon-rss:9992kB, file-rss:39692kB, shmem-rss:0kB, UID:107 pgtables:320kB oom_score_adj:-997
      Jun 06 01:13:45 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 49658 (qemu-kvm) total-vm:24321492kB, anon-rss:244016kB, file-rss:21464kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
      Jun 06 01:13:45 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 49496 (virt-launcher) total-vm:1395204kB, anon-rss:10444kB, file-rss:39616kB, shmem-rss:0kB, UID:107 pgtables:312kB oom_score_adj:-997
      Jun 06 01:14:40 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 50436 (qemu-kvm) total-vm:24305052kB, anon-rss:244312kB, file-rss:21548kB, shmem-rss:4kB, UID:107 pgtables:1304kB oom_score_adj:-997
      Jun 06 01:14:57 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 50879 (qemu-kvm) total-vm:24305052kB, anon-rss:244108kB, file-rss:21536kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
      Jun 06 01:15:15 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 51374 (qemu-kvm) total-vm:24321492kB, anon-rss:244108kB, file-rss:21568kB, shmem-rss:4kB, UID:107 pgtables:1300kB oom_score_adj:-997
      Jun 06 01:15:15 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 51239 (virt-launcher) total-vm:1395204kB, anon-rss:10212kB, file-rss:39648kB, shmem-rss:0kB, UID:107 pgtables:316kB oom_score_adj:-997
      Jun 06 01:16:09 worker-4.toca.local kernel: Memory cgroup out of memory: Killed process 52120 (qemu-kvm) total-vm:24321492kB, anon-rss:244636kB, file-rss:21428kB, shmem-rss:4kB, UID:107 pgtables:1308kB oom_score_adj:-997

      4. The memory situation is as follows:

      Jun 06 01:16:27 worker-4.toca.local kernel: Memory cgroup stats for /kubepods.slice/kubepods-pod204d7fbc_3a3b_4c8f_befe_0810d6f18231.slice:
      Jun 06 01:16:27 worker-4.toca.local kernel: anon 270708736
      file 65536
      kernel 28831744
      kernel_stack 802816
      pagetables 2125824
      percpu 100800
      sock 0
      vmalloc 23465984
      shmem 61440
      zswap 0
      zswapped 0
      file_mapped 16384
      file_dirty 0
      file_writeback 4096
      swapcached 0
      anon_thp 0
      file_thp 0
      shmem_thp 0
      inactive_anon 262545408
      active_anon 8224768
      inactive_file 4096
      active_file 0
      unevictable 0
      slab_reclaimable 787736
      slab_unreclaimable 1200816
      slab 1988552
      workingset_refault_anon 0
      workingset_refault_file 0
      workingset_activate_anon 0
      workingset_activate_file 0
      workingset_restore_anon 0
      workingset_restore_file 0
      workingset_nodereclaim 0
      pgscan 456
      pgsteal 454
      pgscan_kswapd 0
      pgscan_direct 456
      pgsteal_kswapd 0
      pgsteal_direct 454
      pgfault 90828
      pgmajfault 0
      pgrefill 5
      pgactivate 1987
      pgdeactivate 5
      pglazyfree 0
      pglazyfreed 0
      zswpin 0
      zswpout 0
      thp_fault_alloc 0
      thp_collapse_alloc 0
      Jun 06 01:16:27 worker-4.toca.local kernel: Tasks state (memory values in pages):
      Jun 06 01:16:27 worker-4.toca.local kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
      Jun 06 01:16:27 worker-4.toca.local kernel: [ 52443] 0 52443 2076 469 53248 0 -1000 conmon
      Jun 06 01:16:27 worker-4.toca.local kernel: [ 52455] 107 52455 274764 3386 159744 0 -997 virt-launcher-m
      Jun 06 01:16:27 worker-4.toca.local kernel: [ 52482] 107 52482 348800 12472 327680 0 -997 virt-launcher
      Jun 06 01:16:27 worker-4.toca.local kernel: [ 52487] 107 52487 174567 5928 217088 0 -997 virtqemud
      Jun 06 01:16:27 worker-4.toca.local kernel: [ 52488] 107 52488 26133 3930 106496 0 -997 virtlogd
      Jun 06 01:16:27 worker-4.toca.local kernel: [ 52588] 107 52588 6080373 66350 1339392 0 -997 qemu-kvm

      Actual results:

      • VMs won't stay up; qemu-kvm is OOM-killed within ~1 minute of each start

      Expected results:

      • VM is up

      Additional info:

      • KubeVirt calculates the exact same (non-hugepages) memory limit whether realtime is enabled or not.
      • realtime appears to need more memory than the auto-calculated overhead allows.
      • Changing the VM limits makes no difference: they end up in the hugepages-1Gi limit, and qemu-kvm still gets the auto-calculated ~300M (the exact value depends on the VM config). A possible cluster-wide knob is sketched below.
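
      If the installed KubeVirt exposes it, inflating the auto-calculated overhead cluster-wide is a possible stopgap. The additionalGuestMemoryOverheadRatio field and the CR name below are assumptions about the installed version, so check the CRD first:

      $ oc explain kubevirt.spec.configuration.additionalGuestMemoryOverheadRatio
      # if present (note that HCO may reconcile away direct edits to the KubeVirt CR):
      $ oc patch kubevirt kubevirt-kubevirt-hyperconverged -n openshift-cnv --type=merge \
          -p '{"spec":{"configuration":{"additionalGuestMemoryOverheadRatio":"2"}}}'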

            [CNV-29431] [2212590] 1 vcpu realtime VM hangs with CNV

            Errata Tool added a comment -

            Since the problem described in this issue should be resolved in a recent advisory, it has been closed.

            For information on the advisory (Moderate: OpenShift Virtualization 4.16.0 Images security update), and where to find the updated files, follow the link below.

            If the solution does not work for you, open a new bug report.
            https://access.redhat.com/errata/RHSA-2024:4455


            Nini Gu added a comment -

            ralavi@redhat.com Sorry, I missed your last comment offering a CNV 4.15 cluster.

            I am fine with closing it.


            Ram Lavi added a comment - - edited

            [UPDATED]
            Hey ngu@redhat.com, I ran a VMI on CNV 4.15 using your guidelines (2 CPU + isolateEmulatorThread = false + realtime + hugepages), but could not reproduce the issue.

            Note that I couldn't use 1 CPU because my cluster has SMT enabled by default, and setting 1 CPU would result in an `SMT Alignment Error`.

            Full VMI spec below.

            $ oc get vmi realtime-vmi-under-test-7w5h2
            NAME                            AGE   PHASE     IP             NODENAME                                         READY
            realtime-vmi-under-test-7w5h2   87s   Running   10.128.1.124   cnv-qe-infra-28.cnvqe2.lab.eng.rdu2.redhat.com   True

            $ oc get vmi realtime-vmi-under-test-7w5h2 -ojson | jq .status.conditions
            [
              {
                "lastProbeTime": null,
                "lastTransitionTime": "2024-01-15T09:10:15Z",
                "status": "True",
                "type": "Ready"
              },
              {
                "lastProbeTime": null,
                "lastTransitionTime": null,
                "status": "True",
                "type": "LiveMigratable"
              }
            ]
            
            $ oc get vmi realtime-vmi-under-test-7w5h2 -ojson | jq .spec.domain
            {
              "cpu": {
                "cores": 2,
                "dedicatedCpuPlacement": true,
                "model": "host-passthrough",
                "numa": {
                  "guestMappingPassthrough": {}
                },
                "realtime": {},
                "sockets": 1,
                "threads": 1
              },
              "devices": {
                "autoattachGraphicsDevice": false,
                "autoattachMemBalloon": false,
                "autoattachSerialConsole": true,
                "disks": [
                  {
                    "disk": {
                      "bus": "virtio"
                    },
                    "name": "rootdisk"
                  },
                  {
                    "disk": {
                      "bus": "virtio"
                    },
                    "name": "cloudinitdisk"
                  }
                ],
                "interfaces": [
                  {
                    "masquerade": {},
                    "name": "default"
                  }
                ]
              },
              "features": {
                "acpi": {
                  "enabled": true
                }
              },
              "firmware": {
                "uuid": "09355779-e3bf-4957-b6e0-91a8ce9fe207"
              },
              "ioThreadsPolicy": "auto",
              "machine": {
                "type": "pc-q35-rhel9.2.0"
              },
              "memory": {
                "guest": "4Gi",
                "hugepages": {
                  "pageSize": "1Gi"
                }
              },
              "resources": {
                "requests": {
                  "memory": "4Gi"
                }
              }
            }
            

            Moreover, according to this, isolateEmulatorThread must be enabled for RT VMIs - so I think we can close this ticket as a misconfiguration.
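
            For reference, a minimal sketch of that fix on the reporter's VM (isolateEmulatorThread needs a dedicated CPU of its own, so the pod's CPU count grows by one):

            $ oc patch vm fedora-1 --type=merge \
                -p '{"spec":{"template":{"spec":{"domain":{"cpu":{"isolateEmulatorThread":true}}}}}}'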



            Ram Lavi added a comment - - edited

            ngu@redhat.com Can I offer a CNV 4.15 SNO cluster for you to check it on?



            Nini Gu added a comment -

            The bug is reproducible in CNV v4.14.2 when starting a realtime VM with 'isolateEmulatorThread = false', as pointed out in https://bugzilla.redhat.com/show_bug.cgi?id=2212590#c20. The VM booted up very slowly and then entered a call trace, as shown in the attachment 2cpu_cnv4_14_2-01112024.

            We don't have a CNV v4.15.* environment, so we didn't try it there.

            BTW, I am wondering if the fix for https://issues.redhat.com/browse/CNV-36194 would also resolve this issue.


            Orel Misan added a comment -

            Hi rhn-engineering-mtosatti, is this bug still reproducible in 4.15?

