  1. OpenShift Bugs
  2. OCPBUGS-36141

MicroShift: Pods writing files larger than memory limit to PVCs tend to OOM frequently



      This tracks disabling MGLRU (the multi-generational LRU) by writing 0 to `/sys/kernel/mm/lru_gen/enabled`.
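      A minimal sketch of that write, assuming root privileges and a kernel built with MGLRU support. The function name `disable_mglru` and its optional path argument are hypothetical, added so the snippet can be tried against a scratch file first. A sysfs write like this does not survive a reboot; how the value is applied to nodes is not described in this issue.

```shell
#!/usr/bin/env bash
# Hypothetical helper: write 0 to the MGLRU knob named in this issue.
# An alternate path can be passed as $1 for a dry run against a scratch file.
disable_mglru() {
    local knob="${1:-/sys/kernel/mm/lru_gen/enabled}"
    if [ ! -w "$knob" ]; then
        echo "cannot write $knob (not root, or kernel without MGLRU?)" >&2
        return 1
    fi
    echo 0 > "$knob"
}
```

      Calling `disable_mglru` with no argument targets the real sysfs path.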

      Description of problem:

      Since 4.16.0, pods with memory limits are OOM-killed very frequently when writing files larger than their memory limit to a PVC.

      Version-Release number of selected component (if applicable):

      4.16.0-rc.4

      How reproducible:

      100% on certain types of storage
      (AWS FSx, certain LVMS setups, see additional info)

      Steps to Reproduce:

      1. Create a pod and PVC where the pod writes a file larger than the container memory limit to the PVC (attached example)

      Actual results:

      OOMKilled

      Expected results:

      Success

      Additional info:

      For simplicity, I will focus on a bare-metal setup that reproduces this with LVM storage.
      It is also reproducible on AWS clusters with NFS-backed NetApp ONTAP FSx.
      
      Further reduced to exclude the OpenShift layer, LVM on a separate (non root) disk:
      
      Prepare disk
      lvcreate -T vg1/thin-pool-1 -V 10G -n oom-lv
      mkfs.ext4 /dev/vg1/oom-lv 
      mkdir /mnt/oom-lv
      mount /dev/vg1/oom-lv /mnt/oom-lv
      
      Run container
      podman run -m 600m --mount type=bind,source=/mnt/oom-lv,target=/disk --rm -it quay.io/centos/centos:stream9 bash
      [root@2ebe895371d2 /]# curl https://cloud.centos.org/centos/9-stream/x86_64/images/CentOS-Stream-GenericCloud-x86_64-9-20240527.0.x86_64.qcow2 -o /disk/temp
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
       47 1157M   47  550M    0     0   111M      0  0:00:10  0:00:04  0:00:06  111MKilled
      (Notice that the curl process gets killed; I don't think podman ever kills the whole container over this, though.)
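      To confirm from inside the container that the kernel OOM killer ended the process, one option is to read the cgroup's `oom_kill` counter. This is a sketch that assumes cgroup v2 with the container's own cgroup visible at `/sys/fs/cgroup`; the helper name `oom_kill_count` is hypothetical and not part of the original report.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the oom_kill counter from a cgroup v2
# memory.events file. Assumes the default path is the container's own cgroup.
oom_kill_count() {
    local events="${1:-/sys/fs/cgroup/memory.events}"
    # memory.events holds "key value" pairs, one per line; match the key exactly
    # so the separate "oom" line is not picked up.
    awk '$1 == "oom_kill" { print $2; found = 1 } END { if (!found) exit 1 }' "$events"
}
```

      A counter that increases after the curl run indicates the kill came from the memory limit rather than from some other signal.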
      
      The same process on the same hardware on a 4.15 node (RHEL 9.2) does not produce an OOM kill
      (4.16 nodes run RHEL 9.4).
      
      For completeness, I will provide some details about the setup behind the LVM pool, though I believe it should not impact the decision about whether this is an issue:
      sh-5.1# pvdisplay 
        --- Physical volume ---
        PV Name               /dev/sdb
        VG Name               vg1
        PV Size               446.62 GiB / not usable 4.00 MiB
        Allocatable           yes 
        PE Size               4.00 MiB
        Total PE              114335
        Free PE               11434
        Allocated PE          102901
        PV UUID               <UUID>
      Hardware:
      SSD (INTEL SSDSC2KG480G8R) behind a RAID 0 of a PERC H330 Mini controller
      
      At the very least, this seems like a change in behavior but tbh I am leaning towards an outright bug.

      QE Verification Steps

      It has been independently verified that setting /sys/kernel/mm/lru_gen/enabled to 0 avoids the OOM kills. The main testing concern at this point is therefore verifying that nodes get this value applied in three cases: new installs, upgrades, and new nodes scaled in after an upgrade.
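      A minimal sketch of a per-node check, assuming the knob reads back as a hexadecimal bitmask (for example `0x0007` when fully enabled and `0x0000` when disabled). The function name `mglru_disabled` and its optional path argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: exit 0 if MGLRU is disabled, 1 if any MGLRU component
# is enabled, 2 if the knob cannot be read.
mglru_disabled() {
    local knob="${1:-/sys/kernel/mm/lru_gen/enabled}"
    local value
    if [ ! -r "$knob" ]; then
        echo "cannot read $knob" >&2
        return 2
    fi
    value=$(cat "$knob")
    [ -n "$value" ] || return 2
    # Bash arithmetic accepts the 0x prefix, so 0x0000 and 0 both evaluate to 0.
    [ "$(( value ))" -eq 0 ]
}
```

      On a node, `mglru_disabled && echo "MGLRU disabled"` would report the state; the same check applies to each of the install, upgrade, and scale-up cases.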

      If we want to go so far as to verify that the OOM kills no longer happen, the kernel QE team has a simplified reproducer at the link below. It involves mounting an NFS volume, using podman to create a container with a memory limit, and writing data to that NFS volume.

      https://issues.redhat.com/browse/RHEL-43371?focusedId=24981771&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-24981771

       

            pacevedo@redhat.com Pablo Acevedo Montserrat
            dfroehli42rh Daniel Fröhlich
            John George John George
            Shauna Diaz Shauna Diaz