OpenShift Bugs / OCPBUGS-58179

Kernel memory reclaim near cgroup memory limit saturates host CPUs


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • Affects Version/s: 4.16.z
    • Component/s: RHCOS
    • Quality / Stability / Reliability
    • Ready to Pick

      Description of problem:

      We noticed in an OpenShift environment that processes approaching their memory limit would suddenly start consuming most of the CPU available on the host in kernel code paths. This made the host where the process was running unresponsive and caused an outage. When we first investigated, the stack trace below was on CPU most of the time. The same behaviour is described in https://clickhouse.com/blog/a-case-of-the-vanishing-cpu-a-linux-kernel-debugging-story and https://issuetracker.google.com/issues/363324206. It is hit in both cgroup v1 and v2:
      
          native_queued_spin_lock_slowpath
          _raw_spin_lock
          __remove_mapping
          shrink_folio_list
          shrink_inactive_list
          shrink_lruvec
          shrink_node_memcgs
          shrink_node
          shrink_zones.constprop.0
          do_try_to_free_pages
          try_to_free_mem_cgroup_pages
          try_charge_memcg
          charge_memcg
          __mem_cgroup_charge
          __filemap_add_folio
          filemap_add_folio
          page_cache_ra_unbounded
          do_sync_mmap_readahead
          filemap_fault
          __do_fault
          do_read_fault
          do_pte_missing
          __handle_mm_fault
          handle_mm_fault
          do_user_addr_fault
          exc_page_fault
          asm_exc_page_fault
          [Missed User Stack]
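
      On an affected node the symptom can be confirmed by sampling on-CPU kernel stacks. A minimal sketch, assuming perf is available on the host (e.g. from a debug/toolbox container with profiling tools installed):

          # Sample all CPUs with kernel call graphs for 15 seconds
          $ perf record -a -g -- sleep 15
          # On an affected node, the hottest samples sit under
          # native_queued_spin_lock_slowpath -> __remove_mapping -> shrink_folio_list,
          # matching the trace above
          $ perf report --stdio | head -40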
      
      Version: OpenShift 4.16+

      How reproducible:

          Always in the customer environment

      Steps to Reproduce:

      Run the following as root. Under cgroup v1 the issue reproduces quickly; under cgroup v2 it takes longer to surface. Download the reproducer binary from https://github.com/serxa/stress_memcg/releases/download/v1.0.0/stress_memcg_x86-64
      
          $ mkdir -p /root/files
          $ chmod +x ./stress_memcg_x86-64    # the downloaded release binary is not executable by default
          $ systemd-run --scope -p MemoryMax=1G ./stress_memcg_x86-64 1000 1000 3000000000 4000000000 /root/files 30000
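
      Because the time to reproduce differs between the two hierarchies, it may help to confirm first which cgroup version the node is running. A small check (stat is part of coreutils; it prints cgroup2fs on a unified v2 hierarchy and tmpfs on legacy v1):

          $ stat -fc %T /sys/fs/cgroup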
      
      After a short while the workload should start consuming a large amount of %sys CPU time, and stack traces will show that it is spent in memory reclaim triggered by mapped-file faults.
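
      A quick way to watch for the symptom from another shell (a sketch; the scope unit name is assigned by systemd-run and will vary, hence the glob):

          # %sy climbing across all CPUs is the tell-tale sign
          $ top -b -n 1 | head -15
          # On cgroup v2, reclaim stalls for the unit also show up as PSI pressure
          $ cat /sys/fs/cgroup/system.slice/run-*.scope/memory.pressure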
      

      Actual results:

          Host CPUs saturate in kernel memory-reclaim code paths (high %sys time in the stack above), the node becomes unresponsive, and workloads running on it suffer an outage.

      Expected results:

          Reclaim for a cgroup approaching its memory limit stays contained to that cgroup; the host remains responsive.

      Additional info:

          

              Assignee: Unassigned
              Reporter: rhn-support-nchoudhu (Novonil Choudhuri)
              QA Contact: Michael Nguyen
              Votes: 0
              Watchers: 7