Bug · Resolution: Unresolved · Major · 4.16.z · Quality / Stability / Reliability · Ready to Pick
Description of problem:
We noticed on an OpenShift environment that processes approaching their memory limit would suddenly start consuming most of the CPU available on the host in kernel code paths. This made the host where the process was running unresponsive and caused an outage. When we first investigated, this was the stack trace on CPU most of the time. The same behaviour is described in https://clickhouse.com/blog/a-case-of-the-vanishing-cpu-a-linux-kernel-debugging-story and https://issuetracker.google.com/issues/363324206. It is hit with both cgroup v1 and v2:

native_queued_spin_lock_slowpath
_raw_spin_lock
__remove_mapping
shrink_folio_list
shrink_inactive_list
shrink_lruvec
shrink_node_memcgs
shrink_node
shrink_zones.constprop.0
do_try_to_free_pages
try_to_free_mem_cgroup_pages
try_charge_memcg
charge_memcg
__mem_cgroup_charge
__filemap_add_folio
filemap_add_folio
page_cache_ra_unbounded
do_sync_mmap_readahead
filemap_fault
__do_fault
do_read_fault
do_pte_missing
__handle_mm_fault
handle_mm_fault
do_user_addr_fault
exc_page_fault
asm_exc_page_fault
[Missed User Stack]

Version: OpenShift 4.16+
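For anyone trying to confirm the same condition, a minimal sketch of how to capture the on-CPU kernel stacks is a short perf profile taken on the affected node while %sys is high (this assumes the perf package is installed on the node; the sampling rate and duration below are only example values):

# Sample all CPUs with kernel call graphs for 30 seconds while the symptom is occurring
$ perf record -F 99 -a -g -- sleep 30
# Summarise the captured stacks; the reclaim contention shows up as
# native_queued_spin_lock_slowpath under shrink_* / try_charge_memcg frames
$ perf report --stdio --no-children | head -n 100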
How reproducible:
Always in the customer environment.
Steps to Reproduce:
Run the following as root. With cgroup v1 the issue is hit quickly; with cgroup v2 it can take some time to surface.

1. Download the reproducer from https://github.com/serxa/stress_memcg/releases/download/v1.0.0/stress_memcg_x86-64
2. Create the working directory and start the reproducer under a 1G memory limit:

$ mkdir -p /root/files
$ systemd-run --scope -p MemoryMax=1G ./stress_memcg_x86-64 1000 1000 3000000000 4000000000 /root/files 30000

After a short while this should start using a lot of %sys CPU time, and stack traces will show that it is memory reclaim due to mapped file faults.
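A quick way to check which cgroup version the node is running, and to watch the symptom develop, is sketched below (standard tooling, not part of the reproducer itself; mpstat requires the sysstat package, top works as well):

# cgroup2fs means cgroup v2, tmpfs means cgroup v1
$ stat -fc %T /sys/fs/cgroup
# Watch %sys climb once reclaim starts spinning
$ mpstat 1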
Actual results:
Host CPUs are saturated in %sys time, spinning in native_queued_spin_lock_slowpath during memcg reclaim, and the node becomes unresponsive.
Expected results:
Memory pressure inside a single cgroup should not drive host-wide kernel CPU usage or make the node unresponsive.
Additional info: