Bug
Resolution: Unresolved
Normal
None
2.8.6
Incidents & Support
False
True
Important
Description of problem:
The forklift-controller inventory container is being killed by the OOM killer, and the inventory pod ends up being restarted multiple times per hour. The environment has 51 providers set up across various namespaces. We see the following on the node where the controller is running:

Jul 30 21:54:24 example.com kernel: forklift-contro invoked oom-killer: gfp_mask=0xc40(GFP_NOFS), order=0, oom_score_adj=999
Jul 30 21:54:24 example.com kernel: CPU: 13 PID: 3298413 Comm: forklift-contro Tainted: G X ------- --- 5.14.0-427.65.1.el9_4.x86_64 #1
Jul 30 21:54:24 example.com kernel: Hardware name: Lenovo ThinkSystem SR630 V3/SB27A85783, BIOS ESE124B-3.11 01/25/2024
Jul 30 21:54:24 example.com kernel: Call Trace:
Jul 30 21:54:24 example.com kernel: <TASK>
Jul 30 21:54:24 example.com kernel: dump_stack_lvl+0x34/0x48
Jul 30 21:54:24 example.com kernel: dump_header+0x4a/0x201
Jul 30 21:54:24 example.com kernel: oom_kill_process.cold+0xb/0x10
Jul 30 21:54:24 example.com kernel: out_of_memory+0xed/0x2e0
Jul 30 21:54:24 example.com kernel: mem_cgroup_out_of_memory+0x131/0x150
Jul 30 21:54:24 example.com kernel: R13: 0000000000000000 R14: 000000c027db8380 R15: 3fffffffffffffff
Jul 30 21:54:24 example.com kernel: </TASK>
Jul 30 21:54:24 example.com kernel: memory: usage 1048576kB, limit 1048576kB, failcnt 2681083
Jul 30 21:54:24 example.com kernel: swap: usage 0kB, limit 0kB, failcnt 0
Jul 30 21:54:24 example.com kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod926e0e00_2006_4192_b325_3dffe93ef3d9.slice/crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope:
Jul 30 21:54:24 example.com kernel: anon 1061068800 file 1011712 kernel 11653120 kernel_stack 2785280
Jul 30 21:54:24 example.com kernel: Tasks state (memory values in pages):
Jul 30 21:54:24 example.com kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Jul 30 21:54:24 example.com kernel: [3297888] 0 3297888 3633729 256356 4194304 0 999 forklift-contro
Jul 30 21:54:24 example.com kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope,mems_allowed=0-1,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod926e0e00_2006_4192_b325_3dffe93ef3d9.slice/crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod926e0e00_2006_4192_b325_3dffe93ef3d9.slice/crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope,task=forklift-contro,pid=3297888,uid=0
Jul 30 21:54:24 example.com kernel: Memory cgroup out of memory: Killed process 3297888 (forklift-contro) total-vm:14534916kB, anon-rss:982416kB, file-rss:43008kB, shmem-rss:0kB, UID:0 pgtables:4096kB oom_score_adj:999
Jul 30 21:54:24 example.com kernel: Tasks in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod926e0e00_2006_4192_b325_3dffe93ef3d9.slice/crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope are going to be killed due to memory.oom.group set
Jul 30 21:54:24 example.com kernel: Memory cgroup out of memory: Killed process 3297916 (forklift-contro) total-vm:14534916kB, anon-rss:982416kB, file-rss:43008kB, shmem-rss:0kB, UID:0 pgtables:4096kB oom_score_adj:999
Jul 30 21:54:24 example.com systemd[1]: crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope: A process of this unit has been killed by the OOM killer.
Jul 30 21:54:24 example.com systemd[1]: crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope: Deactivated successfully.
Jul 30 21:54:24 example.com systemd[1]: crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope: Consumed 46min 32.590s CPU time.

Are there known limits on the number of providers and/or total VMs that the inventory controller can keep track of without increasing the memory limit for the container?
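For anyone triaging this, the numbers in the kernel log can be cross-checked with a short script. The following is a minimal sketch in Python, not part of MTV or forklift: the `parse_oom` helper and the `PAGE_KIB` constant are hypothetical names, and the 4 KiB page size is an assumption (the x86_64 default). It pulls the cgroup usage and limit and the killed task's RSS out of two log lines copied from the report, then compares the RSS against the page count from the "Tasks state" table.

```python
import re

PAGE_KIB = 4  # assumption: 4 KiB pages, the x86_64 default

# Two lines copied from the kernel log in this report (timestamp and host trimmed).
LOG = """\
memory: usage 1048576kB, limit 1048576kB, failcnt 2681083
Memory cgroup out of memory: Killed process 3297888 (forklift-contro) total-vm:14534916kB, anon-rss:982416kB, file-rss:43008kB, shmem-rss:0kB, UID:0 pgtables:4096kB oom_score_adj:999
"""


def parse_oom(log: str) -> dict:
    """Extract cgroup usage/limit and the killed task's RSS, all in kB."""
    usage = re.search(r"memory: usage (\d+)kB, limit (\d+)kB", log)
    killed = re.search(
        r"Killed process (\d+) \((\S+)\).*?"
        r"anon-rss:(\d+)kB, file-rss:(\d+)kB, shmem-rss:(\d+)kB",
        log,
    )
    if usage is None or killed is None:
        raise ValueError("log does not contain the expected OOM lines")
    anon, file_, shmem = (int(killed.group(i)) for i in (3, 4, 5))
    return {
        "pid": int(killed.group(1)),
        "comm": killed.group(2),
        "usage_kb": int(usage.group(1)),
        "limit_kb": int(usage.group(2)),
        "rss_kb": anon + file_ + shmem,
    }


info = parse_oom(LOG)
rss_from_pages_kb = 256356 * PAGE_KIB  # rss column of the "Tasks state" table

print(f"cgroup limit: {info['limit_kb'] / 1024 / 1024:.2f} GiB")
print(f"killed task RSS: {info['rss_kb']} kB")
print(f"RSS from task table: {rss_from_pages_kb} kB")
```

Under these assumptions the two RSS figures agree (anon-rss plus file-rss equals the page count times 4 KiB), and the cgroup usage equals its 1 GiB limit, which is consistent with the container hitting its configured memory limit rather than the node running out of memory.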
Version-Release number of selected component (if applicable):
MTV 2.8.6