Migration Toolkit for Virtualization / MTV-3080

forklift-controller inventory container OOM with many providers


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Normal
    • Affects Version/s: 2.8.6
    • Component/s: Inventory
    • Incidents & Support
    • Severity: Important

      Description of problem:

      The forklift-controller inventory container is being killed by the kernel OOM killer, and the inventory pod ends up being restarted multiple times per hour. The environment has 51 providers set up across various namespaces.
      
      We see the following on the node where the controller is running: 
      Jul 30 21:54:24 example.com kernel: forklift-contro invoked oom-killer: gfp_mask=0xc40(GFP_NOFS), order=0, oom_score_adj=999
      Jul 30 21:54:24 example.com kernel: CPU: 13 PID: 3298413 Comm: forklift-contro Tainted: G               X  -------  ---  5.14.0-427.65.1.el9_4.x86_64 #1
      Jul 30 21:54:24 example.com kernel: Hardware name: Lenovo ThinkSystem SR630 V3/SB27A85783, BIOS ESE124B-3.11 01/25/2024
      Jul 30 21:54:24 example.com kernel: Call Trace:
      Jul 30 21:54:24 example.com kernel:  <TASK>
      Jul 30 21:54:24 example.com kernel:  dump_stack_lvl+0x34/0x48
      Jul 30 21:54:24 example.com kernel:  dump_header+0x4a/0x201
      Jul 30 21:54:24 example.com kernel:  oom_kill_process.cold+0xb/0x10
      Jul 30 21:54:24 example.com kernel:  out_of_memory+0xed/0x2e0
      Jul 30 21:54:24 example.com kernel:  mem_cgroup_out_of_memory+0x131/0x150
      Jul 30 21:54:24 example.com kernel: R13: 0000000000000000 R14: 000000c027db8380 R15: 3fffffffffffffff
      Jul 30 21:54:24 example.com kernel:  </TASK>
      Jul 30 21:54:24 example.com kernel: memory: usage 1048576kB, limit 1048576kB, failcnt 2681083
      Jul 30 21:54:24 example.com kernel: swap: usage 0kB, limit 0kB, failcnt 0
      Jul 30 21:54:24 example.com kernel: Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod926e0e00_2006_4192_b325_3dffe93ef3d9.slice/crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope:
      Jul 30 21:54:24 example.com kernel: anon 1061068800
                                          file 1011712
                                          kernel 11653120
                                          kernel_stack 2785280
      Jul 30 21:54:24 example.com kernel: Tasks state (memory values in pages):
      Jul 30 21:54:24 example.com kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
      Jul 30 21:54:24 example.com kernel: [3297888]     0 3297888  3633729   256356  4194304        0           999 forklift-contro
      Jul 30 21:54:24 example.com kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope,mems_allowed=0-1,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod926e0e00_2006_4192_b325_3dffe93ef3d9.slice/crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod926e0e00_2006_4192_b325_3dffe93ef3d9.slice/crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope,task=forklift-contro,pid=3297888,uid=0
      Jul 30 21:54:24 example.com kernel: Memory cgroup out of memory: Killed process 3297888 (forklift-contro) total-vm:14534916kB, anon-rss:982416kB, file-rss:43008kB, shmem-rss:0kB, UID:0 pgtables:4096kB oom_score_adj:999
      Jul 30 21:54:24 example.com kernel: Tasks in /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod926e0e00_2006_4192_b325_3dffe93ef3d9.slice/crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope are going to be killed due to memory.oom.group set
      Jul 30 21:54:24 example.com kernel: Memory cgroup out of memory: Killed process 3297916 (forklift-contro) total-vm:14534916kB, anon-rss:982416kB, file-rss:43008kB, shmem-rss:0kB, UID:0 pgtables:4096kB oom_score_adj:999
      Jul 30 21:54:24 example.com systemd[1]: crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope: A process of this unit has been killed by the OOM killer.
      Jul 30 21:54:24 example.com systemd[1]: crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope: Deactivated successfully.
      Jul 30 21:54:24 example.com systemd[1]: crio-ad12776f20b87f312c4e9e183a0ca91607769d67b7e84be04f6381714c55c830.scope: Consumed 46min 32.590s CPU time.
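
      A quick way to confirm the kill from the Kubernetes side is to check the restart count, last container state, and configured memory limit on the forklift-controller pod. This is a minimal sketch assuming the default MTV install namespace, openshift-mtv (adjust if MTV is installed elsewhere); <forklift-controller-pod> is a placeholder for the actual pod name:

      # Find the forklift-controller pod and its restart count
      oc -n openshift-mtv get pods | grep forklift-controller

      # lastState should show reason OOMKilled for the inventory container
      oc -n openshift-mtv get pod <forklift-controller-pod> \
        -o jsonpath='{range .status.containerStatuses[*]}{.name}{": restarts="}{.restartCount}{" lastState="}{.lastState}{"\n"}{end}'

      # Current memory limit per container (1048576kB in the log above is 1Gi)
      oc -n openshift-mtv get pod <forklift-controller-pod> \
        -o jsonpath='{range .spec.containers[*]}{.name}{": limits.memory="}{.resources.limits.memory}{"\n"}{end}'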
      
      
      Are there known limits on the number of providers and/or total VMs that the inventory controller can track without increasing the memory limit for the container?
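
      If the inventory simply needs more headroom at this scale, the place to raise the limit would be the ForkliftController CR rather than the deployment itself, which the operator would reconcile back. A minimal sketch; the spec field name below comes from the forklift-operator defaults and is an assumption here, so verify it against the ForkliftController CRD on the cluster before applying:

      # Count Provider CRs across all namespaces (the report mentions 51)
      oc get providers.forklift.konveyor.io -A --no-headers | wc -l

      # Raise the inventory container's memory limit (field name assumed from
      # the forklift-operator defaults; verify against the installed CRD)
      oc -n openshift-mtv patch forkliftcontroller forklift-controller \
        --type merge \
        -p '{"spec":{"inventory_container_limits_memory":"2Gi"}}'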

      Version-Release number of selected component (if applicable):

      MTV 2.8.6

              Assignee: Genadi Chereshnya (gcheresh@redhat.com)
              Reporter: Sean Haselden (shaselde@redhat.com)
              Votes: 0
              Watchers: 3

                Created:
                Updated: