Clair frequently gets OOM-killed by the kernel in the customer's setup. We see the following in the sosreport:
Nov 12 18:32:03 NOTE_NAME kernel: clair invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), order=0, oom_score_adj=993
Nov 12 18:32:03 NOTE_NAME kernel: CPU: 12 PID: 3026716 Comm: clair Not tainted 4.18.0-372.59.1.el8_6.x86_64 #1
Nov 12 18:32:03 NOTE_NAME kernel: Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008 12/07/2018
...
Nov 12 18:32:03 NOTE_NAME kernel: memory: usage 16777216kB, limit 16777216kB, failcnt 8422298
Nov 12 18:32:03 NOTE_NAME kernel: memory+swap: usage 16777216kB, limit 9007199254740988kB, failcnt 0
Nov 12 18:32:03 NOTE_NAME kernel: kmem: usage 35764kB, limit 9007199254740988kB, failcnt 0
...
Nov 12 18:32:03 NOTE_NAME kernel: Tasks state (memory values in pages):
Nov 12 18:32:03 NOTE_NAME kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
Nov 12 18:32:03 NOTE_NAME kernel: [3026621] 0 3026621 35965 599 167936 0 -1000 conmon
Nov 12 18:32:03 NOTE_NAME kernel: [3026652] 1000700000 3026652 4924955 4197268 34320384 0 993 clair
Nov 12 18:32:03 NOTE_NAME kernel: [ 276158] 1000700000 276158 3811 1019 81920 0 993 sh
Nov 12 18:32:03 NOTE_NAME kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-542d7180d31dbf2ab99ef65b1b787b6ae008bfc6f87f2a2255d0828c1d182852.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode882c07f_7167_4a07_8c3d_94d19d8c4042.slice,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode882c07f_7167_4a07_8c3d_94d19d8c4042.slice/crio-542d7180d31dbf2ab99ef65b1b787b6ae008bfc6f87f2a2255d0828c1d182852.scope,task=clair,pid=3026652,uid=1000700000
Nov 12 18:32:03 NOTE_NAME kernel: Memory cgroup out of memory: Killed process 3026652 (clair) total-vm:19699820kB, anon-rss:16738484kB, file-rss:50528kB, shmem-rss:60kB, UID:1000700000 pgtables:33516kB oom_score_adj:993
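The task dump reports memory in pages, so it is worth converting the clair process's rss column to confirm it matches the cgroup limit. A quick sanity check (assuming the standard 4 KiB page size on x86_64):

```python
PAGE_SIZE = 4096  # bytes; standard page size on x86_64

def pages_to_gib(pages: int) -> float:
    """Convert a page count from the kernel's OOM task dump to GiB."""
    return pages * PAGE_SIZE / 2**30

# rss column for the clair process (PID 3026652) in the dump above
print(round(pages_to_gib(4197268), 2))  # → 16.01
```

So the clair process alone had ~16 GiB resident, i.e. it ran straight into the 16 Gi memory limit rather than being killed with headroom to spare.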
The defined limits for a Clair deployment are:
resources:
  limits:
    cpu: "4"
    memory: 16Gi
  requests:
    cpu: "2"
    memory: 2Gi
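Clair is a Go service, and one mitigation worth checking regardless of whether there is a leak: since Go 1.19 the runtime honors the GOMEMLIMIT environment variable, which makes the garbage collector work harder as the heap approaches a soft limit instead of letting RSS climb all the way to the cgroup ceiling. A hedged sketch of the container spec (the 14GiB value, leaving ~2 Gi of headroom under the 16Gi hard limit, is an assumption, not an official recommendation):

```yaml
# Sketch only: assumes the Clair binary is built with Go >= 1.19
# and that no memory-limit awareness is already configured.
containers:
  - name: clair
    env:
      - name: GOMEMLIMIT
        value: "14GiB"   # soft limit below the 16Gi cgroup hard limit
    resources:
      requests:
        cpu: "2"
        memory: 2Gi
      limits:
        cpu: "4"
        memory: 16Gi
```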
Clair starts around its 2 Gi request and ends up consuming the full 16 Gi limit before being killed. The customer is afraid there may be some kind of memory leak causing Clair to restart frequently. They are currently running 10 Clair pods, and I don't see enough activity that would explain this level of memory consumption.
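To confirm or rule out a leak, a heap profile captured while memory is climbing would help. Clair exposes Go pprof endpoints on its introspection address; the port below (8089) is an assumption based on the common default, so it should be verified against `introspection_addr` in the customer's Clair config:

```shell
# Port and resource names are assumptions; adjust to the actual deployment.
kubectl port-forward deploy/clair 8089:8089 &

# Capture a heap profile while memory is climbing, then inspect top allocators:
curl -s -o heap.pprof http://localhost:8089/debug/pprof/heap
go tool pprof -top heap.pprof
```

Two profiles taken some minutes apart can be compared with `go tool pprof -base first.pprof second.pprof` to see which allocation sites are growing.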
Please check!