  1. Project Quay
  2. PROJQUAY-6414

Clair frequently getting OOMed

    • Type: Bug
    • Resolution: Can't Do
    • Priority: Normal
    • None
    • clair-4.7.2
    • clair
    • False
    • None
    • False
    • Quay Enterprise

      Clair is frequently OOM-killed by the kernel in the customer's setup. The sosreport shows the following:

      Nov 12 18:32:03 NOTE_NAME kernel: clair invoked oom-killer: gfp_mask=0x6000c0(GFP_KERNEL), order=0, oom_score_adj=993
      Nov 12 18:32:03 NOTE_NAME kernel: CPU: 12 PID: 3026716 Comm: clair Not tainted 4.18.0-372.59.1.el8_6.x86_64 #1
      Nov 12 18:32:03 NOTE_NAME kernel: Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
      ...
      Nov 12 18:32:03 NOTE_NAME kernel: memory: usage 16777216kB, limit 16777216kB, failcnt 8422298
      Nov 12 18:32:03 NOTE_NAME kernel: memory+swap: usage 16777216kB, limit 9007199254740988kB, failcnt 0
      Nov 12 18:32:03 NOTE_NAME kernel: kmem: usage 35764kB, limit 9007199254740988kB, failcnt 0
      ...
      Nov 12 18:32:03 NOTE_NAME kernel: Tasks state (memory values in pages):
      Nov 12 18:32:03 NOTE_NAME kernel: [  pid  ]   uid  tgid total_vm      rss pgtables_bytes swapents oom_score_adj name
      Nov 12 18:32:03 NOTE_NAME kernel: [3026621]     0 3026621    35965      599   167936        0         -1000 conmon
      Nov 12 18:32:03 NOTE_NAME kernel: [3026652] 1000700000 3026652  4924955  4197268 34320384        0           993 clair
      Nov 12 18:32:03 NOTE_NAME kernel: [ 276158] 1000700000 276158     3811     1019    81920        0           993 sh
      Nov 12 18:32:03 NOTE_NAME kernel: oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-542d7180d31dbf2ab99ef65b1b787b6ae008bfc6f87f2a2255d0828c1d182852.scope,mems_allowed=0,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode882c07f_7167_4a07_8c3d_94d19d8c4042.slice,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pode882c07f_7167_4a07_8c3d_94d19d8c4042.slice/crio-542d7180d31dbf2ab99ef65b1b787b6ae008bfc6f87f2a2255d0828c1d182852.scope,task=clair,pid=3026652,uid=1000700000
      Nov 12 18:32:03 NOTE_NAME kernel: Memory cgroup out of memory: Killed process 3026652 (clair) total-vm:19699820kB, anon-rss:16738484kB, file-rss:50528kB, shmem-rss:60kB, UID:1000700000 pgtables:33516kB oom_score_adj:993
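
      The task table above reports memory in pages, while the kill line reports it in kB. As a quick cross-check, the sketch below converts the clair process's rss page count to kB and compares it with the cgroup limit. It assumes a 4 KiB page size (the usual default on x86_64; the log does not state it), and the variable names are illustrative only.

```python
# Cross-check the OOM report: pages in the task table vs. kB in the kill line.
# Assumption: 4 KiB pages (typical for x86_64; not stated in the log itself).
PAGE_SIZE_KB = 4


def pages_to_kb(pages: int, page_size_kb: int = PAGE_SIZE_KB) -> int:
    """Convert a page count from the kernel task table to kB."""
    return pages * page_size_kb


# Values copied from the log excerpt above.
clair_rss_pages = 4197268   # rss column for the clair task
limit_kb = 16777216         # "memory: usage ..., limit 16777216kB"
anon_rss_kb = 16738484      # anon-rss from the "Killed process" line
file_rss_kb = 50528         # file-rss
shmem_rss_kb = 60           # shmem-rss

rss_kb = pages_to_kb(clair_rss_pages)
kill_line_total_kb = anon_rss_kb + file_rss_kb + shmem_rss_kb

print(f"rss from task table: {rss_kb} kB")
print(f"rss from kill line:  {kill_line_total_kb} kB")
print(f"cgroup limit:        {limit_kb} kB")
print(f"anonymous share:     {anon_rss_kb / rss_kb:.1%}")
```

      Under that assumption the two figures agree, and nearly all of the resident memory is anonymous (heap and stack) rather than file-backed, so the clair process itself accounts for essentially the whole 16 GiB limit.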
      

      The defined limits for a Clair deployment are:

              resources: 
                limits: 
                  cpu: "4"
                  memory: 16Gi
                requests: 
                  cpu: "2"
                  memory: 2Gi
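
      To tie these limits back to the kernel log, the following sketch converts the Kubernetes memory quantities to KiB and compares the limit with the cgroup limit the kernel reported. The helper `quantity_to_bytes` is hypothetical and handles only plain integers and binary suffixes, not the full Kubernetes quantity syntax.

```python
# Minimal sketch: convert Kubernetes binary memory quantities to KiB.
# quantity_to_bytes is a hypothetical helper, not part of any Kubernetes client.
BINARY_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}


def quantity_to_bytes(quantity: str) -> int:
    """Parse quantities such as '16Gi' or '2Gi' (binary suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # plain byte count with no suffix


limit_kib = quantity_to_bytes("16Gi") // 1024
request_kib = quantity_to_bytes("2Gi") // 1024

# Limit reported by the kernel in the OOM message (its "kB" unit is 1024 bytes).
kernel_limit_kb = 16777216

print(f"limit:   {limit_kib} KiB")
print(f"request: {request_kib} KiB")
print(f"limit matches kernel cgroup limit: {limit_kib == kernel_limit_kb}")
print(f"limit / request ratio: {limit_kib // request_kib}x")
```

      The 16Gi limit matches the cgroup limit in the OOM message exactly. Because the request is lower than the limit, the pod lands in the burstable QoS class, consistent with the `kubepods-burstable` cgroup path in the log.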
      

      Clair starts at about 2 GB and ends up consuming 16 GB of RAM. The client is concerned that a memory leak is causing Clair to restart frequently. They are currently running 10 Clair pods, and I don't see enough activity to explain this level of memory consumption.

      Please check!

              Unassigned Unassigned
              rhn-support-ibazulic Ivan Bazulic