OpenShift Bugs / OCPBUGS-18295

whereabouts-reconciler pods restarting due to OOMKilled

    • 9/27: bumped up to p2; KNIECO-8167

      Description of problem:

      whereabouts-reconciler pods restart occasionally with Reason OOMKilled:
      Containers:
        whereabouts:
          Container ID: cri-o://2ac8ca768989a2e5c941debd20788222e4085be89099169d865672e84f80716c
          Image: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:78af3f50156b9d8086be236356c881f14d8736fe4efbb73d961919c049565225
          Image ID: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:78af3f50156b9d8086be236356c881f14d8736fe4efbb73d961919c049565225
          Port: <none>
          Host Port: <none>
          Command:
            /bin/sh
          Args:
            -c
            /usr/src/whereabouts/bin/ip-control-loop -log-level debug
      
          State: Running
            Started: Sun, 06 Aug 2023 06:30:53 +0200
          Last State: Terminated
            Reason: OOMKilled
            Exit Code: 137
            Started: Fri, 14 Jul 2023 06:31:49 +0200
            Finished: Sun, 06 Aug 2023 06:30:51 +0200
          Ready: True
          Restart Count: 2
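The `Exit Code: 137` and the `Started`/`Finished` timestamps above can be read with a few lines of arithmetic. The following is a minimal sketch, not part of the original report: the helper name `describe_exit_code` is hypothetical, and it assumes the common convention that an exit code above 128 means "128 + signal number".

```python
import signal
from datetime import datetime


def describe_exit_code(code: int) -> str:
    """Interpret a container exit code using the 128 + signal convention (assumed)."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not known on this platform.
            name = f"signal {signum}"
        return f"terminated by {name}"
    return f"exited with status {code}"


# Timestamps copied from the "Last State" block of the pod description.
FMT = "%a, %d %b %Y %H:%M:%S %z"
started = datetime.strptime("Fri, 14 Jul 2023 06:31:49 +0200", FMT)
finished = datetime.strptime("Sun, 06 Aug 2023 06:30:51 +0200", FMT)
runtime = finished - started

print(describe_exit_code(137))
print(f"container ran for {runtime} before being killed")
```

On Linux, signal 9 is SIGKILL, which is what the kernel OOM killer sends. The computed runtime shows the previous container instance ran for roughly 23 days before it was killed.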

      Version-Release number of selected component (if applicable):

       

      How reproducible:

      Not reproducible as of now.
      

      Actual results:

      The whereabouts-reconciler container is killed by the OOM killer (Reason OOMKilled, exit code 137) and restarts.

      Expected results:

      The reconciler should complete its task without restarting due to an OOM.

      Additional info:
      Logs seen before the OOM:

      2023-08-06T22:26:48.116507835Z 2023-08-06T22:26:48Z [verbose] the NAD's config: {{"cniVersion": "0.3.1", "name": "oam-macvlan", "type": "macvlan", "master": "vlan162", "mtu": 1500, "ipam": {"type": "whereabouts", "range": "2001:1b74:480:6109::10-2001:1b74:480:6109::ff/64", "gateway": "2001:1b74:480:6109::1"}}}
      2023-08-06T22:26:48.116566269Z 2023-08-06T22:26:48Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
      2023-08-06T22:26:48.116575416Z 2023-08-06T22:26:48Z [verbose] result of garbage collecting pods: failed to get the IPPool data: ippool.whereabouts.cni.cncf.io "2001-1b74-480-6109---64" not found
      2023-08-06T22:26:48.116581151Z 2023-08-06T22:26:48Z [verbose] re-queuing IP address reconciliation request for pod vran5g-ocptc-testpodsnetwork/sekiicvh00300-pod; retry #: 2
      2023-08-06T22:26:48.136767743Z 2023-08-06T22:26:48Z [verbose] skipped net-attach-def for default network
      2023-08-06T22:26:48.136767743Z 2023-08-06T22:26:48Z [debug] pod's network status: {Name:default/oam-macvlan Interface:port1 IPs:[2001:1b74:480:6109::18] Mac:22:6d:06:15:7d:76 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil>}
      2023-08-06T22:26:48.136774755Z 2023-08-06T22:26:48Z [verbose] the NAD's config: {{"cniVersion": "0.3.1", "name": "oam-macvlan", "type": "macvlan", "master": "vlan162", "mtu": 1500, "ipam": {"type": "whereabouts", "range": "2001:1b74:480:6109::10-2001:1b74:480:6109::ff/64", "gateway": "2001:1b74:480:6109::1"}}}
      2023-08-06T22:26:48.136822543Z 2023-08-06T22:26:48Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
      2023-08-06T22:26:48.136831782Z 2023-08-06T22:26:48Z [verbose] result of garbage collecting pods: failed to get the IPPool data: ippool.whereabouts.cni.cncf.io "2001-1b74-480-6109---64" not found
      2023-08-06T22:26:48.136836213Z 2023-08-06T22:26:48Z [error] dropping pod [vran5g-ocptc-testpodsnetwork/sekiicvh00300-pod] deletion out of the queue - could not reconcile IP: failed to get the IPPool data: ippool.whereabouts.cni.cncf.io "2001-1b74-480-6109---64" not found
      2023-08-06T22:26:48.137287757Z 2023-08-06T22:26:48Z [verbose] Event(v1.ObjectReference{Kind:"Pod", Namespace:"vran5g-ocptc-testpodsnetwork", Name:"sekiicvh00300-pod", UID:"08644f94-6537-41de-803a-07a10e6e2aaf", APIVersion:"v1", ResourceVersion:"32801811", FieldPath:""}): type: 'Warning' reason: 'IPAddressGarbageCollectionFailed' failed to garbage collect addresses for pod vran5g-ocptc-testpodsnetwork/sekiicvh00300-pod
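The IPPool name in the "not found" errors, `2001-1b74-480-6109---64`, looks like the NAD's network CIDR with `:` and `/` replaced by `-`. The sketch below reproduces that mapping from the `range` value in the logged NAD config. It is an inference from these log lines, not a copy of the whereabouts implementation, and the function name `ippool_name_for_range` is hypothetical.

```python
import ipaddress


def ippool_name_for_range(ip_range: str) -> str:
    """Derive an IPPool-style object name from a 'start-end/prefix' range.

    Assumption: the name is the network CIDR with ':' and '/' replaced by '-',
    which matches the name seen in the reconciler logs.
    """
    # The prefix length is attached to the last address of the range.
    end_with_prefix = ip_range.split("-")[-1]
    # strict=False lets us pass a host address and get its enclosing network.
    network = ipaddress.ip_network(end_with_prefix, strict=False)
    return str(network).replace(":", "-").replace("/", "-")


nad_range = "2001:1b74:480:6109::10-2001:1b74:480:6109::ff/64"
print(ippool_name_for_range(nad_range))
```

If this reading is right, the reconciler is looking up the pool for `2001:1b74:480:6109::/64`, and each pod deletion event for this network fails and is retried until it is dropped from the queue.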

      dmesg logs:

      [2098682.389649] ip-control-loop invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), order=0, oom_score_adj=999
      [2098682.389654] CPU: 2 PID: 2618506 Comm: ip-control-loop Kdump: loaded Not tainted 4.18.0-372.59.1.el8_6.x86_64 #1
      [2098682.389657] Hardware name: Dell Inc. PowerEdge R650/0PYXKY, BIOS 1.9.2 11/17/2022
      [2098682.389658] Call Trace:
      [2098682.389662] dump_stack+0x41/0x60
      [2098682.389667] dump_header+0x4a/0x1df
      [2098682.389673] oom_kill_process.cold.32+0xb/0x10
      [2098682.389676] out_of_memory+0x1bd/0x4e0
      [2098682.389679] mem_cgroup_out_of_memory+0xec/0x100
      [2098682.389683] try_charge+0x64f/0x690
      [2098682.389686] ? common_interrupt+0xa/0xf
      [2098682.389689] __mem_cgroup_charge+0x39/0xa0
      [2098682.389692] mem_cgroup_charge+0x2f/0x80
      [2098682.389694] __add_to_page_cache_locked+0x36c/0x3d0
      [2098682.389697] ? scan_shadow_nodes+0x30/0x30
      [2098682.389701] add_to_page_cache_lru+0x4a/0xc0
      [2098682.389703] iomap_readpages_actor+0x103/0x230
      [2098682.389710] iomap_apply+0xfb/0x330
      [2098682.389713] ? iomap_ioend_try_merge+0xf0/0xf0
      [2098682.389716] ? iomap_ioend_try_merge+0xf0/0xf0
      [2098682.389718] iomap_readpages+0xa8/0x1f0
      [2098682.389720] ? iomap_ioend_try_merge+0xf0/0xf0
      [2098682.389723] read_pages+0x6b/0x1a0
      [2098682.389725] __do_page_cache_readahead+0x16f/0x1e0
      [2098682.389728] filemap_fault+0x770/0xa10
      [2098682.389730] ? enqueue_entity+0xf1/0x6f0
      [2098682.389734] ? pmd_devmap_trans_unstable+0x2e/0x40
      [2098682.389736] ? alloc_set_pte+0x1f1/0x3f0
      [2098682.389738] ? _cond_resched+0x15/0x30
      [2098682.389742] __xfs_filemap_fault+0x6d/0x200 [xfs]
      [2098682.389812] __do_fault+0x38/0xc0
      [2098682.389814] handle_pte_fault+0x55d/0x880
      [2098682.389816] __handle_mm_fault+0x453/0x6c0
      [2098682.389819] handle_mm_fault+0xc1/0x1e0
      [2098682.389821] do_user_addr_fault+0x1b9/0x450
      [2098682.389824] do_page_fault+0x37/0x130
      [2098682.389826] ? page_fault+0x8/0x30
      [2098682.389829] page_fault+0x1e/0x30
      [2098682.389831] RIP: 0033:0x45a41c
      [2098682.389836] Code: Unable to access opcode bytes at RIP 0x45a3f2.
      [2098682.389837] RSP: 002b:00007f275cef8628 EFLAGS: 00010202
      [2098682.389839] RAX: 0000000001d89c65 RBX: 000000000029325b RCX: 000000000029325b
      [2098682.389840] RDX: 0000000000405560 RSI: 00007f275cef867c RDI: 00007f275cef8690
      [2098682.389841] RBP: 00007f275cef8638 R08: 0000000001d74101 R09: 0000000001d74160
      [2098682.389842] R10: 00000000002a8d60 R11: 0000000000015b05 R12: 0000000000000000
      [2098682.389843] R13: 0000000000000000 R14: 000000c00070e1a0 R15: 0000000000000000
      [2098682.389845] memory: usage 102400kB, limit 102400kB, failcnt 482443
      [2098682.389847] memory+swap: usage 102400kB, limit 9007199254740988kB, failcnt 0
      [2098682.389848] kmem: usage 4952kB, limit 9007199254740988kB, failcnt 0
      [2098682.389849] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6138e8f2_ca7d_43ef_8d1f_52c74df7fef3.slice:
      [2098682.389858] anon 90083328
                       file 9703424
                       kernel_stack 1212416
                       pagetables 1150976
                       percpu 1333248
                       sock 0
                       shmem 16384
                       file_mapped 0
                       file_dirty 0
                       file_writeback 0
                       swapcached 0
                       anon_thp 0
                       file_thp 0
                       shmem_thp 0
                       inactive_anon 90095616
                       active_anon 4096
                       inactive_file 3678208
                       active_file 0
                       unevictable 0
                       slab_reclaimable 182096
                       slab_unreclaimable 1124112
                       slab 1306208
                       workingset_refault_anon 0
                       workingset_refault_file 1370907
                       workingset_activate_anon 0
                       workingset_activate_file 262909
                       workingset_restore_anon 0
                       workingset_restore_file 88646
                       workingset_nodereclaim 137
                       pgfault 1364565
                       pgmajfault 2986
                       pgrefill 353925
                       pgscan 6063332
                       pgsteal 1385875
                       pgactivate 49504
                       pgdeactivate 312406
                       pglazyfree 0
                       pglazyfreed 0
                       thp_fault_alloc 0
                       thp_collapse_alloc 0
      [2098682.389861] Tasks state (memory values in pages):
      [2098682.389862] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
      [2098682.389864] [2618203] 0 2618203 35965 586 167936 0 -1000 conmon
      [2098682.389869] [2618252] 0 2618252 1517775 20150 1007616 0 999 ip-control-loop
      [2098682.389871] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-ec5d7898e2ef2bb72f63546ef9de8440fc549f9cd788d894fdde3e95ebb7a39c.scope,mems_allowed=0-1,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6138e8f2_ca7d_43ef_8d1f_52c74df7fef3.slice,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6138e8f2_ca7d_43ef_8d1f_52c74df7fef3.slice/crio-ec5d7898e2ef2bb72f63546ef9de8440fc549f9cd788d894fdde3e95ebb7a39c.scope,task=ip-control-loop,pid=2618252,uid=0
      [2098682.389941] Memory cgroup out of memory: Killed process 2618252 (ip-control-loop) total-vm:6071100kB, anon-rss:80048kB, file-rss:556kB, shmem-rss:0kB, UID:0 pgtables:984kB oom_score_adj:999
      [2098682.395409] oom_reaper: reaped process 2618252 (ip-control-loop), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB 
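The numbers in the dmesg dump can be cross-checked with simple unit conversions. This is a rough sketch using values copied from the output above. The 4 KiB page size is an assumption (the usual value on x86_64), and the variable names are illustrative only.

```python
PAGE_SIZE = 4096  # bytes; assumed 4 KiB pages on x86_64

# Values copied from the dmesg output.
limit_kb = 102400         # "memory: usage 102400kB, limit 102400kB"
anon_bytes = 90083328     # memcg stat "anon"
file_bytes = 9703424      # memcg stat "file"
total_vm_pages = 1517775  # ip-control-loop total_vm (pages)
rss_pages = 20150         # ip-control-loop rss (pages)

limit_mib = limit_kb / 1024
anon_share = anon_bytes / (limit_kb * 1024)
total_vm_kb = total_vm_pages * PAGE_SIZE // 1024
rss_kb = rss_pages * PAGE_SIZE // 1024

print(f"cgroup limit: {limit_mib:.0f} MiB")
print(f"anonymous memory: {anon_bytes / 2**20:.1f} MiB ({anon_share:.0%} of the limit)")
print(f"file-backed memory: {file_bytes / 2**20:.1f} MiB")
print(f"ip-control-loop total_vm: {total_vm_kb} kB, rss: {rss_kb} kB")
```

The converted `total_vm` matches the `total-vm:6071100kB` reported in the kill message. Most of the 100 MiB cgroup limit is taken up by anonymous memory, which cannot be reclaimed without swap.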

            nsimha@redhat.com Nikhil Simha (Inactive)
            rhn-support-ugiordan Ugo Giordano
            Weibin Liang Weibin Liang