Bug
Resolution: Duplicate
Critical
None
4.12
No
Rejected
False
9/27: bumped up to p2; KNIECO-8167
Description of problem:
whereabouts-reconciler pods restart occasionally with Reason OOMKilled:

Containers:
  whereabouts:
    Container ID:   cri-o://2ac8ca768989a2e5c941debd20788222e4085be89099169d865672e84f80716c
    Image:          quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:78af3f50156b9d8086be236356c881f14d8736fe4efbb73d961919c049565225
    Image ID:       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:78af3f50156b9d8086be236356c881f14d8736fe4efbb73d961919c049565225
    Port:           <none>
    Host Port:      <none>
    Command:
      /bin/sh
    Args:
      -c
      /usr/src/whereabouts/bin/ip-control-loop -log-level debug
    State:          Running
      Started:      Sun, 06 Aug 2023 06:30:53 +0200
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Fri, 14 Jul 2023 06:31:49 +0200
      Finished:     Sun, 06 Aug 2023 06:30:51 +0200
    Ready:          True
    Restart Count:  2
Version-Release number of selected component (if applicable):
How reproducible:
Not reproducible as of now.
Actual results:
The reconciler is restarting due to an OOM.
Expected results:
The reconciler should complete its task without restarting due to an OOM.
Additional info:
Logs seen before the OOM:
2023-08-06T22:26:48.116507835Z 2023-08-06T22:26:48Z [verbose] the NAD's config: {{"cniVersion": "0.3.1", "name": "oam-macvlan", "type": "macvlan", "master": "vlan162", "mtu": 1500, "ipam": {"type": "whereabouts", "range": "IPV6ADDRESS::10-IPV6ADDRESS::ff/64", "gateway": "IPV6ADDRESS::1"}}}
2023-08-06T22:26:48.116566269Z 2023-08-06T22:26:48Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2023-08-06T22:26:48.116575416Z 2023-08-06T22:26:48Z [verbose] result of garbage collecting pods: failed to get the IPPool data: ippool.whereabouts.cni.cncf.io "ADDRESS" not found
2023-08-06T22:26:48.116581151Z 2023-08-06T22:26:48Z [verbose] re-queuing IP address reconciliation request for pod NAMESPACE/PODNAME; retry #: 2
2023-08-06T22:26:48.136767743Z 2023-08-06T22:26:48Z [verbose] skipped net-attach-def for default network
2023-08-06T22:26:48.136767743Z 2023-08-06T22:26:48Z [debug] pod's network status: {Name:default/oam-macvlan Interface:port1 IPs:[IPV6ADDRESS::18] Mac:22:6d:06:15:7d:76 Default:false DNS:{Nameservers:[] Domain: Search:[] Options:[]} DeviceInfo:<nil>}
2023-08-06T22:26:48.136774755Z 2023-08-06T22:26:48Z [verbose] the NAD's config: {{"cniVersion": "0.3.1", "name": "oam-macvlan", "type": "macvlan", "master": "vlan162", "mtu": 1500, "ipam": {"type": "whereabouts", "range": "IPV6ADDRESS::10-IPV6ADDRESS::ff/64", "gateway": "IPV6ADDRESS::1"}}}
2023-08-06T22:26:48.136822543Z 2023-08-06T22:26:48Z [debug] Used defaults from parsed flat file config @ /host/etc/cni/net.d/whereabouts.d/whereabouts.conf
2023-08-06T22:26:48.136831782Z 2023-08-06T22:26:48Z [verbose] result of garbage collecting pods: failed to get the IPPool data: ippool.whereabouts.cni.cncf.io "ADDRESS" not found
2023-08-06T22:26:48.136836213Z 2023-08-06T22:26:48Z [error] dropping pod [NAMESPACE/PODNAME] deletion out of the queue - could not reconcile IP: failed to get the IPPool data: ippool.whereabouts.cni.cncf.io "ADDRESS" not found
2023-08-06T22:26:48.137287757Z 2023-08-06T22:26:48Z [verbose] Event(v1.ObjectReference{Kind:"Pod", Namespace:"NAMESPACE", Name:"PODNAME", UID:"08644f94-6537-41de-803a-07a10e6e2aaf", APIVersion:"v1", ResourceVersion:"32801811", FieldPath:""}): type: 'Warning' reason: 'IPAddressGarbageCollectionFailed' failed to garbage collect addresses for pod NAMESPACE/PODNAME
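The "re-queuing ... retry #: 2" and "dropping pod ... deletion out of the queue" lines above suggest a bounded-retry reconciliation queue: a failed garbage-collection attempt is re-queued a few times, then dropped. A minimal Python sketch of that pattern (illustrative only; the actual controller is written in Go, and the `MAX_RETRIES` value and helper names here are assumptions, not the real implementation):

```python
from collections import deque

# Hypothetical retry cap; the real controller's limit may differ.
MAX_RETRIES = 3

def reconcile(pod, pools):
    """Simulate garbage-collecting a pod's IP; fails if its IPPool is missing."""
    if pod["pool"] not in pools:
        raise LookupError(f'ippool.whereabouts.cni.cncf.io "{pod["pool"]}" not found')

def run_queue(pods, pools):
    """Process reconciliation requests, re-queuing failures up to MAX_RETRIES."""
    queue = deque((pod, 0) for pod in pods)
    dropped = []
    while queue:
        pod, retries = queue.popleft()
        try:
            reconcile(pod, pools)
        except LookupError:
            if retries + 1 >= MAX_RETRIES:
                # Mirrors the "dropping pod ... out of the queue" log line.
                dropped.append(pod["name"])
            else:
                # Mirrors the "re-queuing ... retry #" log line.
                queue.append((pod, retries + 1))
    return dropped
```

With a pod whose IPPool is missing (as in the logs, where the pool "ADDRESS" is not found), the request is retried and eventually dropped; a pod whose pool exists reconciles on the first pass.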
dmesg logs:
[2098682.389649] ip-control-loop invoked oom-killer: gfp_mask=0x600040(GFP_NOFS), order=0, oom_score_adj=999
[2098682.389654] CPU: 2 PID: 2618506 Comm: ip-control-loop Kdump: loaded Not tainted 4.18.0-372.59.1.el8_6.x86_64 #1
[2098682.389657] Hardware name: Dell Inc. PowerEdge R650/0PYXKY, BIOS 1.9.2 11/17/2022
[2098682.389658] Call Trace:
[2098682.389662]  dump_stack+0x41/0x60
[2098682.389667]  dump_header+0x4a/0x1df
[2098682.389673]  oom_kill_process.cold.32+0xb/0x10
[2098682.389676]  out_of_memory+0x1bd/0x4e0
[2098682.389679]  mem_cgroup_out_of_memory+0xec/0x100
[2098682.389683]  try_charge+0x64f/0x690
[2098682.389686]  ? common_interrupt+0xa/0xf
[2098682.389689]  __mem_cgroup_charge+0x39/0xa0
[2098682.389692]  mem_cgroup_charge+0x2f/0x80
[2098682.389694]  __add_to_page_cache_locked+0x36c/0x3d0
[2098682.389697]  ? scan_shadow_nodes+0x30/0x30
[2098682.389701]  add_to_page_cache_lru+0x4a/0xc0
[2098682.389703]  iomap_readpages_actor+0x103/0x230
[2098682.389710]  iomap_apply+0xfb/0x330
[2098682.389713]  ? iomap_ioend_try_merge+0xf0/0xf0
[2098682.389716]  ? iomap_ioend_try_merge+0xf0/0xf0
[2098682.389718]  iomap_readpages+0xa8/0x1f0
[2098682.389720]  ? iomap_ioend_try_merge+0xf0/0xf0
[2098682.389723]  read_pages+0x6b/0x1a0
[2098682.389725]  __do_page_cache_readahead+0x16f/0x1e0
[2098682.389728]  filemap_fault+0x770/0xa10
[2098682.389730]  ? enqueue_entity+0xf1/0x6f0
[2098682.389734]  ? pmd_devmap_trans_unstable+0x2e/0x40
[2098682.389736]  ? alloc_set_pte+0x1f1/0x3f0
[2098682.389738]  ? _cond_resched+0x15/0x30
[2098682.389742]  __xfs_filemap_fault+0x6d/0x200 [xfs]
[2098682.389812]  __do_fault+0x38/0xc0
[2098682.389814]  handle_pte_fault+0x55d/0x880
[2098682.389816]  __handle_mm_fault+0x453/0x6c0
[2098682.389819]  handle_mm_fault+0xc1/0x1e0
[2098682.389821]  do_user_addr_fault+0x1b9/0x450
[2098682.389824]  do_page_fault+0x37/0x130
[2098682.389826]  ? page_fault+0x8/0x30
[2098682.389829]  page_fault+0x1e/0x30
[2098682.389831] RIP: 0033:0x45a41c
[2098682.389836] Code: Unable to access opcode bytes at RIP 0x45a3f2.
[2098682.389837] RSP: 002b:00007f275cef8628 EFLAGS: 00010202
[2098682.389839] RAX: 0000000001d89c65 RBX: 000000000029325b RCX: 000000000029325b
[2098682.389840] RDX: 0000000000405560 RSI: 00007f275cef867c RDI: 00007f275cef8690
[2098682.389841] RBP: 00007f275cef8638 R08: 0000000001d74101 R09: 0000000001d74160
[2098682.389842] R10: 00000000002a8d60 R11: 0000000000015b05 R12: 0000000000000000
[2098682.389843] R13: 0000000000000000 R14: 000000c00070e1a0 R15: 0000000000000000
[2098682.389845] memory: usage 102400kB, limit 102400kB, failcnt 482443
[2098682.389847] memory+swap: usage 102400kB, limit 9007199254740988kB, failcnt 0
[2098682.389848] kmem: usage 4952kB, limit 9007199254740988kB, failcnt 0
[2098682.389849] Memory cgroup stats for /kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6138e8f2_ca7d_43ef_8d1f_52c74df7fef3.slice:
[2098682.389858] anon 90083328 file 9703424 kernel_stack 1212416 pagetables 1150976 percpu 1333248 sock 0 shmem 16384 file_mapped 0 file_dirty 0 file_writeback 0 swapcached 0 anon_thp 0 file_thp 0 shmem_thp 0 inactive_anon 90095616 active_anon 4096 inactive_file 3678208 active_file 0 unevictable 0 slab_reclaimable 182096 slab_unreclaimable 1124112 slab 1306208 workingset_refault_anon 0 workingset_refault_file 1370907 workingset_activate_anon 0 workingset_activate_file 262909 workingset_restore_anon 0 workingset_restore_file 88646 workingset_nodereclaim 137 pgfault 1364565 pgmajfault 2986 pgrefill 353925 pgscan 6063332 pgsteal 1385875 pgactivate 49504 pgdeactivate 312406 pglazyfree 0 pglazyfreed 0 thp_fault_alloc 0 thp_collapse_alloc 0
[2098682.389861] Tasks state (memory values in pages):
[2098682.389862] [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
[2098682.389864] [2618203] 0 2618203 35965 586 167936 0 -1000 conmon
[2098682.389869] [2618252] 0 2618252 1517775 20150 1007616 0 999 ip-control-loop
[2098682.389871] oom-kill:constraint=CONSTRAINT_MEMCG,nodemask=(null),cpuset=crio-ec5d7898e2ef2bb72f63546ef9de8440fc549f9cd788d894fdde3e95ebb7a39c.scope,mems_allowed=0-1,oom_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6138e8f2_ca7d_43ef_8d1f_52c74df7fef3.slice,task_memcg=/kubepods.slice/kubepods-burstable.slice/kubepods-burstable-pod6138e8f2_ca7d_43ef_8d1f_52c74df7fef3.slice/crio-ec5d7898e2ef2bb72f63546ef9de8440fc549f9cd788d894fdde3e95ebb7a39c.scope,task=ip-control-loop,pid=2618252,uid=0
[2098682.389941] Memory cgroup out of memory: Killed process 2618252 (ip-control-loop) total-vm:6071100kB, anon-rss:80048kB, file-rss:556kB, shmem-rss:0kB, UID:0 pgtables:984kB oom_score_adj:999
[2098682.395409] oom_reaper: reaped process 2618252 (ip-control-loop), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
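The dmesg numbers are consistent with the container running into its cgroup memory limit: the limit is 102400 kB (100 MiB), and anonymous memory alone accounts for 90083328 bytes (roughly 85.9 MiB), with file-backed pages filling most of the remainder. A quick check of that arithmetic:

```python
# Values copied from the dmesg output above.
limit_kb = 102400        # "memory: usage 102400kB, limit 102400kB"
anon_bytes = 90083328    # "anon 90083328" from the cgroup stats
file_bytes = 9703424     # "file 9703424" from the cgroup stats

limit_mib = limit_kb * 1024 / 2**20   # 100.0 MiB
anon_mib = anon_bytes / 2**20         # ~85.9 MiB
usage_ratio = (anon_bytes + file_bytes) / (limit_kb * 1024)

print(f"limit: {limit_mib:.1f} MiB, anon: {anon_mib:.1f} MiB, "
      f"anon+file fills {usage_ratio:.0%} of the limit")
```

With anon plus file pages at about 95% of the 100 MiB limit (and kernel allocations on top), any further page-cache charge triggers the memcg OOM path seen in the call trace.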
is duplicated by:
OCPBUGS-19830 conformance tests failing due to openshift-multus config (Closed)