Bug
Resolution: Unresolved
Priority: Major
Affects Version/s: 4.20.0, 4.20
Impact: Quality / Stability / Reliability
Severity: Important
Release Blocker: Proposed
Description of problem:
The nmstate-handler pods (NMState Operator v4.19.0) on worker nodes are frequently being OOMKilled. The pods have a 128MiB memory limit, but the worker nodes have a complex network setup with more than 250 interfaces (9 VLANs, 6 physical NICs, and 252 SR-IOV VFs). If this memory usage is expected given the high interface count, the pods' memory limit may need to be increased.
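
The configured limit and the OOMKilled terminations can be confirmed directly from the pod spec and status. The commands below are only a sketch, assuming the handler pod has a single container (pod name taken from the listing further down):

# Configured memory limit on the handler container (expected: 128Mi)
$ oc get pod nmstate-handler-6wl5g -n openshift-nmstate \
    -o jsonpath='{.spec.containers[0].resources.limits.memory}{"\n"}'

# Reason for the last container termination (expected: OOMKilled)
$ oc get pod nmstate-handler-6wl5g -n openshift-nmstate \
    -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}{"\n"}'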
Version-Release number of selected component (if applicable):
4.20.0-ec.3, NMState Operator v4.19.0
How reproducible:
Always
Steps to Reproduce:
Initially, pod memory usage was within the defined limit. However, after nodes appworker-0 and appworker-2 were rebooted, memory consumption increased, causing the pods on these nodes to be frequently OOMKilled.

$ oc get pod -n openshift-nmstate -l name=nmstate-handler -o wide | grep worker
nmstate-handler-6wl5g   1/1   Running   9 (6m24s ago)   16h   192.168.94.20   appworker-2.blueprint-cwl.nokia.core.bos2.lab   <none>   <none>
nmstate-handler-hq2xf   1/1   Running   16 (4m ago)     16h   192.168.94.18   appworker-0.blueprint-cwl.nokia.core.bos2.lab   <none>   <none>
nmstate-handler-p8tdt   1/1   Running   0               16h   192.168.94.19   appworker-1.blueprint-cwl.nokia.core.bos2.lab   <none>   <none>
nmstate-handler-z245t   1/1   Running   0               16h   192.168.94.21   appworker-3.blueprint-cwl.nokia.core.bos2.lab   <none>   <none>

$ oc adm top pod -n openshift-nmstate -l name=nmstate-handler
NAME                    CPU(cores)   MEMORY(bytes)
nmstate-handler-44dx2   18m          29Mi
nmstate-handler-5n2vc   11m          77Mi
nmstate-handler-6wl5g   485m         121Mi
nmstate-handler-6x6z8   9m           37Mi
nmstate-handler-ds5p8   1m           39Mi
nmstate-handler-gtm82   1m           40Mi
nmstate-handler-hq2xf   500m         112Mi
nmstate-handler-p8tdt   80m          72Mi
nmstate-handler-rs6ph   1m           33Mi
nmstate-handler-vm2rw   12m          54Mi
nmstate-handler-x76dq   1m           48Mi
nmstate-handler-xnjkc   1m           26Mi
nmstate-handler-z245t   1m           59Mi
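
To correlate the memory growth with the interface count cited in the description, the commands below sketch how the figures could be rechecked on an affected node; they assume host access through oc debug and are only illustrative:

# Count interfaces on a rebooted worker (description cites >250)
$ oc debug node/appworker-0.blueprint-cwl.nokia.core.bos2.lab -- \
    chroot /host sh -c 'ip -o link | wc -l'

# Sample handler memory periodically to catch the post-reboot growth
$ watch -n 30 'oc adm top pod -n openshift-nmstate -l name=nmstate-handler'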