OpenShift Bugs / OCPBUGS-59499

nmstate-handler pods on worker nodes are frequently being OOMKilled


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Major
    • 4.20.0, 4.20
    • Quality / Stability / Reliability
    • False
    • Important
    • Unspecified
    • Proposed

      Description of problem:

      The nmstate-handler pods (NMState Operator v4.19.0) on worker nodes are frequently OOMKilled. The pods have a 128 MiB memory limit, while the worker nodes have a complex network setup with over 250 interfaces (9 VLANs, 6 physical NICs, and 252 SR-IOV VFs). If this memory usage is expected given the high interface count, the pod's memory limit might need to be increased.

      Version-Release number of selected component (if applicable):

      4.20.0-ec.3, NMState Operator v4.19.0

      How reproducible:

      Always

      Steps to Reproduce:

      Initially, pod memory usage was within the defined limit. However, after nodes appworker-0 and appworker-2 were rebooted, memory consumption increased, causing the pods on these nodes to be frequently OOMKilled.
      
      $ oc get pod -n openshift-nmstate -l name=nmstate-handler -o wide|grep worker
      nmstate-handler-6wl5g   1/1     Running   9 (6m24s ago)   16h   192.168.94.20   appworker-2.blueprint-cwl.nokia.core.bos2.lab   <none>           <none>
      nmstate-handler-hq2xf   1/1     Running   16 (4m ago)     16h   192.168.94.18   appworker-0.blueprint-cwl.nokia.core.bos2.lab   <none>           <none>
      nmstate-handler-p8tdt   1/1     Running   0               16h   192.168.94.19   appworker-1.blueprint-cwl.nokia.core.bos2.lab   <none>           <none>
      nmstate-handler-z245t   1/1     Running   0               16h   192.168.94.21   appworker-3.blueprint-cwl.nokia.core.bos2.lab   <none>           <none>
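The restart counts in the listing above can be pulled out mechanically. The following is a minimal sketch, not part of the original report: the helper name `restarted_handlers` is hypothetical, and it assumes the column layout of `oc get pod -o wide` shown above (RESTARTS in the fourth column, NODE third from the end). The commented `jsonpath` query is one way to confirm that the last termination reason was OOMKilled on a live cluster.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get pod -o wide` output on stdin and prints
# "<node> <pod> <restarts>" for every pod that has restarted at least once.
restarted_handlers() {
  awk '$4 ~ /^[0-9]+$/ && $4 > 0 { print $(NF-2), $1, $4 }'
}

# On a live cluster (assumes the handler is the first container in the pod):
#   oc get pod -n openshift-nmstate -l name=nmstate-handler -o wide | restarted_handlers
#   oc get pod nmstate-handler-hq2xf -n openshift-nmstate \
#     -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'

# Offline run against the listing quoted in this report.
restarts=$(restarted_handlers <<'EOF'
nmstate-handler-6wl5g   1/1     Running   9 (6m24s ago)   16h   192.168.94.20   appworker-2.blueprint-cwl.nokia.core.bos2.lab   <none>           <none>
nmstate-handler-hq2xf   1/1     Running   16 (4m ago)     16h   192.168.94.18   appworker-0.blueprint-cwl.nokia.core.bos2.lab   <none>           <none>
nmstate-handler-p8tdt   1/1     Running   0               16h   192.168.94.19   appworker-1.blueprint-cwl.nokia.core.bos2.lab   <none>           <none>
nmstate-handler-z245t   1/1     Running   0               16h   192.168.94.21   appworker-3.blueprint-cwl.nokia.core.bos2.lab   <none>           <none>
EOF
)
printf '%s\n' "$restarts"
```

Only the pods on appworker-0 and appworker-2, the two rebooted nodes, are reported.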
      
      $ oc adm top pod -n openshift-nmstate -l name=nmstate-handler
      NAME                    CPU(cores)   MEMORY(bytes)
      nmstate-handler-44dx2   18m          29Mi
      nmstate-handler-5n2vc   11m          77Mi
      nmstate-handler-6wl5g   485m         121Mi
      nmstate-handler-6x6z8   9m           37Mi
      nmstate-handler-ds5p8   1m           39Mi
      nmstate-handler-gtm82   1m           40Mi
      nmstate-handler-hq2xf   500m         112Mi
      nmstate-handler-p8tdt   80m          72Mi
      nmstate-handler-rs6ph   1m           33Mi
      nmstate-handler-vm2rw   12m          54Mi
      nmstate-handler-x76dq   1m           48Mi
      nmstate-handler-xnjkc   1m           26Mi
      nmstate-handler-z245t   1m           59Mi
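To see how close each handler is to the limit, the usage figures above can be compared against the 128 MiB limit from the description. This is a rough sketch under stated assumptions: the helper name `flag_near_limit` and the 85% threshold are illustrative choices, not anything defined by the operator, and only values reported in `Mi` are considered.

```shell
#!/bin/sh
# Assumptions: 128 MiB limit (from the description); 85% is an arbitrary threshold.
LIMIT_MI=128
THRESHOLD_PCT=85

# Hypothetical helper: reads `oc adm top pod` output on stdin and prints
# "<pod> <usage> <percent of limit>" for pods at or above the threshold.
flag_near_limit() {
  awk -v limit="$LIMIT_MI" -v pct="$THRESHOLD_PCT" '
    NR > 1 && $3 ~ /Mi$/ {
      mem = $3
      sub(/Mi$/, "", mem)          # strip the unit, leaving the number
      used = mem * 100 / limit     # usage as a percentage of the limit
      if (used >= pct) printf "%s %dMi %.0f%%\n", $1, mem, used
    }'
}

# On a live cluster:
#   oc adm top pod -n openshift-nmstate -l name=nmstate-handler | flag_near_limit

# Offline run against a subset of the figures quoted in this report.
near_limit=$(flag_near_limit <<'EOF'
NAME                    CPU(cores)   MEMORY(bytes)
nmstate-handler-5n2vc   11m          77Mi
nmstate-handler-6wl5g   485m         121Mi
nmstate-handler-hq2xf   500m         112Mi
nmstate-handler-p8tdt   80m          72Mi
EOF
)
printf '%s\n' "$near_limit"
```

With these inputs, only nmstate-handler-6wl5g and nmstate-handler-hq2xf are flagged, matching the two pods with restarts.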
             

              Mat Kowalski (mkowalsk@redhat.com)
              Laura Yang (rh-ee-junyang)
              Ross Brattain
              Votes: 3
              Watchers: 6
