Network Observability / NETOBSERV-583

eBPF collector CrashLoops with OOMKills under modest load

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Components: eBPF, FLP, Operator
    • Category: Quality / Stability / Reliability
    • Severity: Critical
    • Sprints: NetObserv - Sprint 224, NetObserv - Sprint 225, NetObserv - Sprint 226, NetObserv - Sprint 227

      With the out-of-the-box (OOTB) FlowCollector CRD, the eBPF flow collector pods repeatedly CrashLoop with reason OOMKilled under very modest network load.

      Steps to reproduce:

      1. Create an AWS cluster with 9 m5.2xlarge workers.
      2. Install NetObserv with its default 100Mi memory limit in the FlowCollector CRD (see the excerpt after these steps).
      3. Run the hey-ho app (https://github.com/jotak/hey-ho) with 10 projects, 10 deployments, and 1 replica.
      4. Run oc get pods -w against the eBPF agent namespace and watch the netobserv-ebpf-agent pods CrashLoop.

      The network traffic per node is 20K-300K flows/minute and roughly 200Mb/s, spread across 1-2 pods per node.

      We should remove the memory limit for the collector unless we know a correct limit for our target flow and traffic rates. The OOTB default should not crash this easily. An interim workaround is sketched below.

              Assignee: Joel Takvorian (jtakvori)
              Reporter: Mike Fiedler (mifiedle@redhat.com)
