Network Observability / NETOBSERV-613

netobserv-ebpf-agent pod rss memory usage grows unbounded, never resets after workload

    • Type: Bug
    • Resolution: Done
    • Priority: Major
    • Affects Version: openshift-4.12
    • Components: eBPF, FLP, Kafka, Operator
    • Quality / Stability / Reliability
    • Story Points: 3
    • Severity: Critical
    • Sprints: NetObserv - Sprint 225, NetObserv - Sprint 226, NetObserv - Sprint 227
    • Customer Facing

      With the latest eBPF agent (quay.io/netobserv/netobserv-ebpf-agent:main as of 30 September), memory growth is unbounded under periodic load with idle periods.

      • The flowcollector sampling parameter is 100 (the problem occurs even faster with lower values).
      • The workload is a k6 (k6.io) pod simulating 100 users sending 25K HTTP requests/second to 5 services running in OpenShift. The eBPF agent pod whose memory is monitored runs on the node where the requests originate, not on the service nodes.
      • The workload runs in 10-minute intervals with 5-minute rest periods.
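The pod memory readings behind the attached graph presumably came from cluster metrics (e.g. the console's memory dashboard); the sampling technique itself is simple. A minimal, hypothetical Python sketch of periodic RSS sampling, observing the current process rather than a pod, just to show the idea:

```python
import resource
import time

def sample_rss(samples, interval_s=0.01):
    """Record peak RSS over time via getrusage(). Note ru_maxrss is in
    kilobytes on Linux and bytes on macOS. This samples the current
    process; monitoring a pod would read cluster metrics instead."""
    readings = []
    for _ in range(samples):
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        readings.append(rss)
        time.sleep(interval_s)
    return readings

readings = sample_rss(5)
# ru_maxrss is a high-water mark, so it never decreases between samples,
# which is exactly the pathological shape reported here: usage that grows
# under load and never returns to baseline.
assert all(b >= a for a, b in zip(readings, readings[1:]))
```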

      The graph of pod RSS memory usage is attached. If the test is left running, the pod is OOMKilled when the pod memory limit is reached, or when system memory is exhausted if the limit is removed. On a 16 GB system, the pod is OOMKilled at roughly 14 GB of usage.
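Memory that climbs under load and never returns to baseline during rest periods is the classic signature of a per-flow cache with no eviction or expiry. Purely as an illustrative sketch (not the agent's actual implementation), a cache bounded by a hypothetical max_flows parameter, evicting the least-recently-seen flow, keeps memory flat no matter how many distinct flows the workload generates:

```python
from collections import OrderedDict

class FlowCache:
    """Illustrative bounded flow cache: once max_flows entries exist,
    the least-recently-seen flow is evicted, so the map (and hence RSS)
    cannot grow without bound. Hypothetical sketch, not the agent's code."""

    def __init__(self, max_flows):
        self.max_flows = max_flows
        self.flows = OrderedDict()  # flow key -> byte counter

    def record(self, key, bytes_seen):
        if key in self.flows:
            self.flows.move_to_end(key)   # mark as most recently seen
            self.flows[key] += bytes_seen
        else:
            if len(self.flows) >= self.max_flows:
                self.flows.popitem(last=False)  # evict oldest flow
            self.flows[key] = bytes_seen

cache = FlowCache(max_flows=1000)
# Simulate 25,000 distinct flows; without eviction the map would hold
# all 25,000 entries, with eviction it never exceeds 1,000.
for i in range(25_000):
    cache.record(("10.0.0.1", 80, f"10.0.{i % 256}.{i // 256}", i), 1500)
assert len(cache.flows) <= 1000
```

Without the eviction branch, every distinct 5-tuple seen during a load interval stays resident forever, which matches the stair-step growth across the 10-minute load / 5-minute rest cycles described above.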

        1. ebpf_pod_mem_growth.png
          19 kB
          Mike Fiedler
        2. image-2022-11-01-14-33-27-298.png
          38 kB
          Mike Fiedler

              Assignee: Mario Macias (mmaciasl@redhat.com) (Inactive)
              Reporter: Mike Fiedler (mifiedle@redhat.com)
              Votes: 0
              Watchers: 5
