Network Observability / NETOBSERV-1107

Improve eBPF agent memory usage

    • Type: Story
    • Resolution: Done
    • Priority: Critical
    • Fix Version/s: netobserv-1.4
    • Component/s: eBPF
    • Proactive Architecture
    • Sprints: NetObserv - Sprint 238, NetObserv - Sprint 239, NetObserv - Sprint 240, NetObserv - Sprint 241

      While profiling the agent's memory it became clear that, every 5s, userspace allocates a huge Go map to hold the entire eBPF hash table and then leaves it to the Go garbage collector to free. The pprof top5 output shows where the allocations come from:
      top5
      Showing nodes accounting for 95.46GB, 97.45% of 97.96GB total
      Dropped 215 nodes (cum <= 0.49GB)
      Showing top 5 nodes out of 17
      flat flat% sum% cum cum%
      40.85GB 41.70% 41.70% 96.11GB 98.12% github.com/netobserv/netobserv-ebpf-agent/pkg/ebpf.(*FlowFetcher).LookupAndDeleteMap <<<<<<<<<<<<<<
      16.05GB 16.39% 58.09% 16.05GB 16.39% reflect.unsafe_NewArray
      12.89GB 13.16% 71.25% 12.89GB 13.16% encoding/binary.Read
      12.85GB 13.12% 84.36% 41.75GB 42.63% github.com/cilium/ebpf.unmarshalPerCPUValue
      12.82GB 13.09% 97.45% 12.82GB 13.09% github.com/cilium/ebpf.makeBuffer (inline)
      It is a known Go behavior that a map, even after its entries are deleted, does not free the memory it has allocated; it is only reclaimed once the whole map becomes garbage and is collected.
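
      Below is a minimal, simplified sketch (not the agent's actual code) of the pattern the profile points at, using hypothetical flowID/flowMetrics types in place of the agent's generated eBPF structs; the real FlowFetcher.LookupAndDeleteMap also deletes entries as it reads, which is omitted here:

      package agent

      import (
          "log"

          "github.com/cilium/ebpf"
      )

      // Hypothetical stand-ins for the agent's generated flow key/metrics types.
      type flowID struct{ SrcPort, DstPort uint16 }
      type flowMetrics struct{ Packets, Bytes uint64 }

      // scrape builds a brand-new Go map holding the whole eBPF per-CPU hash
      // table; called every 5s, the previous map becomes garbage each cycle.
      func scrape(flowsMap *ebpf.Map) map[flowID][]flowMetrics {
          out := make(map[flowID][]flowMetrics) // large allocation on every cycle
          var id flowID
          it := flowsMap.Iterate()
          for {
              var perCPU []flowMetrics // one element per CPU for a per-CPU map
              if !it.Next(&id, &perCPU) {
                  break
              }
              out[id] = perCPU
          }
          if err := it.Err(); err != nil {
              log.Printf("iterating flows map: %v", err)
          }
          return out
      }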

      So in this user story we propose the following (a short sketch for each item follows the list):

      1- Change the data type used by the map: instead of storing the entire metrics structure as the value, store a pointer to the metrics, which reduces the size of the map itself.

      2- Force a GC run at the end of every collection cycle. We expect this to increase CPU usage slightly, in exchange for promptly returning the large chunk of memory allocated during the cycle.

      3- Add a GOMEMLIMIT environment variable setting, which makes the GC run aggressively as the resource limit is approached, to avoid OOM conditions.
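
      Sketch for item 1, with hypothetical type names and field layouts standing in for the agent's generated eBPF structs:

      package agent

      // Hypothetical stand-ins for the agent's generated flow key/metrics structs;
      // the field layout is illustrative only.
      type BpfFlowId struct {
          SrcIp, DstIp     [16]uint8
          SrcPort, DstPort uint16
      }

      type BpfFlowMetrics struct {
          Packets         uint32
          Bytes           uint64
          StartMonoTimeNs uint64
          EndMonoTimeNs   uint64
          Flags           uint16
      }

      // Before: every map bucket stores a full copy of the metrics struct, so the
      // map that userspace rebuilds every 5s is large.
      type flowsByValue = map[BpfFlowId]BpfFlowMetrics

      // After (item 1): buckets store an 8-byte pointer; the metrics struct is
      // allocated once and updated in place, shrinking the rebuilt map.
      type flowsByPtr = map[BpfFlowId]*BpfFlowMetrics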
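
      Sketch for item 2; collect is a placeholder for the agent's per-cycle eviction/collection routine:

      package agent

      import (
          "runtime"
          "time"
      )

      // collectLoop forces a garbage collection after every collection cycle so
      // the large temporary allocations from that cycle are reclaimed right away,
      // at the cost of a small amount of extra CPU per cycle.
      func collectLoop(collect func()) {
          for range time.Tick(5 * time.Second) {
              collect()    // drains the eBPF map into a fresh Go map (allocates heavily)
              runtime.GC() // item 2: reclaim that garbage immediately
          }
      }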
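
      Sketch for item 3. With Go 1.19+ the runtime reads GOMEMLIMIT on its own, so setting the variable on the agent container is enough; the programmatic fallback below and the 800MiB value are illustrative only:

      package agent

      import (
          "os"
          "runtime/debug"
      )

      // applyMemoryLimit sets a soft heap limit so the GC becomes increasingly
      // aggressive as the limit is approached, helping the agent avoid OOM kills.
      func applyMemoryLimit() {
          // If GOMEMLIMIT is set (e.g. "800MiB"), the runtime already honors it.
          if os.Getenv("GOMEMLIMIT") != "" {
              return
          }
          // Otherwise apply an equivalent soft limit programmatically (in bytes).
          debug.SetMemoryLimit(800 << 20)
      }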

              Assignee: Mohamed Mahmoud (mmahmoud@redhat.com)
              Reporter: Mohamed Mahmoud (mmahmoud@redhat.com)
              Nathan Weinberg
              Votes: 0
              Watchers: 6
