NETOBSERV-1732 (Network Observability)

netobserv-ebpf-agent pods are not releasing memory after getting oomkilled.

    • Quality / Stability / Reliability
    • NetObserv - Sprint 255, NetObserv - Sprint 256
      Previously, the eBPF agent was unable to clean up tc flows installed before an ungraceful crash (e.g., due to a SIGTERM signal). This led to multiple tc flow filters with the same name being created without removing the older ones. With this fix, we ensure that all previously installed tc flows are cleaned up when the agent starts, before installing new ones.


      Description of problem:
      On OCP 4.14.23, on nodes using cgroup v1 with network-observability-operator.v1.4.2, the netobserv-ebpf-agent pods do not release their memory after being OOM-killed. As a result, the operator pod is killed again while there is no task in the cgroup, and the kill reports that no killable process was found. This repeats in a loop.

      Steps to Reproduce:

      1.
      2.
      3.
      

      Actual results:
      The pod is OOM-killed without any process running in it. Sometimes the pod gets stuck in the ContainerCreating state.

      Expected results:
      After being OOM-killed, the pod should release its memory.

              mmahmoud@redhat.com Mohamed Mahmoud (Inactive)
              rhn-support-manyayad Mahesh Nyayadhish