NETOBSERV-1103 (project: Network Observability)

eBPF agent pods consume more memory than the set resource limit on an OCP ppc64le cluster

    • Type: Bug
    • Resolution: Duplicate
    • Priority: Normal
    • None
    • netobserv-1.3
    • Operator
    • None
    • False
    • None
    • False
    • This issue can be reproduced on any OpenShift cluster, version 4.10 or later, running on the ppc64le architecture by installing the latest build of the Network Observability operator, v1.3.0.
    • NetObserv - Sprint 240, NetObserv - Sprint 241, NetObserv - Sprint 242

      This issue was observed while running the Network Observability operator test cases on an OCP 4.12 ppc64le cluster with the operator v1.3.0-69 nightly build. After initialization, the eBPF agent pods running on the nodes with more compute resources were OOMKilled and then entered the CrashLoopBackOff state.

       

      The pods recovered after the default memory limit was increased from 800Mi to 10000Mi and the memory request per pod from 50Mi to 400Mi.
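
The report does not say how the limits were changed. One plausible route is a merge patch, assuming the operator's FlowCollector resource is named `cluster` and exposes the agent resources under `spec.agent.ebpf.resources`. Both are assumptions to verify against the installed CRD. The sketch below only builds and prints the command so it can be reviewed before it is run:

```shell
# Values taken from the report; adjust them to the cluster at hand.
memory_limit="10000Mi"
memory_request="400Mi"

# Build a JSON merge patch that touches only the memory limit and request.
# ASSUMPTION: the FlowCollector schema has spec.agent.ebpf.resources.
patch=$(printf '{"spec":{"agent":{"ebpf":{"resources":{"limits":{"memory":"%s"},"requests":{"memory":"%s"}}}}}}' \
  "$memory_limit" "$memory_request")

# Print the command instead of running it, so nothing changes on the cluster.
# ASSUMPTION: the FlowCollector resource is named "cluster".
echo "oc patch flowcollector cluster --type=merge -p '$patch'"
```

Because this is a merge patch, keys that are not listed, such as the 100m CPU request, should stay as they are.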

       

      Below are the cluster details:

      OCP version:

      Client Version: 4.12.22
      Kustomize Version: v4.5.7
      Server Version: 4.12.22
      Kubernetes Version: v1.25.10+8c21020

      Architecture: ppc64le

       

      Cluster resource configuration:

      1 bastion: 1 vCPU, 8 GB memory

      3 masters: 1 vCPU, 32 GB memory

      2 workers: 4 vCPU, 84 GB memory

       

      Crashed eBPF pods:

       

      [root@rdr-ah-412-syd05-bastion-0 ~]# oc get po -n netobserv-privileged
      NAME                         READY   STATUS             RESTARTS       AGE
      netobserv-ebpf-agent-6f85c   1/1     Running            0              57m
      netobserv-ebpf-agent-8mr8r   1/1     Running            0              57m
      netobserv-ebpf-agent-btxsl   0/1     CrashLoopBackOff   16 (30s ago)   57m
      netobserv-ebpf-agent-qrgj2   1/1     Running            0              57m
      netobserv-ebpf-agent-sp4jn   0/1     CrashLoopBackOff   16 (35s ago)   57m
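
To pick out only the failing agents from a listing like the one above, the STATUS column can be filtered with `awk`. This is a minimal sketch that runs against a copy of the reported output. On a live cluster the same filter could be fed from `oc get po -n netobserv-privileged`. The variable names are illustrative only:

```shell
# Sample listing copied from the report.
pod_listing='NAME                         READY   STATUS             RESTARTS       AGE
netobserv-ebpf-agent-6f85c   1/1     Running            0              57m
netobserv-ebpf-agent-8mr8r   1/1     Running            0              57m
netobserv-ebpf-agent-btxsl   0/1     CrashLoopBackOff   16 (30s ago)   57m
netobserv-ebpf-agent-qrgj2   1/1     Running            0              57m
netobserv-ebpf-agent-sp4jn   0/1     CrashLoopBackOff   16 (35s ago)   57m'

# Skip the header row and keep the names whose third column is CrashLoopBackOff.
crashed_pods=$(printf '%s\n' "$pod_listing" |
  awk 'NR > 1 && $3 == "CrashLoopBackOff" { print $1 }')

echo "$crashed_pods"
```

For the listing in this report, the filter prints `netobserv-ebpf-agent-btxsl` and `netobserv-ebpf-agent-sp4jn`.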
      

       

      Pod log output:

       

      [root@rdr-ah-412-syd05-bastion-0 ~]# oc logs netobserv-ebpf-agent-btxsl
      time="2023-06-20T11:37:14Z" level=info msg="starting NetObserv eBPF Agent"
      time="2023-06-20T11:37:14Z" level=info msg="initializing Flows agent" component=agent.Flows
       
      

      Pod describe output:

       

      [root@rdr-ah-412-syd05-bastion-0 ~]# oc describe po netobserv-ebpf-agent-btxsl
      Name:             netobserv-ebpf-agent-btxsl
      Namespace:        netobserv-privileged
      Priority:         0
      Service Account:  netobserv-ebpf-agent
      Node:             syd05-worker-1.rdr-ah-412.ibm.com/193.168.200.116
      Start Time:       Tue, 20 Jun 2023 06:40:06 -0400
      Labels:           app=netobserv-ebpf-agent
                        controller-revision-hash=7d58d84bcc
                        pod-template-generation=1
      Annotations:      openshift.io/scc: netobserv-ebpf-agent
      Status:           Running
      IP:               193.168.200.116
      IPs:
        IP:           193.168.200.116
      Controlled By:  DaemonSet/netobserv-ebpf-agent
      Containers:
        netobserv-ebpf-agent:
          Container ID:   cri-o://4187bfc85bd230009d3353da76d1966a0ed47cf12ce084c86ad7ec5d243db2c3
          Image:          registry.redhat.io/network-observability/network-observability-ebpf-agent-rhel9@sha256:46cfe6810344f2fb393d9ac48ce7359e77fefa1975c969577f88362e40881f98
          Image ID:       registry.redhat.io/network-observability/network-observability-ebpf-agent-rhel9@sha256:46cfe6810344f2fb393d9ac48ce7359e77fefa1975c969577f88362e40881f98
          Port:           <none>
          Host Port:      <none>
          State:          Waiting
            Reason:       CrashLoopBackOff
          Last State:     Terminated
            Reason:       OOMKilled
            Exit Code:    137
            Started:      Tue, 20 Jun 2023 07:37:14 -0400
            Finished:     Tue, 20 Jun 2023 07:37:15 -0400
          Ready:          False
          Restart Count:  16
          Limits:
            memory:  800Mi
          Requests:
            cpu:     100m
            memory:  50Mi
          Environment:
            CACHE_ACTIVE_TIMEOUT:  5s
            CACHE_MAX_FLOWS:       100000
            LOG_LEVEL:             info
            EXCLUDE_INTERFACES:    lo
            SAMPLING:              50
            DEDUPER:               firstCome
            DEDUPER_JUST_MARK:     true
            EXPORT:                grpc
            FLOWS_TARGET_HOST:      (v1:status.hostIP)
            FLOWS_TARGET_PORT:     2055
          Mounts:
            /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vwvrh (ro)
      Conditions:
        Type              Status
        Initialized       True
        Ready             False
        ContainersReady   False
        PodScheduled      True
      Volumes:
        kube-api-access-vwvrh:
          Type:                    Projected (a volume that contains injected data from multiple sources)
          TokenExpirationSeconds:  3607
          ConfigMapName:           kube-root-ca.crt
          ConfigMapOptional:       <nil>
          DownwardAPI:             true
          ConfigMapName:           openshift-service-ca.crt
          ConfigMapOptional:       <nil>
      QoS Class:                   Burstable
      Node-Selectors:              <none>
      Tolerations:                 op=Exists
      Events:
        Type     Reason     Age                    From               Message
        ----     ------     ----                   ----               -------
        Normal   Scheduled  59m                    default-scheduler  Successfully assigned netobserv-privileged/netobserv-ebpf-agent-btxsl to syd05-worker-1.rdr-ah-412.ibm.com
        Normal   Pulled     57m (x5 over 59m)      kubelet            Container image "registry.redhat.io/network-observability/network-observability-ebpf-agent-rhel9@sha256:46cfe6810344f2fb393d9ac48ce7359e77fefa1975c969577f88362e40881f98" already present on machine
        Normal   Created    57m (x5 over 59m)      kubelet            Created container netobserv-ebpf-agent
        Normal   Started    57m (x5 over 59m)      kubelet            Started container netobserv-ebpf-agent
        Warning  BackOff    4m10s (x255 over 59m)  kubelet            Back-off restarting failed container
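
Exit code 137 in the describe output follows the usual convention of 128 plus the number of the signal that ended the process. Here that is signal 9 (SIGKILL), which is what the kernel OOM killer sends, and it matches the OOMKilled reason. A small sketch of the arithmetic, with the exit code hard-coded from the report:

```shell
# "Exit Code" value from the Last State section of the describe output.
exit_code=137

# Exit codes above 128 encode the terminating signal as (code - 128).
signal_number=$((exit_code - 128))

# kill -l maps a signal number to its name without the SIG prefix.
signal_name=$(kill -l "$signal_number")

echo "exit code $exit_code = 128 + $signal_number (SIG$signal_name)"
# prints: exit code 137 = 128 + 9 (SIGKILL)
```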
      

       

        1. image (13).png (172 kB)
        2. image (14).png (246 kB)
        3. journalctl.txt (21 kB)
        4. ocp412-must-gather-20230620.tar.gz (47.88 MB)
        5. power_debugging.tar.gz (514 kB)

            mmahmoud@redhat.com Mohamed Mahmoud
            rh-ee-ahonkala Aditya Honkalas