-
Bug
-
Resolution: Duplicate
-
Normal
-
None
-
netobserv-1.3
-
None
-
False
-
None
-
False
-
-
-
-
NetObserv - Sprint 240, NetObserv - Sprint 241, NetObserv - Sprint 242
This issue was observed while running the Network Observability operator test cases on an OCP 4.12 ppc64le cluster with the operator version v1.3.0-69 nightly build. After initialization, the eBPF agent pods running on the nodes with higher compute resources were OOMKilled and then entered the CrashLoopBackOff state.
The pods recovered after increasing the default memory limit from 800Mi to 10000Mi and the memory request per pod from 50Mi to 400Mi.
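For anyone reproducing the workaround, here is a minimal sketch that builds a JSON merge patch with the values above and prints an `oc patch` command for it. It assumes the agent resources are set through the FlowCollector resource at `spec.agent.ebpf.resources` and that the resource is named `cluster`; neither detail is stated in this issue, so verify both against the installed operator version before running the printed command.

```python
import json

# Values reported in this issue as having stopped the OOMKills.
MEMORY_LIMIT = "10000Mi"
MEMORY_REQUEST = "400Mi"


def build_patch(limit: str, request: str) -> dict:
    """Build a merge patch that only touches the eBPF agent memory settings."""
    # ASSUMPTION: the field path spec.agent.ebpf.resources is not taken from
    # this issue; check it against the FlowCollector CRD on your cluster.
    return {
        "spec": {
            "agent": {
                "ebpf": {
                    "resources": {
                        "limits": {"memory": limit},
                        "requests": {"memory": request},
                    }
                }
            }
        }
    }


patch = build_patch(MEMORY_LIMIT, MEMORY_REQUEST)
patch_json = json.dumps(patch, separators=(",", ":"))

# ASSUMPTION: the FlowCollector resource is named "cluster".
command = f"oc patch flowcollector cluster --type=merge -p '{patch_json}'"
print(command)
```

Because this is a merge patch, keys that are not listed (such as the existing 100m CPU request) would be left unchanged.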
Below are the cluster details:
OCP Version-
Client Version: 4.12.22
Kustomize Version: v4.5.7
Server Version: 4.12.22
Kubernetes Version: v1.25.10+8c21020
Arch- ppc64le
Cluster resource configuration:
1 bastion- 1 vCPU, 8GB Memory
3 masters- 1 vCPU, 32GB Memory
2 workers- 4 vCPU, 84GB Memory
Crashed eBPF pods:
[root@rdr-ah-412-syd05-bastion-0 ~]# oc get po -n netobserv-privileged
NAME                         READY   STATUS             RESTARTS       AGE
netobserv-ebpf-agent-6f85c   1/1     Running            0              57m
netobserv-ebpf-agent-8mr8r   1/1     Running            0              57m
netobserv-ebpf-agent-btxsl   0/1     CrashLoopBackOff   16 (30s ago)   57m
netobserv-ebpf-agent-qrgj2   1/1     Running            0              57m
netobserv-ebpf-agent-sp4jn   0/1     CrashLoopBackOff   16 (35s ago)   57m
Pod log output:
[root@rdr-ah-412-syd05-bastion-0 ~]# oc logs netobserv-ebpf-agent-btxsl
time="2023-06-20T11:37:14Z" level=info msg="starting NetObserv eBPF Agent"
time="2023-06-20T11:37:14Z" level=info msg="initializing Flows agent" component=agent.Flows
Pod describe output:
[root@rdr-ah-412-syd05-bastion-0 ~]# oc describe po netobserv-ebpf-agent-btxsl
Name:             netobserv-ebpf-agent-btxsl
Namespace:        netobserv-privileged
Priority:         0
Service Account:  netobserv-ebpf-agent
Node:             syd05-worker-1.rdr-ah-412.ibm.com/193.168.200.116
Start Time:       Tue, 20 Jun 2023 06:40:06 -0400
Labels:           app=netobserv-ebpf-agent
                  controller-revision-hash=7d58d84bcc
                  pod-template-generation=1
Annotations:      openshift.io/scc: netobserv-ebpf-agent
Status:           Running
IP:               193.168.200.116
IPs:
  IP:           193.168.200.116
Controlled By:  DaemonSet/netobserv-ebpf-agent
Containers:
  netobserv-ebpf-agent:
    Container ID:   cri-o://4187bfc85bd230009d3353da76d1966a0ed47cf12ce084c86ad7ec5d243db2c3
    Image:          registry.redhat.io/network-observability/network-observability-ebpf-agent-rhel9@sha256:46cfe6810344f2fb393d9ac48ce7359e77fefa1975c969577f88362e40881f98
    Image ID:       registry.redhat.io/network-observability/network-observability-ebpf-agent-rhel9@sha256:46cfe6810344f2fb393d9ac48ce7359e77fefa1975c969577f88362e40881f98
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Tue, 20 Jun 2023 07:37:14 -0400
      Finished:     Tue, 20 Jun 2023 07:37:15 -0400
    Ready:          False
    Restart Count:  16
    Limits:
      memory:  800Mi
    Requests:
      cpu:     100m
      memory:  50Mi
    Environment:
      CACHE_ACTIVE_TIMEOUT:  5s
      CACHE_MAX_FLOWS:       100000
      LOG_LEVEL:             info
      EXCLUDE_INTERFACES:    lo
      SAMPLING:              50
      DEDUPER:               firstCome
      DEDUPER_JUST_MARK:     true
      EXPORT:                grpc
      FLOWS_TARGET_HOST:      (v1:status.hostIP)
      FLOWS_TARGET_PORT:     2055
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vwvrh (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-vwvrh:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 op=Exists
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  59m                    default-scheduler  Successfully assigned netobserv-privileged/netobserv-ebpf-agent-btxsl to syd05-worker-1.rdr-ah-412.ibm.com
  Normal   Pulled     57m (x5 over 59m)      kubelet            Container image "registry.redhat.io/network-observability/network-observability-ebpf-agent-rhel9@sha256:46cfe6810344f2fb393d9ac48ce7359e77fefa1975c969577f88362e40881f98" already present on machine
  Normal   Created    57m (x5 over 59m)      kubelet            Created container netobserv-ebpf-agent
  Normal   Started    57m (x5 over 59m)      kubelet            Started container netobserv-ebpf-agent
  Warning  BackOff    4m10s (x255 over 59m)  kubelet            Back-off restarting failed container
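A note on reading the describe output: exit code 137 follows the usual convention that a status above 128 means the process was terminated by signal number (status - 128), so 137 corresponds to signal 9 (SIGKILL), which is consistent with the OOMKilled reason. The sketch below decodes such a status; the helper name `describe_exit_code` is illustrative only, and it assumes a Linux host where SIGKILL is defined.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Decode a container exit code using the 128 + signal-number convention."""
    if code > 128:
        try:
            sig = signal.Signals(code - 128)
        except ValueError:
            # Not a known signal number on this platform.
            return f"exited with status {code}"
        return f"killed by {sig.name} (signal {sig.value})"
    return f"exited with status {code}"


print(describe_exit_code(137))
```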
- blocks
-
NETOBSERV-576 Multi-arch builds - amd64, ppc64le, arm64
- Closed