Type: Bug
Resolution: Done
Release Note Type: Bug Fix
Release Note Status: Needs release note text.
Sprints: NetObserv - Sprint 252, NetObserv - Sprint 253
Description of problem:
Recently found an issue with the default port assignment for the eBPF pods while running the automated test that verifies the eBPF agent metrics, health, and console metrics. Currently this issue reproduces only on an OCP 4.13 cluster with ppc64le arch. Because the port is unavailable, the eBPF pods in the "-privileged" namespace enter the CrashLoopBackOff state, as below:

[root@rdr-noo-ocp-413-bastion-0 ~]# oc -n e2e-test-netobserv-d2jg4-privileged get po -o wide
NAME                         READY   STATUS             RESTARTS      AGE   IP              NODE       NOMINATED NODE   READINESS GATES
netobserv-ebpf-agent-27kx8   1/1     Running            0             58s   10.20.186.124   worker-1   <none>           <none>
netobserv-ebpf-agent-2dsbj   0/1     CrashLoopBackOff   3 (17s ago)   58s   10.20.186.120   master-2   <none>           <none>
netobserv-ebpf-agent-6nzpc   1/1     Running            0             58s   10.20.186.236   worker-0   <none>           <none>
netobserv-ebpf-agent-mrm7z   0/1     Error              3 (39s ago)   58s   10.20.186.183   master-0   <none>           <none>
netobserv-ebpf-agent-n4gwd   0/1     Error              3 (38s ago)   58s   10.20.186.79    master-1   <none>           <none>
Steps to Reproduce:
As mentioned above, this issue reproduces only on a 4.13 cluster with ppc64le arch, while running the automated test that verifies flowlogs-pipeline, eBPF agent, health, and console metrics. It can be reproduced with the steps below:
1. Deploy an OCP 4.13 cluster for ppc64le arch
2. Run the automated test "54043-High-66031-High-72959-Verify flowlogs-pipeline, eBPF agent and Console metrics"
Actual results:
The eBPF pods enter the CrashLoopBackOff state because the port is unavailable.
Expected results:
The eBPF pods should start with the default port defined in the FlowCollector deployment.
eBPF pod logs:
[root@rdr-noo-ocp-413-bastion-0 ~]# oc -n e2e-test-netobserv-cxv4l-privileged logs netobserv-ebpf-agent-7xdw2
time="2024-04-24T09:36:39Z" level=info msg="starting NetObserv eBPF Agent"
time="2024-04-24T09:36:39Z" level=info msg="initializing Flows agent" component=agent.Flows
time="2024-04-24T09:36:39Z" level=info msg="StartServerAsync: addr = :9102" component=prometheus
time="2024-04-24T09:36:39Z" level=info msg="push CTRL+C or send SIGTERM to interrupt execution"
time="2024-04-24T09:36:39Z" level=info msg="starting Flows agent" component=agent.Flows
time="2024-04-24T09:36:39Z" level=warning msg="can't detect any network-namespaces err: open /var/run/netns: no such file or directory [Ignore if the agent privileged flag is not set]" component=ifaces.Watcher
time="2024-04-24T09:36:39Z" level=warning msg="failed to add watcher to netns directory err: no such file or directory [Ignore if the agent privileged flag is not set]" component=ifaces.Watcher
time="2024-04-24T09:36:39Z" level=info msg="Flows agent successfully started" component=agent.Flows
time="2024-04-24T09:36:39Z" level=fatal msg="error in http.ListenAndServe: listen tcp :9102: bind: address already in use" component=prometheus
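The fatal line above is a standard EADDRINUSE failure: the agent's Prometheus endpoint cannot listen on a TCP port that another process (here, kube-rbac-proxy) already holds. A minimal sketch reproducing the same condition, in plain Python rather than the agent's Go code:

```python
import errno
import socket

# First listener takes an arbitrary free port, standing in for
# kube-rbac-proxy already bound to :9102 on the node.
holder = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
holder.bind(("127.0.0.1", 0))
holder.listen(1)
port = holder.getsockname()[1]

# A second bind on the same port fails, just like the agent's
# "listen tcp :9102: bind: address already in use".
agent = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    agent.bind(("127.0.0.1", port))
    raise AssertionError("bind unexpectedly succeeded")
except OSError as e:
    assert e.errno == errno.EADDRINUSE
finally:
    agent.close()
    holder.close()
```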
process attached to the port:
[root@master-2 core]# netstat -plano | grep :9102
tcp6 0 0 :::9102 :::* LISTEN 3543/kube-rbac-prox off (0.00/0/0)
tcp6 0 0 10.20.186.120:9102 10.20.186.236:57840 ESTABLISHED 3543/kube-rbac-prox keepalive (11.40/0/0)
tcp6 0 0 10.20.186.120:9102 10.20.186.124:45198 ESTABLISHED 3543/kube-rbac-prox keepalive (1.17/0/0)
[root@master-2 core]# ps -ef | grep 3543
nfsnobo+ 3543 3423 0 Apr19 ? 00:02:13 /usr/bin/kube-rbac-proxy --logtostderr --secure-listen-address=:9102 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 --upstream=http://127.0.0.1:29102/ --tls-private-key-file=/etc/pki/tls/metrics-cert/tls.key --tls-cert-file=/etc/pki/tls/metrics-cert/tls.crt
root 3394891 3393742 0 17:39 pts/0 00:00:00 grep --color=auto 3543
[root@master-2 core]#
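Since kube-rbac-proxy already owns :9102 on the masters, a pre-flight check could detect the collision before the agent pods crash-loop. A hypothetical helper (not part of the NetObserv test suite) that probes whether a TCP port is still bindable:

```python
import socket

def port_is_free(port: int, host: str = "0.0.0.0") -> bool:
    """Return True if a TCP listener can be bound on `port`.

    A bind failure (typically EADDRINUSE, as with kube-rbac-proxy
    already listening on 9102) is reported as the port being taken.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False
```

Run on an affected node (e.g. via oc debug node/...), such a check would flag 9102 as occupied up front instead of leaving the diagnosis to netstat after the fact.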