  Network Observability / NETOBSERV-1619

Dedicated metrics ports for netobserv as the default

    • Type: Bug
    • Resolution: Done
    • Fix Version: netobserv-1.6
    • Component: eBPF
    • Release Note Status: Needs release note text.
    • Release Note Type: Bug Fix
    • Sprints: NetObserv - Sprint 252, NetObserv - Sprint 253

      Description of problem:

      We recently found an issue with the default port assignment for the eBPF pods while running the automated test that verifies the eBPF agent, health, and console metrics. Currently, this issue reproduces only on an OCP 4.13 cluster with the ppc64le architecture. Because the port is unavailable, the eBPF pods in the "-privileged" namespace enter the CrashLoopBackOff state, as shown below:
      
      [root@rdr-noo-ocp-413-bastion-0 ~]# oc -n e2e-test-netobserv-d2jg4-privileged get po -o wide
      NAME                         READY   STATUS             RESTARTS      AGE   IP              NODE       NOMINATED NODE   READINESS GATES
      netobserv-ebpf-agent-27kx8   1/1     Running            0             58s   10.20.186.124   worker-1   <none>           <none>
      netobserv-ebpf-agent-2dsbj   0/1     CrashLoopBackOff   3 (17s ago)   58s   10.20.186.120   master-2   <none>           <none>
      netobserv-ebpf-agent-6nzpc   1/1     Running            0             58s   10.20.186.236   worker-0   <none>           <none>
      netobserv-ebpf-agent-mrm7z   0/1     Error              3 (39s ago)   58s   10.20.186.183   master-0   <none>           <none>
      netobserv-ebpf-agent-n4gwd   0/1     Error              3 (38s ago)   58s   10.20.186.79    master-1   <none>           <none>

      Steps to Reproduce:

      As mentioned above, this issue reproduces only on a 4.13 cluster with the ppc64le architecture while running the automated test that verifies the flowlogs-pipeline, eBPF agent, health, and console metrics. It can be reproduced with the following steps:
      
      1. Deploy OCP 4.13 cluster for ppc64le arch
      2. Run the automated test- "54043-High-66031-High-72959-Verify flowlogs-pipeline, eBPF agent and Console metrics"
      

      Actual results:

      The eBPF pods enter the CrashLoopBackOff state because the default metrics port is already in use.

      Expected results:

      The eBPF pods should start successfully on the default metrics port defined in the FlowCollector deployment.
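      As a possible workaround until dedicated default ports land, the agent's metrics port can presumably be overridden in the FlowCollector resource. This is a hedged sketch, assuming the v1beta2 API exposes spec.agent.ebpf.metrics.server.port; the port value 9103 is purely illustrative and should be any port known to be free on the nodes:

      ```yaml
      # Sketch only: assumes FlowCollector v1beta2 exposes
      # spec.agent.ebpf.metrics.server.port; 9103 is an illustrative free port.
      apiVersion: flows.netobserv.io/v1beta2
      kind: FlowCollector
      metadata:
        name: cluster
      spec:
        agent:
          type: eBPF
          ebpf:
            metrics:
              server:
                port: 9103
      ```

      Applying this would move the agent's Prometheus endpoint off :9102, avoiding the clash with kube-rbac-proxy shown below.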

      eBPF pod logs:

      [root@rdr-noo-ocp-413-bastion-0 ~]# oc -n e2e-test-netobserv-cxv4l-privileged logs netobserv-ebpf-agent-7xdw2
      
      time="2024-04-24T09:36:39Z" level=info msg="starting NetObserv eBPF Agent"
      time="2024-04-24T09:36:39Z" level=info msg="initializing Flows agent" component=agent.Flows
      time="2024-04-24T09:36:39Z" level=info msg="StartServerAsync: addr = :9102" component=prometheus
      time="2024-04-24T09:36:39Z" level=info msg="push CTRL+C or send SIGTERM to interrupt execution"
      time="2024-04-24T09:36:39Z" level=info msg="starting Flows agent" component=agent.Flows
      time="2024-04-24T09:36:39Z" level=warning msg="can't detect any network-namespaces err: open /var/run/netns: no such file or directory [Ignore if the agent privileged flag is not set]" component=ifaces.Watcher
      time="2024-04-24T09:36:39Z" level=warning msg="failed to add watcher to netns directory err: no such file or directory [Ignore if the agent privileged flag is not set]" component=ifaces.Watcher
      time="2024-04-24T09:36:39Z" level=info msg="Flows agent successfully started" component=agent.Flows
      time="2024-04-24T09:36:39Z" level=fatal msg="error in http.ListenAndServe: listen tcp :9102: bind: address already in use" component=prometheus
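      The fatal bind error above can be demonstrated with a minimal sketch (not the agent's actual code): two TCP listeners contend for one address, the second bind fails with "address already in use", and a fallback to a kernel-chosen ephemeral port recovers. The tryListen helper and the ephemeral-port fallback are illustrative assumptions; this is the class of conflict that a dedicated default metrics port avoids:

      ```go
      package main

      import (
      	"fmt"
      	"net"
      )

      // tryListen attempts to bind addr; if the bind fails (e.g. "address
      // already in use"), it falls back to ":0" so the kernel picks a free
      // ephemeral port. Illustrative only, not NetObserv agent behavior.
      func tryListen(addr string) (net.Listener, error) {
      	ln, err := net.Listen("tcp", addr)
      	if err == nil {
      		return ln, nil
      	}
      	fmt.Printf("bind %s failed (%v), falling back to an ephemeral port\n", addr, err)
      	return net.Listen("tcp", "127.0.0.1:0")
      }

      func main() {
      	// First listener takes the port, standing in for kube-rbac-proxy.
      	first, err := net.Listen("tcp", "127.0.0.1:0")
      	if err != nil {
      		panic(err)
      	}
      	defer first.Close()

      	// A second bind on the same address fails, like the agent's
      	// Prometheus endpoint in the logs above, then recovers on a free port.
      	second, err := tryListen(first.Addr().String())
      	if err != nil {
      		panic(err)
      	}
      	defer second.Close()
      	fmt.Println("second listener bound to", second.Addr())
      }
      ```

      Running this with `go run` prints the bind failure for the occupied address, then shows the second listener bound to a different port.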
       

      Process attached to port 9102:

      [root@master-2 core]# netstat -plano | grep :9102
      tcp6       0      0 :::9102                 :::*                    LISTEN      3543/kube-rbac-prox  off (0.00/0/0)
      tcp6       0      0 10.20.186.120:9102      10.20.186.236:57840     ESTABLISHED 3543/kube-rbac-prox  keepalive (11.40/0/0)
      tcp6       0      0 10.20.186.120:9102      10.20.186.124:45198     ESTABLISHED 3543/kube-rbac-prox  keepalive (1.17/0/0)
      
      [root@master-2 core]# ps -ef | grep 3543
      nfsnobo+    3543    3423  0 Apr19 ?        00:02:13 /usr/bin/kube-rbac-proxy --logtostderr --secure-listen-address=:9102 --tls-cipher-suites=TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256,TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256 --upstream=http://127.0.0.1:29102/ --tls-private-key-file=/etc/pki/tls/metrics-cert/tls.key --tls-cert-file=/etc/pki/tls/metrics-cert/tls.crt
      root     3394891 3393742  0 17:39 pts/0    00:00:00 grep --color=auto 3543
      [root@master-2 core]# 

            Assignee: mmahmoud@redhat.com (Mohamed Mahmoud)
            Reporter: rh-ee-ahonkala (Aditya Honkalas)
            Votes: 0
            Watchers: 5

              Created:
              Updated:
              Resolved: