-
Bug
-
Resolution: Unresolved
-
Minor
-
None
-
4.5.5
-
None
-
False
-
-
False
-
-
-
-
-
Rox Sprint 4.10D
USER PROBLEM
The RHACS collector pods are stuck in the CrashLoopBackOff state.
CONDITIONS
[1] RHACS collector pods:
$ oc get pods -A | grep collector
rhacs-operator   collector-479q2   2/3   CrashLoopBackOff   4025 (4m41s ago)   14d
rhacs-operator   collector-gqv7d   2/3   CrashLoopBackOff   4021 (85s ago)     14d
rhacs-operator   collector-kxh6z   2/3   CrashLoopBackOff   4039 (3m28s ago)   14d
rhacs-operator   collector-rm7vm   2/3   CrashLoopBackOff   4026 (118s ago)    14d
[2] Namespace Events:
$ oc get events -n rhacs-operator
LAST SEEN   TYPE      REASON    OBJECT                MESSAGE
71s         Normal    Pulling   pod/collector-479q2   Pulling image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182"
2s          Warning   BackOff   pod/collector-479q2   Back-off restarting failed container collector in pod collector-479q2_rhacs-operator(40cd5ac5-cae2-4efd-91d8-737294af4e7b)
120m        Normal    Pulled    pod/collector-479q2   (combined from similar events): Successfully pulled image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182" in 717ms (717ms including waiting). Image size: 128314781 bytes.
3m3s        Normal    Pulling   pod/collector-gqv7d   Pulling image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182"
12s         Warning   BackOff   pod/collector-gqv7d   Back-off restarting failed container collector in pod collector-gqv7d_rhacs-operator(17b83d63-665b-45ef-98c6-716355974bfb)
65m         Normal    Pulled    pod/collector-gqv7d   (combined from similar events): Successfully pulled image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182" in 551ms (551ms including waiting). Image size: 128314781 bytes.
5m5s        Normal    Pulling   pod/collector-kxh6z   Pulling image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182"
4m12s       Warning   BackOff   pod/collector-kxh6z   Back-off restarting failed container collector in pod collector-kxh6z_rhacs-operator(ecca9670-837f-4df3-b460-7a8a9f223c7b)
134m        Normal    Pulled    pod/collector-kxh6z   (combined from similar events): Successfully pulled image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182" in 596ms (596ms including waiting). Image size: 128314781 bytes.
3m35s       Normal    Pulling   pod/collector-rm7vm   Pulling image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182"
98s         Warning   BackOff   pod/collector-rm7vm   Back-off restarting failed container collector in pod collector-rm7vm_rhacs-operator(cb196cca-9681-4179-ae68-80456ce23c81)
96m         Normal    Pulled    pod/collector-rm7vm   (combined from similar events): Successfully pulled image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182" in 357ms (357ms including waiting). Image size: 128314781 bytes.
[3] Pod Logs:
$ oc -n rhacs-operator logs collector-xxx
[WARNING 2025/11/07 03:39:08] libbpf: prog 'sys_exit': -- BEGIN PROG LOAD LOG --
processed 270 insns (limit 1000000) max_states_per_insn 1 total_states 26 peak_states 26 mark_read 6
-- END PROG LOAD LOG --
[WARNING 2025/11/07 03:39:08] libbpf: prog 'sys_exit': failed to load: -22
[WARNING 2025/11/07 03:39:08] libbpf: failed to load object 'bpf_probe'
[WARNING 2025/11/07 03:39:08] libbpf: failed to load BPF skeleton 'bpf_probe': -22
[ERROR 2025/11/07 03:39:08] libpman: failed to load BPF object (errno: 22 | message: Invalid argument)
terminate called after throwing an instance of 'sinsp_exception'
  what():  Initialization issues during scap_init
collector AbortHandler 0x8ae4d5 + 53
/lib64/libc.so.6 (null) 0x7f6d172295b0 + 0
/lib64/libc.so.6 gsignal 0x7f6d1722952f + 271
/lib64/libc.so.6 abort 0x7f6d171fce65 + 295
/lib64/libstdc++.so.6 (null) 0x7f6d17bdb09b + 0
/lib64/libstdc++.so.6 (null) 0x7f6d17be154c + 0
/lib64/libstdc++.so.6 (null) 0x7f6d17be15a7 + 0
/lib64/libstdc++.so.6 (null) 0x7f6d17be1808 + 0
collector collector::KernelDriverCOREEBPF::Setup(collector::CollectorConfig const&, sinsp&) 0x926984 + 1668
collector collector::system_inspector::Service::InitKernel(collector::CollectorConfig const&) 0x923a08 + 72
collector collector::SetupKernelDriver(collector::CollectorService&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, collector::CollectorConfig const&) 0x8c0e3f + 1103
collector main 0x898095 + 517
/lib64/libc.so.6 __libc_start_main 0x7f6d172157e5 + 229
collector _start 0x8aa0ae + 46
Caught signal 6 (SIGABRT): Aborted
/bootstrap.sh: line 85:     5 Aborted (core dumped) eval exec "$@"
Based on the current collector pod logs, we can confirm that the collector is hitting a known issue related to recent kernel changes. This bug was addressed in RHACS Operator versions 4.5.6 and 4.6.1. Since my customer is already on version 4.8, which includes the fix, I would like to verify whether this issue is recurring in RHACS 4.8.
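To double-check which version is actually running on the secured cluster, the operator CSV and the deployed collector image can be inspected. A minimal sketch, assuming the collector runs as a DaemonSet named collector in the rhacs-operator namespace seen in the outputs above:
$ oc -n rhacs-operator get csv
$ oc -n rhacs-operator get daemonset collector \
    -o jsonpath='{.spec.template.spec.containers[*].image}'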
It is related to CVE-2024-50063 [1].
As a workaround, the collection method can be changed:
The method for system-level data collection is configured per node in the SecuredCluster CR. The default value is CORE_BPF, and Red Hat recommends using CORE_BPF for data collection. The available options are CORE_BPF and NoCollection; if NoCollection is selected, Collector does not report any information about network activity or process executions. To stop the collector pods from crash looping, set the collection method to NoCollection (see [2]); a minimal patch command is sketched after the references below.
[1] - https://access.redhat.com/security/cve/cve-2024-50063
[2] - https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_security_for_kubernetes/4.5/html/installing/installing-rhacs-on-red-hat-openshift#per-node-settings_install-secured-cluster-config-options-ocp
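A minimal sketch of applying the workaround through the SecuredCluster CR, assuming the CR is named stackrox-secured-cluster-services and lives in the rhacs-operator namespace (both the name and the namespace depend on the installation), and assuming the per-node collector field layout described in [2]:
$ oc -n rhacs-operator patch securedcluster stackrox-secured-cluster-services \
    --type=merge -p '{"spec":{"perNode":{"collector":{"collection":"NoCollection"}}}}'
After the patch, the operator is expected to reconcile the collector DaemonSet and roll the pods out with collection disabled.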
We changed the collection method in the SecuredCluster CR to NoCollection. However, the collector pods are still in CrashLoopBackOff; the workaround does not work.
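For completeness, a sketch of how the workaround can be verified end to end, assuming the CR name used above and that the collector DaemonSet and container are both named collector (as the events suggest); the collection value in the CR and the environment of the collector container should both reflect the change:
$ oc -n rhacs-operator get securedcluster stackrox-secured-cluster-services \
    -o jsonpath='{.spec.perNode.collector.collection}'
$ oc -n rhacs-operator get daemonset collector \
    -o jsonpath='{.spec.template.spec.containers[?(@.name=="collector")].env}'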
ROOT CAUSE
FIX