Red Hat Advanced Cluster Security / ROX-31934

Collector pods stuck in CrashLoopBackOff in RHACS 4.8.4

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Minor
    • 4.5.5
    • RHACS
    • Rox Sprint 4.10D

      USER PROBLEM
      The RHACS collector pods are stuck in the CrashLoopBackOff state.

      CONDITIONS

      [1] RHACS collector pods:

       

      $ oc get pods -A | grep collector
      
      rhacs-operator  collector-479q2  2/3     CrashLoopBackOff   4025 (4m41s ago)   14d 
      rhacs-operator  collector-gqv7d  2/3     CrashLoopBackOff   4021 (85s ago)     14d 
      rhacs-operator  collector-kxh6z  2/3     CrashLoopBackOff   4039 (3m28s ago)   14d 
      rhacs-operator  collector-rm7vm  2/3     CrashLoopBackOff   4026 (118s ago)    14d
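
      The READY count of 2/3 indicates that only the collector container is failing. Describing one of the pods would show its last state and exit code; the command below is illustrative only (the pod name is taken from the listing above, and its output was not captured in this report):

      $ oc -n rhacs-operator describe pod collector-479q2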

       

       [2] Namespace Events:

       

      $ oc get events -n rhacs-operator
      LAST SEEN   TYPE      REASON    OBJECT                MESSAGE
      71s         Normal    Pulling   pod/collector-479q2   Pulling image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182"
      2s          Warning   BackOff   pod/collector-479q2   Back-off restarting failed container collector in pod collector-479q2_rhacs-operator(40cd5ac5-cae2-4efd-91d8-737294af4e7b)
      120m        Normal    Pulled    pod/collector-479q2   (combined from similar events): Successfully pulled image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182" in 717ms (717ms including waiting). Image size: 128314781 bytes.
      3m3s        Normal    Pulling   pod/collector-gqv7d   Pulling image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182"
      12s         Warning   BackOff   pod/collector-gqv7d   Back-off restarting failed container collector in pod collector-gqv7d_rhacs-operator(17b83d63-665b-45ef-98c6-716355974bfb)
      65m         Normal    Pulled    pod/collector-gqv7d   (combined from similar events): Successfully pulled image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182" in 551ms (551ms including waiting). Image size: 128314781 bytes.
      5m5s        Normal    Pulling   pod/collector-kxh6z   Pulling image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182"
      4m12s       Warning   BackOff   pod/collector-kxh6z   Back-off restarting failed container collector in pod collector-kxh6z_rhacs-operator(ecca9670-837f-4df3-b460-7a8a9f223c7b)
      134m        Normal    Pulled    pod/collector-kxh6z   (combined from similar events): Successfully pulled image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182" in 596ms (596ms including waiting). Image size: 128314781 bytes.
      3m35s       Normal    Pulling   pod/collector-rm7vm   Pulling image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182"
      98s         Warning   BackOff   pod/collector-rm7vm   Back-off restarting failed container collector in pod collector-rm7vm_rhacs-operator(cb196cca-9681-4179-ae68-80456ce23c81)
      96m         Normal    Pulled    pod/collector-rm7vm   (combined from similar events): Successfully pulled image "registry.redhat.io/advanced-cluster-security/rhacs-collector-rhel8@sha256:f36004ced010b8fbb1e3241a4b09e158c74a70ddc74db4b0aa70ffc96df78182" in 357ms (357ms including waiting). Image size: 128314781 bytes.
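
      When reproducing, the warnings can be isolated from the full event stream with standard field selectors; the command below is illustrative and not part of the original capture:

      $ oc -n rhacs-operator get events --field-selector type=Warning --sort-by=.lastTimestamp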

       

       

       [3] Pod Logs:

       

      $ oc -n rhacs-operator logs collector-xxx
      
      [WARNING 2025/11/07 03:39:08] libbpf: prog 'sys_exit': -- BEGIN PROG LOAD LOG --
      processed 270 insns (limit 1000000) max_states_per_insn 1 total_states 26 peak_states 26 mark_read 6
      -- END PROG LOAD LOG --
      [WARNING 2025/11/07 03:39:08] libbpf: prog 'sys_exit': failed to load: -22
      [WARNING 2025/11/07 03:39:08] libbpf: failed to load object 'bpf_probe'
      [WARNING 2025/11/07 03:39:08] libbpf: failed to load BPF skeleton 'bpf_probe': -22
      [ERROR   2025/11/07 03:39:08] libpman: failed to load BPF object (errno: 22 | message: Invalid argument)
      terminate called after throwing an instance of 'sinsp_exception'
        what():  Initialization issues during scap_init
      collector AbortHandler 0x8ae4d5 + 53
      /lib64/libc.so.6 (null) 0x7f6d172295b0 + 0
      /lib64/libc.so.6 gsignal 0x7f6d1722952f + 271
      /lib64/libc.so.6 abort 0x7f6d171fce65 + 295
      /lib64/libstdc++.so.6 (null) 0x7f6d17bdb09b + 0
      /lib64/libstdc++.so.6 (null) 0x7f6d17be154c + 0
      /lib64/libstdc++.so.6 (null) 0x7f6d17be15a7 + 0
      /lib64/libstdc++.so.6 (null) 0x7f6d17be1808 + 0
      collector collector::KernelDriverCOREEBPF::Setup(collector::CollectorConfig const&, sinsp&) 0x926984 + 1668
      collector collector::system_inspector::Service::InitKernel(collector::CollectorConfig const&) 0x923a08 + 72
      collector collector::SetupKernelDriver(collector::CollectorService&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&, collector::CollectorConfig const&) 0x8c0e3f + 1103
      collector main 0x898095 + 517
      /lib64/libc.so.6 __libc_start_main 0x7f6d172157e5 + 229
      collector _start 0x8aa0ae + 46
      Caught signal 6 (SIGABRT): Aborted
      /bootstrap.sh: line 85:     5 Aborted                 (core dumped) eval exec "$@"
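
      Because the libbpf load failure (errno 22) comes from the kernel's BPF verifier and is therefore kernel-dependent, recording the node kernel versions helps when correlating with the known issue discussed below. A possible way to capture them (standard node status fields; not part of the original report):

      $ oc get nodes -o custom-columns=NAME:.metadata.name,KERNEL:.status.nodeInfo.kernelVersion,OS:.status.nodeInfo.osImage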

       

      Based on the current collector pod logs, we can confirm that the collector is encountering a known issue related to recent kernel changes. This bug was addressed in RHACS Operator versions 4.5.6 and 4.6.1. Since my customer is already on version 4.8, which includes the fix, I would like to verify whether this issue is recurring in RHACS 4.8.
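
      To confirm which operator and collector build are actually deployed, something like the following could be used; the resource names assume the default install in the rhacs-operator namespace and may need adjusting:

      $ oc -n rhacs-operator get csv
      $ oc -n rhacs-operator get daemonset collector -o jsonpath='{.spec.template.spec.containers[*].image}{"\n"}'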

      It is related to CVE-2024-50063 [1].

      As a workaround, the collection method can be changed:

      The method for system-level data collection defaults to CORE_BPF, which Red Hat recommends. The available options are CORE_BPF and NoCollection; with NoCollection, Collector does not report any information about network activity or process executions. To stop the collectors from crashlooping, the collection method can be set to NoCollection in the SecuredCluster CR. See [2].
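
      As an illustration only, the setting could be applied with a merge patch such as the one below; the SecuredCluster resource name is a placeholder, and the field path follows the per-node settings documented in [2]:

      $ oc -n rhacs-operator patch securedcluster <secured-cluster-name> --type=merge \
          -p '{"spec":{"perNode":{"collector":{"collection":"NoCollection"}}}}'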

      [1] - https://access.redhat.com/security/cve/cve-2024-50063 
      [2] - https://docs.redhat.com/en/documentation/red_hat_advanced_cluster_security_for_kubernetes/4.5/html/installing/installing-rhacs-on-red-hat-openshift#per-node-settings_install-secured-cluster-config-options-ocp

      We changed the collection method in the SecuredCluster CR to NoCollection. However, the collector pods are still in CrashLoopBackOff, so the workaround does not work.
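
      When the workaround appears to have no effect, one thing worth verifying is whether the new collection method actually propagated to the collector DaemonSet spec; an illustrative check (exact field names may differ between versions):

      $ oc -n rhacs-operator get daemonset collector -o yaml | grep -i -B2 -A2 collection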

      ROOT CAUSE

       

      FIX

      Assignee: Olivier Valentin (rh-ee-ovalenti)
      Reporter: Harshal Thakare (rhn-support-hthakare)
      Component: ACS Collector