  1. Network Observability
  2. NETOBSERV-1808

When using a realtime kernel, the agent pods get stuck in CrashLoopBackOff state

      With the RHEL 9.2 realtime (RT) kernel, some hooks fail the eBPF verifier check because RHEL 9.2 is missing the fix tracked in
      https://bugzilla.redhat.com/show_bug.cgi?id=2166911
      RHEL 9.4 has the fix.
      This fix prevents the eBPF agent from crashing: the agent ignores the affected features and logs a message instead.
    • NetObserv - Sprint 258, NetObserv - Sprint 259

      As per the discussion in the Slack thread: https://redhat-internal.slack.com/archives/C01DMAD88JF/p1724051085870459

      Discussion:

      • The packet drop hook needs special handling for the RT kernel.
      • NetObserv intentionally uses BPF_F_NO_PREALLOC for its hash maps to be conservative with memory usage. The error from the RT nodes indicates that the RT kernel rejects this flag and requires the map memory to be fully preallocated.
      • The team fixed this issue in RHEL 9.4 via https://bugzilla.redhat.com/show_bug.cgi?id=2166911, but the fix was not backported to RHEL 9.2. The fix is available in OCP 4.16.
      • We asked the RHEL team for a backport in https://issues.redhat.com/browse/RHEL-55713, but that request was closed as "won't do" because the team currently does not have the capacity to do the backport.
      • On the thread, the team suggested opening a Jira here for further work and changes.

       

      2024-08-15T04:41:17.799344034Z time="2024-08-15T04:41:17Z" level=fatal msg="can't instantiate NetObserv eBPF Agent" error="loading and assigning BPF objects: field KfreeSkb: program kfree_skb: load program: invalid argument: trace type programs can only use preallocated hash map (1 line(s) omitted)" 

      The failing nodes are running the realtime kernel version 5.14.0-284.73.1.rt14.358.el9_2.x86_64, while the working nodes (nxts1-osma-001/002/003) are running a different kernel, 5.14.0-284.73.1.el9_2.x86_64.
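
The two kernel release strings above differ only in the `.rt14.358` component. As a rough sketch, a check like the following could tell the two apart. The `.rt<digits>.` pattern is an assumption derived only from the two strings in this issue, other distributions may name realtime kernels differently, and `isRealtimeKernel` is a hypothetical helper, not part of the agent.

```go
package main

import (
	"fmt"
	"regexp"
)

// rtRelease matches a ".rt<digits>." component, as in
// 5.14.0-284.73.1.rt14.358.el9_2.x86_64. This pattern is an assumption
// based on the release strings quoted in this issue.
var rtRelease = regexp.MustCompile(`\.rt\d+\.`)

// isRealtimeKernel reports whether a kernel release string (for example
// the output of `uname -r`) looks like a realtime build.
func isRealtimeKernel(release string) bool {
	return rtRelease.MatchString(release)
}

func main() {
	for _, release := range []string{
		"5.14.0-284.73.1.rt14.358.el9_2.x86_64", // failing nodes
		"5.14.0-284.73.1.el9_2.x86_64",          // working nodes
	} {
		fmt.Printf("%s realtime=%v\n", release, isRealtimeKernel(release))
	}
}
```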

            mmahmoud@redhat.com Mohamed Mahmoud
            rhn-support-chsharma Chandra Shekhar Sharma
            Amogh Rameshappa Devapura
            Sara Thomas