RHEL / RHEL-153095

[RFE] On-demand backtrace snapshot via dsctl


    • Type: Story
    • Resolution: Unresolved
    • Priority: Undefined
    • Component: 389-ds-base
    • rhel-idm-ds

      Goal

      As a support engineer who needs to see where threads are stuck in the code, I need a non-intrusive alternative to pstack that captures backtraces without freezing the process, so I can diagnose stuck threads on a production server under load.

      Per-worker activity from RHEL-153094 tells you what each thread is doing (e.g., SEARCH on conn 1234 for 8 seconds) but not where in the code it is stuck. Today the only way to find out is pstack or gdb, both of which attach via ptrace and freeze the entire process.

      dsctl <instance> thread-pool-backtrace uses SIGUSR1 to make each worker thread capture its own backtrace into its reserved mmap slot. There is no process-wide freeze: each thread is interrupted only briefly by the signal, captures its own stack, and resumes immediately.
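      A minimal C sketch of the capture step, assuming each worker thread holds a thread-local pointer to its own slot. The slot layout (bt_slot_t, BT_MAX_FRAMES), bt_request_gen, and the helper names are hypothetical, and the anonymous mapping only stands in for whatever region RHEL-153094 actually reserves; only bt_captured is named in this issue. The handler writes the frames first and publishes them last with a release store, so a reader that loads bt_captured with acquire semantics never sees a half-written frame array. backtrace() is called once at startup because the glibc manual page notes that its first call may load libgcc dynamically, which is not safe inside a signal handler.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <execinfo.h>    /* backtrace(), glibc */
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/mman.h>

#define BT_MAX_FRAMES 64 /* hypothetical per-slot frame limit */

/* Hypothetical layout of one worker's slot in the shared region. */
typedef struct bt_slot {
    atomic_uint bt_captured;      /* generation of the last published capture */
    int nframes;                  /* number of valid entries in frames[] */
    void *frames[BT_MAX_FRAMES];  /* raw return addresses, symbolized later */
} bt_slot_t;

static _Thread_local bt_slot_t *bt_my_slot; /* NULL in non-worker threads */
static atomic_uint bt_request_gen;           /* hypothetical: bumped once per request */

/* SIGUSR1 handler: capture the interrupted thread's own stack, then publish it. */
static void bt_sigusr1(int signo)
{
    (void)signo;
    bt_slot_t *slot = bt_my_slot;
    if (slot == NULL)
        return;
    unsigned gen = atomic_load_explicit(&bt_request_gen, memory_order_acquire);
    slot->nframes = backtrace(slot->frames, BT_MAX_FRAMES);
    /* Release: frames[] and nframes become visible before the new generation. */
    atomic_store_explicit(&slot->bt_captured, gen, memory_order_release);
}

/* Map n zeroed slots and install the handler; returns NULL on failure. */
static bt_slot_t *bt_init(size_t n)
{
    void *probe[1];
    backtrace(probe, 1);          /* load libgcc now, outside signal context */

    size_t len = n * sizeof(bt_slot_t);
    bt_slot_t *slots = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (slots == MAP_FAILED)
        return NULL;

    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = bt_sigusr1;
    sa.sa_flags = SA_RESTART;     /* restart interrupted syscalls where possible */
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0) {
        munmap(slots, len);
        return NULL;
    }
    return slots;
}

/* Each worker calls this once at startup, before it can be signalled. */
static void bt_register_worker(bt_slot_t *slot)
{
    assert(slot != NULL);
    bt_my_slot = slot;
}
```

      The frames are raw return addresses; turning them into function names and file locations, as the first acceptance criterion requires, has to happen outside signal context and is not shown here.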

      The current SIGUSR1 handler needs to be repurposed as a backtrace handler. It must distinguish the initial process-level signal (which starts coordination) from the per-worker signals (each of which makes one worker capture its backtrace), for example by checking whether the receiving thread is a worker thread.
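      A sketch of the dispatch logic, using a hypothetical fixed-size slot array in place of the mmap region. Besides the worker check suggested in the issue, it inspects si_code, a Linux-specific detail swapped in here: kill() from dsctl arrives as SI_USER, while pthread_kill() (implemented with tgkill) arrives as SI_TKILL. The worker check alone is ambiguous because the kernel may deliver a process-directed signal to any thread that does not block it, including a worker. pthread_kill() is on the POSIX async-signal-safe list, so the fan-out can run inside the handler. The idle nanosleep() loop merely stands in for connection_wait_for_new_work, and every bt_ name is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <execinfo.h>
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>
#include <time.h>
#include <unistd.h>
#define BT_MAX_FRAMES 32
#define BT_MAX_WORKERS 8                       /* hypothetical fixed pool size */

typedef struct {
    atomic_uint bt_captured;                   /* generation of the last published capture */
    int nframes;
    void *frames[BT_MAX_FRAMES];
} bt_slot_t;

static bt_slot_t bt_slots[BT_MAX_WORKERS];     /* stands in for the shared mmap region */
static pthread_t bt_worker_tid[BT_MAX_WORKERS];
static atomic_int bt_nworkers, bt_ready, bt_stop;
static atomic_uint bt_request_gen;
static _Thread_local int bt_worker_index = -1; /* -1: not a worker thread */

static void bt_handler(int signo, siginfo_t *info, void *uctx)
{
    (void)uctx;
    if (info->si_code != SI_TKILL) {
        /* Process-level signal (kill() from dsctl): new round, fan out to workers. */
        atomic_fetch_add_explicit(&bt_request_gen, 1, memory_order_acq_rel);
        int n = atomic_load_explicit(&bt_nworkers, memory_order_acquire);
        for (int i = 0; i < n; i++)
            pthread_kill(bt_worker_tid[i], signo);
        return;
    }
    if (bt_worker_index < 0) return;           /* thread-directed, but not a worker */
    bt_slot_t *slot = &bt_slots[bt_worker_index];
    unsigned gen = atomic_load_explicit(&bt_request_gen, memory_order_acquire);
    slot->nframes = backtrace(slot->frames, BT_MAX_FRAMES);
    atomic_store_explicit(&slot->bt_captured, gen, memory_order_release);
}

static void *bt_worker_main(void *arg)
{
    bt_worker_index = (int)(intptr_t)arg;      /* mark this thread as a worker */
    atomic_fetch_add(&bt_ready, 1);
    struct timespec idle = {0, 1000000};       /* stand-in for waiting on new work */
    while (!atomic_load(&bt_stop))
        nanosleep(&idle, NULL);                /* EINTR from SIGUSR1 is harmless */
    return NULL;
}

/* Install the handler and start n idle workers; returns 0 on success. */
static int bt_start(int n)
{
    assert(n > 0 && n <= BT_MAX_WORKERS);
    void *probe[1];
    backtrace(probe, 1);                       /* load libgcc outside signal context */
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = bt_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;
    for (int i = 0; i < n; i++) {
        if (pthread_create(&bt_worker_tid[i], NULL, bt_worker_main, (void *)(intptr_t)i) != 0) {
            atomic_store(&bt_stop, 1);         /* unwind the workers already started */
            for (int j = 0; j < i; j++)
                pthread_join(bt_worker_tid[j], NULL);
            return -1;
        }
    }
    while (atomic_load(&bt_ready) < n)         /* every worker has set bt_worker_index */
        sched_yield();
    atomic_store_explicit(&bt_nworkers, n, memory_order_release);
    return 0;
}

static int bt_trigger_self(void) { return kill(getpid(), SIGUSR1); } /* as dsctl would */

/* Number of workers that published generation gen within roughly timeout_ms. */
static int bt_wait_all(unsigned gen, int timeout_ms)
{
    int n = atomic_load_explicit(&bt_nworkers, memory_order_acquire), done = 0;
    struct timespec step = {0, 1000000};
    for (int waited = 0; waited <= timeout_ms; waited++) {
        done = 0;
        for (int i = 0; i < n; i++)
            done += atomic_load_explicit(&bt_slots[i].bt_captured, memory_order_acquire) == gen;
        if (done == n)
            break;
        nanosleep(&step, NULL);
    }
    return done;
}
```

      If the process-level signal happens to land on a worker, that worker also signals itself; SIGUSR1 stays blocked while its own handler runs, so the self-directed signal is delivered, and the stack captured, as soon as the fan-out returns.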

      Acceptance criteria

      • Verify dsctl <instance> thread-pool-backtrace produces per-thread backtraces with resolved function names and file locations
      • Verify stuck threads (e.g., blocked in a plugin or on a mutex) are captured correctly – the backtrace shows the blocking call
      • Verify idle threads show the expected connection_wait_for_new_work call chain
      • Verify the process does not freeze during backtrace capture – concurrent operations continue without interruption
      • Verify rate limiting: a second call within 5 seconds returns the previous backtrace data instead of re-triggering capture
      • Verify PID validation: dsctl refuses to send the signal if /proc/<pid>/comm does not match ns-slapd
      • Verify workers that don't respond within the timeout are reported as "unresponsive" rather than causing dsctl to hang
      • Verify correct memory ordering: dsctl never reads incomplete frame data (acquire/release semantics on bt_captured)
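      Several criteria above imply a small amount of reader-side logic. dsctl itself is Python, so this C sketch only illustrates the checks: the PID comparison against /proc/<pid>/comm, the 5-second rate limit, and a bounded acquire-load wait that reports a worker as unresponsive instead of hanging. The helper names, the 1 ms polling step, and the idea that last_trigger is persisted somewhere visible to separate dsctl invocations are assumptions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

#define BT_RATE_LIMIT_SEC 5              /* from the acceptance criteria */

typedef enum { BT_CAPTURED, BT_UNRESPONSIVE } bt_status_t;

/* Hypothetical helper: read /proc/<pid>/comm into buf, stripping the newline. */
static int bt_read_comm(pid_t pid, char *buf, size_t len)
{
    char path[64];
    snprintf(path, sizeof path, "/proc/%d/comm", (int)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, (int)len, f);
    fclose(f);
    if (line == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* PID validation: only an ns-slapd process may be signalled. */
static bool bt_pid_is_ns_slapd(pid_t pid)
{
    char comm[32];
    return bt_read_comm(pid, comm, sizeof comm) == 0 && strcmp(comm, "ns-slapd") == 0;
}

/* Rate limiting: re-trigger only if the previous capture is at least 5 s old.
   last_trigger == 0 means "never triggered". */
static bool bt_should_trigger(time_t now, time_t last_trigger)
{
    return last_trigger == 0 || now - last_trigger >= BT_RATE_LIMIT_SEC;
}

/* Returns 1 if a capture was triggered, 0 if cached data should be returned,
   -1 if the PID failed validation or kill() failed. */
static int bt_trigger(pid_t pid, time_t now, time_t *last_trigger)
{
    if (!bt_pid_is_ns_slapd(pid))
        return -1;                       /* refuse to signal the wrong process */
    if (!bt_should_trigger(now, *last_trigger))
        return 0;                        /* within 5 s: serve previous backtraces */
    if (kill(pid, SIGUSR1) != 0)
        return -1;
    *last_trigger = now;
    return 1;
}

/* Wait up to roughly timeout_ms for one slot to publish generation gen.
   The acquire load pairs with the worker's release store, so once gen is
   seen, the frames written before that store are visible too. */
static bt_status_t bt_wait_slot(atomic_uint *bt_captured, unsigned gen, int timeout_ms)
{
    assert(timeout_ms >= 0);
    struct timespec step = {0, 1000000}; /* 1 ms polling step */
    for (int waited = 0; waited <= timeout_ms; waited++) {
        if (atomic_load_explicit(bt_captured, memory_order_acquire) == gen)
            return BT_CAPTURED;
        nanosleep(&step, NULL);
    }
    return BT_UNRESPONSIVE;
}
```

      Checking comm and then calling kill() leaves a small window in which the PID could be reused; on Linux 5.3 and later, pidfd_open() plus pidfd_send_signal() would close it.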

              IdM DS Dev (idm-ds-dev-bugs)
              Simon Pichugin (spichugi@redhat.com)
              IdM DS Dev
              IdM DS QE
              Evgenia Martyniuk