Story
Resolution: Unresolved
rhel-idm-ds
Goal
As an administrator whose server is completely unresponsive to LDAP queries due to thread pool saturation, I need a way to see real-time thread pool state without an LDAP connection, so I can diagnose the situation even when cn=monitor is unreachable.
cn=monitor metrics require a worker thread to serve the LDAP query. Under full thread pool saturation, a monitoring tool's query waits in the work queue behind every other pending operation and is not answered until a worker frees up.
Create a MAP_SHARED mmap file slapd-<instance>.threadpool alongside the existing SNMP slapd.stats. Define a thread_pool_mmap_t struct with a version field, max_workers, heartbeat timestamp (CLOCK_MONOTONIC, updated by slapi_eq_repeat_rel callback), server PID, pool-level gauges (current/max work queue, current/max busy workers, ops initiated/completed, connection count), and per-worker slots (state, conn_id, op_id, start_ns).
Additionally, reserve space in each per-worker slot for backtrace fields (bt_captured, bt_frame_count, bt_timestamp_ns, bt_frames[64]) – they stay zeroed until a later ticket populates them; this avoids a future format migration.
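As a rough illustration of the layout described above, the struct might look like the following sketch. Only thread_pool_mmap_t and the field names listed in the ticket come from the description; the slot type name, the field widths, the field order, the version value, and the tp_mmap_size helper are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TP_MMAP_VERSION 1u     /* hypothetical initial format version */
#define TP_BT_MAX_FRAMES 64    /* from the ticket: bt_frames[64] */

/* One slot per worker thread; only the owning worker writes to it.
 * Field widths and ordering are assumptions chosen to avoid padding. */
typedef struct tp_worker_slot {
    uint32_t state;            /* idle/busy encoding is not defined by the ticket */
    uint32_t bt_captured;      /* reserved: stays zero until a later ticket */
    uint64_t conn_id;
    int32_t  op_id;
    uint32_t bt_frame_count;   /* reserved */
    uint64_t start_ns;         /* CLOCK_MONOTONIC ns when the operation started */
    uint64_t bt_timestamp_ns;  /* reserved */
    uint64_t bt_frames[TP_BT_MAX_FRAMES]; /* reserved */
} tp_worker_slot_t;

typedef struct thread_pool_mmap {
    uint32_t version;
    uint32_t max_workers;
    uint64_t heartbeat_ns;         /* CLOCK_MONOTONIC, refreshed periodically */
    int64_t  server_pid;
    uint64_t work_q_current;
    uint64_t work_q_max;
    uint64_t busy_workers_current;
    uint64_t busy_workers_max;
    uint64_t ops_initiated;
    uint64_t ops_completed;
    uint64_t connection_count;
    tp_worker_slot_t workers[];    /* max_workers slots follow the header */
} thread_pool_mmap_t;

/* Total file size needed for a pool with max_workers slots. */
static inline size_t tp_mmap_size(uint32_t max_workers)
{
    return sizeof(thread_pool_mmap_t) +
           (size_t)max_workers * sizeof(tp_worker_slot_t);
}

/* Keep every 64-bit field naturally aligned inside the mapping. */
static_assert(sizeof(tp_worker_slot_t) % 8 == 0, "slot size must be 8-byte multiple");
static_assert(offsetof(thread_pool_mmap_t, workers) % 8 == 0, "slots must be 8-byte aligned");
```

Carrying max_workers in the header lets a reader compute the expected file size from the header alone and reject a file whose size or version does not match.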
Each worker writes only to its own slot using atomic stores – no cross-thread contention, no semaphores. Atomic reads are sufficient for diagnostics.
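The single-writer slot update could be sketched with C11 atomics roughly as follows. The reduced slot type, the state constants, and the function names are hypothetical; the sketch also assumes the atomics are lock-free on the target platform, which is what makes them usable from a second process through a shared mapping.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical state encoding; the ticket does not define one. */
enum { TP_WORKER_IDLE = 0, TP_WORKER_BUSY = 1 };

/* Reduced per-worker slot, limited to the fields updated per operation. */
typedef struct tp_slot {
    _Atomic uint32_t state;
    _Atomic uint64_t conn_id;
    _Atomic int32_t  op_id;
    _Atomic uint64_t start_ns;
} tp_slot_t;

/* Plain snapshot handed to the formatting code. */
typedef struct tp_slot_view {
    uint32_t state;
    uint64_t conn_id;
    int32_t  op_id;
    uint64_t start_ns;
} tp_slot_view_t;

/* Called by the owning worker when it picks up an operation.
 * Details are stored first; the state flips to busy last. */
static inline void tp_slot_begin(tp_slot_t *s, uint64_t conn_id,
                                 int32_t op_id, uint64_t now_ns)
{
    atomic_store_explicit(&s->conn_id, conn_id, memory_order_relaxed);
    atomic_store_explicit(&s->op_id, op_id, memory_order_relaxed);
    atomic_store_explicit(&s->start_ns, now_ns, memory_order_relaxed);
    atomic_store_explicit(&s->state, TP_WORKER_BUSY, memory_order_release);
}

/* Called by the owning worker when the operation finishes. */
static inline void tp_slot_end(tp_slot_t *s)
{
    atomic_store_explicit(&s->state, TP_WORKER_IDLE, memory_order_release);
}

/* Reader side: each field is read atomically, but the snapshot as a
 * whole is not; it may mix fields from two consecutive operations. */
static inline tp_slot_view_t tp_slot_read(tp_slot_t *s)
{
    tp_slot_view_t v;
    v.state = atomic_load_explicit(&s->state, memory_order_acquire);
    v.conn_id = atomic_load_explicit(&s->conn_id, memory_order_relaxed);
    v.op_id = atomic_load_explicit(&s->op_id, memory_order_relaxed);
    v.start_ns = atomic_load_explicit(&s->start_ns, memory_order_relaxed);
    return v;
}
```

Because each slot has exactly one writer, no compare-and-swap or locking is needed. A reader that needs a fully consistent snapshot would need something like a per-slot sequence counter, which this sketch deliberately leaves out.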
Open the file with O_NOFOLLOW | O_RDWR | O_CREAT, permissions 0640. No bind DNs or client IPs – just numeric IDs. Unlink on clean shutdown.
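A minimal sketch of the create and teardown path, using the flags and mode from the paragraph above. The function names are hypothetical, and the sketch assumes the caller has already computed the mapping size. Note that the effective mode is still subject to the process umask.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create (or reopen) the stats file and map it shared and writable.
 * Returns the mapping, or NULL with errno set. On Linux, O_NOFOLLOW
 * makes open() fail with ELOOP when the final path component is a
 * symlink. */
static inline void *tp_mmap_create(const char *path, size_t size)
{
    int fd = open(path, O_NOFOLLOW | O_RDWR | O_CREAT, 0640);
    if (fd < 0) {
        return NULL;
    }
    /* Size the file; newly extended bytes read back as zero, so the
     * reserved fields start out zeroed. */
    if (ftruncate(fd, (off_t)size) != 0) {
        int saved = errno;
        close(fd);
        errno = saved;
        return NULL;
    }
    void *map = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    int saved = errno;
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (map == MAP_FAILED) {
        errno = saved;
        return NULL;
    }
    return map;
}

/* Clean shutdown: drop the mapping and remove the file. */
static inline int tp_mmap_destroy(void *map, size_t size, const char *path)
{
    int rc = munmap(map, size);
    if (unlink(path) != 0) {
        rc = -1;
    }
    return rc;
}
```

After a crash the file is left behind, which is why the reader side needs its own staleness checks rather than relying on the file's absence.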
dsctl thread-pool-status opens the file read-only, reads the struct, and formats output: busy/max workers, queue depth, per-worker activity (state, conn_id, op_id, running duration). It warns if the heartbeat is older than 30 seconds. It checks that /proc/<pid>/comm is ns-slapd to guard against PID recycling after a crash.
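The two staleness checks can be sketched as below. This is written in C only to illustrate the logic; the actual dsctl implementation may well be written differently. The helper names are hypothetical; the 30-second threshold and the ns-slapd name come from the description above. Comparing CLOCK_MONOTONIC values across processes assumes both run on the same host since the same boot.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>

#define TP_STALE_NS (30ull * 1000000000ull) /* 30 seconds, per the ticket */

/* Current CLOCK_MONOTONIC time in nanoseconds. */
static inline uint64_t tp_now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* True when the heartbeat is more than 30 seconds older than now_ns. */
static inline bool tp_heartbeat_stale(uint64_t heartbeat_ns, uint64_t now_ns)
{
    return now_ns > heartbeat_ns && (now_ns - heartbeat_ns) > TP_STALE_NS;
}

/* True when /proc/<pid>/comm exists and equals the expected name.
 * Guards against the recorded PID having been recycled after a crash. */
static inline bool tp_pid_comm_is(pid_t pid, const char *expected)
{
    char path[64];
    char comm[64];
    snprintf(path, sizeof(path), "/proc/%ld/comm", (long)pid);
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return false; /* no such process, or /proc unavailable */
    }
    bool ok = fgets(comm, sizeof(comm), f) != NULL;
    fclose(f);
    if (!ok) {
        return false;
    }
    comm[strcspn(comm, "\n")] = '\0'; /* strip the trailing newline */
    return strcmp(comm, expected) == 0;
}
```

A caller would typically warn rather than abort when either check fails, for example when tp_heartbeat_stale(hdr_heartbeat, tp_now_ns()) is true or tp_pid_comm_is(hdr_pid, "ns-slapd") is false, since the last recorded state can still be useful for post-mortem analysis.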
The same per-worker data should also be exposed on cn=monitor as a multi-valued attribute (like the existing connection attribute) – one source of truth, two access paths. Add the new attribute to the ACI's targetattr != exclusion list alongside connection.
Acceptance criteria
- Verify dsctl <instance> thread-pool-status displays pool-level metrics (busy workers, queue depth, ops, connections) and per-worker activity without using any LDAP connection
- Verify the command works while the server is under full thread pool saturation (all workers busy, cn=monitor queries timing out)
- Verify the mmap file is created at startup with correct permissions (0640) and unlinked on clean shutdown
- Verify stale file detection works: dsctl warns when heartbeat is older than 30 seconds or PID does not match a running ns-slapd process
- Verify the mmap file is opened with O_NOFOLLOW and cannot be redirected via symlink
- Verify per-worker activity appears on cn=monitor as a multi-valued attribute
- Verify the new cn=monitor attribute is excluded from anonymous access via ACI
- Verify no measurable performance regression on the write path
Depends on: RHEL-153089 [RFE] Add work queue metrics to cn=monitor (In Progress)