
Type: Story
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: 389-ds-base
Labels:
None

Severity:
None
Epic Link:
IDM-2902
AssignedTeam:
rhel-idm-ds

Story Points:
None
Blocked:
False
Ready:
False
Blocked Reason:
None
Product Documentation Required:
None
Sprint:
None

Preliminary Testing:
None
Test Coverage:
None

ProdDocsReview-CCS:
Unspecified
ProdDocsReview-Dev:
Unspecified
ProdDocsReview-QE:
Unspecified

Experience:

PX Impact Score:
SFDC Cases Counter:
SFDC Cases Open:
SFDC Cases Links:

Planning:
None

Goal

As a support engineer reviewing a sosreport from an environment without external monitoring (no PCP, no Grafana), I need periodic thread pool health snapshots in the error log so that there is always a historical baseline of worker utilization. I also need the server to escalate the log severity automatically when it detects sustained saturation.

Add two configuration attributes: nsslapd-thread-pool-log-interval (in seconds; 0 disables the summary) and nsslapd-thread-stall-threshold (the number of consecutive saturated checks before severity escalates; 0 disables escalation). Schedule a repeating callback with slapi_eq_repeat_rel that logs a structured NOTICE containing the number of busy workers, the queue depth, the connection count, and the number of operations initiated and completed.
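A minimal sketch of what the callback might read and log. The counter struct (pool_stats_t), the snapshot helpers, and the key=value message format are all hypothetical; the server's real counters live elsewhere under other names, and the real callback would presumably log through slapi_log_err at NOTICE level rather than printf. The (time_t, void *) callback shape is also an assumption about the event-queue interface.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical shared counters, updated elsewhere by worker/listener threads. */
typedef struct {
    _Atomic uint32_t busy_workers;  /* workers currently executing an operation */
    _Atomic uint32_t total_workers; /* configured worker count */
    _Atomic uint32_t queue_depth;   /* operations waiting for a free worker */
    _Atomic uint32_t connections;   /* open client connections */
    _Atomic uint64_t ops_initiated; /* cumulative operations started */
    _Atomic uint64_t ops_completed; /* cumulative operations finished */
} pool_stats_t;

/* Plain copy of the counters, taken once per interval. */
typedef struct {
    uint32_t busy, total, queued, conns;
    uint64_t initiated, completed;
} pool_snapshot_t;

/* Atomic loads only: no locks and no per-thread iteration, so this stays
 * cheap enough for an event-queue callback. Relaxed ordering means the
 * fields are not one consistent cut, which is fine for a coarse summary. */
static pool_snapshot_t pool_snapshot_take(pool_stats_t *s)
{
    pool_snapshot_t snap;
    snap.busy = atomic_load_explicit(&s->busy_workers, memory_order_relaxed);
    snap.total = atomic_load_explicit(&s->total_workers, memory_order_relaxed);
    snap.queued = atomic_load_explicit(&s->queue_depth, memory_order_relaxed);
    snap.conns = atomic_load_explicit(&s->connections, memory_order_relaxed);
    snap.initiated = atomic_load_explicit(&s->ops_initiated, memory_order_relaxed);
    snap.completed = atomic_load_explicit(&s->ops_completed, memory_order_relaxed);
    return snap;
}

/* Render the snapshot as a single key=value line (hypothetical format) that
 * is easy to grep out of a sosreport. Returns the snprintf result. */
static int pool_snapshot_format(const pool_snapshot_t *snap, char *buf, size_t len)
{
    assert(snap != NULL && buf != NULL && len > 0);
    return snprintf(buf, len,
                    "thread_pool_health busy_workers=%" PRIu32 "/%" PRIu32
                    " queue_depth=%" PRIu32 " connections=%" PRIu32
                    " ops_initiated=%" PRIu64 " ops_completed=%" PRIu64,
                    snap->busy, snap->total, snap->queued, snap->conns,
                    snap->initiated, snap->completed);
}

/* Hypothetical callback body. In the server it would be registered with
 * slapi_eq_repeat_rel() and log at NOTICE; here it prints to stdout so the
 * sketch runs on its own. */
static void thread_pool_health_cb(time_t when, void *arg)
{
    char line[256];
    pool_snapshot_t snap = pool_snapshot_take((pool_stats_t *)arg);

    (void)when;
    pool_snapshot_format(&snap, line, sizeof line);
    printf("NOTICE - %s\n", line);
}
```

Keeping the snapshot and the formatting separate means the same snapshot can also feed the saturation check without a second round of loads.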

When the check detects sustained saturation (all workers busy and the queue growing for N consecutive intervals, where N is nsslapd-thread-stall-threshold), escalate the severity following the disk monitor pattern in daemon.c (NOTICE -> WARNING -> ALERT). When saturation resolves, log an INFO entry that includes the duration of the episode. The callback performs only atomic loads and completes in microseconds, so it is safe to run on the event queue. If heavier diagnostics are needed later (for example, iterating the per-thread activity array), offload that work to a short-lived thread so it does not block other event queue callbacks such as replication retry, task cleanup, and DB compaction.
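The escalation logic can be sketched as a small per-interval state machine. The names (stall_state_t, stall_check, LVL_*) are hypothetical, and the exact steps are assumptions rather than the daemon.c policy: WARNING once N consecutive saturated checks are reached, ALERT at 2N, "growing queue" read literally as a strictly larger depth than the previous check, and the resolution INFO emitted only for episodes that reached WARNING.

```c
#include <assert.h>
#include <stdint.h>
#include <time.h>

typedef enum { LVL_INFO, LVL_NOTICE, LVL_WARNING, LVL_ALERT } health_level_t;

typedef struct {
    uint32_t stall_threshold; /* nsslapd-thread-stall-threshold; 0 = no escalation */
    uint32_t consecutive;     /* consecutive saturated checks in the current run */
    uint32_t prev_queue;      /* queue depth seen at the previous check */
    time_t episode_start;     /* time of the first saturated check in the run */
} stall_state_t;

/* Evaluate one interval and return the level for this interval's summary.
 * If a sustained episode (one that reached WARNING) just ended,
 * *resolved_secs receives its length in seconds and the caller should also
 * log a separate entry at LVL_INFO; otherwise *resolved_secs is -1. */
static health_level_t stall_check(stall_state_t *st, time_t now, uint32_t busy,
                                  uint32_t total, uint32_t queued,
                                  long *resolved_secs)
{
    int saturated = total > 0 && busy >= total && queued > st->prev_queue;
    uint32_t n = st->stall_threshold;

    assert(resolved_secs != NULL);
    *resolved_secs = -1;
    st->prev_queue = queued;

    if (saturated) {
        if (st->consecutive == 0)
            st->episode_start = now;
        if (st->consecutive < UINT32_MAX)
            st->consecutive++;
        if (n == 0 || st->consecutive < n)
            return LVL_NOTICE; /* escalation disabled or not yet sustained */
        /* Assumed policy: WARNING from N checks on, ALERT from 2N on. */
        return st->consecutive >= 2ULL * n ? LVL_ALERT : LVL_WARNING;
    }

    /* Not saturated: report the end of a sustained episode, then reset. */
    if (n > 0 && st->consecutive >= n)
        *resolved_secs = (long)(now - st->episode_start);
    st->consecutive = 0;
    return LVL_NOTICE;
}
```

With N = 3 and a 10-second interval, six saturated checks starting at t=0 yield NOTICE, NOTICE, WARNING, WARNING, WARNING, ALERT, and the first unsaturated check at t=60 reports a 60-second episode.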

When both this summary and the wtime threshold warning from RHEL-153090 are enabled, sustained saturation produces two independent streams in the error log. The streams are complementary: the wtime warning focuses on per-operation impact, while the health summary reports aggregate pool state. Because the two use different rate limits, the combined volume stays manageable. The documentation should explain the purpose of each.

Acceptance criteria

Verify that setting nsslapd-thread-pool-log-interval: 10 produces a structured NOTICE in the error log every 10 seconds with busy workers, queue depth, connections, and ops counters
Verify that setting the interval to 0 disables the summary entirely
Verify severity escalation: saturation sustained for N consecutive checks produces WARNING, then ALERT
Verify that an INFO entry is logged when saturation resolves, including the duration of the saturation episode
Verify that summaries under normal load remain at NOTICE level and do not escalate
Verify that with a 1-second interval under sustained load, the health summary NOTICEs appear at consistent ~1s intervals (no visible drift caused by the callback blocking the event queue)
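For the drift criterion, a test could check the timestamps of consecutive health-summary NOTICEs with a helper like the one below. It is a hypothetical sketch: it assumes the timestamps have already been extracted from the error log as seconds (parsing not shown), and the tolerance is the tester's choice.

```c
#include <assert.h>
#include <stddef.h>

/* |a - b| without pulling in libm. */
static double abs_diff(double a, double b)
{
    return a > b ? a - b : b - a;
}

/* Returns 1 when every gap between consecutive timestamps is within tol of
 * interval AND the whole run has not drifted by more than tol from
 * (n - 1) * interval. The second check catches a small per-tick delay that
 * accumulates even though each individual gap looks acceptable. */
static int intervals_consistent(const double *ts, size_t n, double interval,
                                double tol)
{
    assert(ts != NULL || n == 0);
    assert(interval > 0.0 && tol >= 0.0);
    if (n < 2)
        return 1;
    for (size_t i = 1; i < n; i++) {
        if (abs_diff(ts[i] - ts[i - 1], interval) > tol)
            return 0;
    }
    return abs_diff(ts[n - 1] - ts[0], (double)(n - 1) * interval) <= tol;
}
```

For example, NOTICEs that each arrive 1.05 s after the previous one pass the per-gap check with a 0.1 s tolerance but fail the span check after ten intervals, which is exactly the drift a blocking callback would cause.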

Assignee: IdM DS Dev

Reporter: Simon Pichugin

Developer: IdM DS Dev

QA Contact: IdM DS QE

Doc Contact: Evgenia Martyniuk

Votes: 0

Watchers: 4

Created: 2026/03/04 4:27 AM

Updated: 2026/03/04 4:27 AM

Stale Date: 2027/03/03
