RHEL-153090

[RFE] Thread pool pressure events in error log


    • Story
    • Resolution: Unresolved
    • 389-ds-base
    • rhel-idm-ds

      Goal

      As a support engineer investigating a past performance incident with only logs available, I need the server to automatically record thread pool pressure events in the error log so there is historical evidence of saturation episodes without relying on external monitoring or lucky timing.

      Two kinds of events that support engineers need as historical evidence are silently dropped today.

      wtime threshold. The access log records per-operation wtime (queue wait time), but the server never reacts to abnormal values. Add nsslapd-wtime-warning-threshold (seconds, 0 = disabled). When an operation's wtime exceeds the threshold, increment a counter. Emit a rate-limited NOTICE at most once per interval (e.g., 60 seconds) summarizing how many operations crossed the threshold, along with current queue depth and busy worker count. Per-operation logging must be avoided because under sustained load it creates a feedback loop where log I/O itself increases queue wait times.
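
The counting and rate-limiting described above could be sketched roughly as follows. This is an illustration only, not code from 389-ds-base: the type `wtime_warn_t`, the function `wtime_warn_record`, and all field names are hypothetical, and the sketch is single-threaded (a real server would need atomics or a lock around the counter).

```c
#include <stdint.h>

/* Hypothetical state for the wtime warning; none of these names exist in 389-ds-base. */
typedef struct {
    double threshold_sec;   /* value of the proposed threshold, 0 = disabled */
    int64_t interval_sec;   /* minimum spacing between notices, e.g. 60 */
    uint64_t exceeded;      /* threshold crossings since the last notice */
    int64_t last_emit_sec;  /* time the last notice was emitted */
} wtime_warn_t;

/*
 * Record one operation's queue wait time.
 * Returns the number of threshold-crossing operations to report when a
 * notice is due now, or 0 when nothing should be logged.
 */
static uint64_t
wtime_warn_record(wtime_warn_t *w, double wtime_sec, int64_t now_sec)
{
    /* Disabled, or this operation did not exceed the threshold. */
    if (w->threshold_sec <= 0 || wtime_sec <= w->threshold_sec) {
        return 0;
    }

    w->exceeded++;

    /* Still inside the rate-limit window: count, but do not log. */
    if (now_sec - w->last_emit_sec < w->interval_sec) {
        return 0;
    }

    /* Window elapsed: hand back the summary count and start a new window. */
    uint64_t count = w->exceeded;
    w->exceeded = 0;
    w->last_emit_sec = now_sec;
    return count;
}
```

A caller would log one NOTICE (adding queue depth and busy worker count) whenever the return value is non-zero. In this sketch a notice is only triggered by a crossing, so counts accumulated inside a window are reported with the first crossing after the window ends; a timer-driven flush would be an alternative design.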

      maxthreadsperconn detail. When a connection first hits the per-connection thread limit, emit a NOTICE with conn ID, fd, active thread count, and blocked op count. Use the existing guard pattern that logs only when c_maxthreadsblocked == 1 so a single connection doesn't flood the log.
      Add maxthreadsperconnlasttime to cn=monitor so quick triage can answer "when did this last happen" without parsing logs. A bare timestamp on cn=monitor is safe since it carries no identifying information.
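
A minimal sketch of the first-hit guard plus the timestamp update, under stated assumptions: only `c_maxthreadsblocked` is named in this request, while `conn_sketch_t`, `conn_note_max_threads`, `max_threads_last_time`, and the message format are hypothetical and do not reflect the actual connection structure or logging API.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical, trimmed-down connection state for illustration. */
typedef struct {
    uint64_t conn_id;
    int fd;
    int active_threads;
    int c_maxthreadsblocked; /* ops blocked by the per-connection limit */
} conn_sketch_t;

/* Hypothetical backing value for the proposed maxthreadsperconnlasttime attribute. */
static int64_t max_threads_last_time;

/*
 * Called when an operation is blocked by the per-connection thread limit.
 * Formats a notice into buf and returns 1 only for the first blocked
 * operation on this connection; later hits return 0 and leave buf untouched.
 */
static int
conn_note_max_threads(conn_sketch_t *c, int64_t now_sec, char *buf, size_t buflen)
{
    c->c_maxthreadsblocked++;

    /* Bare timestamp only: no connection details go to the monitor value. */
    max_threads_last_time = now_sec;

    /* Guard: only the first hit on a connection produces a log line. */
    if (c->c_maxthreadsblocked != 1) {
        return 0;
    }

    snprintf(buf, buflen,
             "conn=%" PRIu64 " fd=%d reached max threads per connection: "
             "active=%d blocked=%d",
             c->conn_id, c->fd, c->active_threads, c->c_maxthreadsblocked);
    return 1;
}
```

The formatted line would go to the error log at NOTICE level, while only the timestamp would be exposed on cn=monitor.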

      The detailed events (conn ID, fd, blocked ops) go to the error log, not cn=monitor. cn=monitor is publicly readable by anonymous users (default ACI excludes only aci and connection). Connection details must stay in the file-permission-protected error log (default 0600).

      Acceptance criteria

      • Verify that setting nsslapd-wtime-warning-threshold: 2 causes a rate-limited NOTICE in the error log when operations queue for longer than 2 seconds
      • Verify the NOTICE includes the count of threshold-crossing operations, queue depth, and busy worker count
      • Verify at most one NOTICE is emitted per interval (no per-operation flooding under sustained load)
      • Verify setting the threshold to 0 disables the warning entirely
      • Verify a NOTICE is logged on first maxthreadsperconn hit per connection, including conn ID and blocked op count
      • Verify repeated hits on the same connection do not produce additional log entries
      • Verify maxthreadsperconnlasttime appears on cn=monitor with a valid timestamp after a hit

              idm-ds-dev-bugs IdM DS Dev
              spichugi@redhat.com Simon Pichugin
              IdM DS Dev IdM DS Dev
              IdM DS QE IdM DS QE
              Evgenia Martyniuk Evgenia Martyniuk