RHEL-111528

stalld: Event-driven BPF backend fails to detect pre-existing starvation


    • Type: Bug
    • Resolution: Unresolved
    • rhel-9.8
    • stalld
    • Moderate
    • rhel-kernel-rts-time

      Problem Summary

      When stalld is started to monitor a CPU on which a CPU-bound SCHED_FIFO process is already running, it fails to detect tasks that are already starving on that CPU. The daemon starts correctly but remains unaware of the starvation condition, which makes it ineffective in this scenario.

      Root Cause Analysis

      The issue stems from the event-driven nature of the BPF backend. A continuously running SCHED_FIFO process does not yield the CPU and therefore generates no scheduling events (such as context switches).

      Since the BPF backend relies on these events to trigger status updates, the absence of events leaves stalld blind to the initial state of the CPU, including the starving SCHED_NORMAL process.
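
One way to observe the missing events from user space is to watch the context-switch counters that the kernel exposes in /proc/<pid>/status. The sketch below is illustrative only and is not part of stalld; the helper name ctxt_switches is hypothetical. It sums the voluntary_ctxt_switches and nonvoluntary_ctxt_switches lines of a status file. If the total for the SCHED_FIFO spinner does not change between two reads, no switch involving that task happened in between, assuming nothing else preempts it (for example, RT throttling is disabled).

```shell
# Hypothetical helper (not part of stalld): print the total number of
# context switches recorded in a /proc/<pid>/status style file.
ctxt_switches() {
    awk '/^(voluntary|nonvoluntary)_ctxt_switches:/ { total += $2 }
         END { print total + 0 }' "$1"
}

# Usage sketch, with FIFO_PID as an assumed variable holding the spinner PID:
#   before=$(ctxt_switches "/proc/$FIFO_PID/status"); sleep 5
#   after=$(ctxt_switches "/proc/$FIFO_PID/status")
#   echo "context switches in 5 s: $((after - before))"
```

A delta of zero matches the analysis above: with no switch on the CPU, an event-driven backend has nothing to react to.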

      Steps to Reproduce

      1. In a terminal, start two CPU-bound processes pinned to CPU 1: one standard (SCHED_NORMAL) process and one SCHED_FIFO process at priority 40:
        taskset -c 1 bash -c 'while :; do :; done' &
        taskset -c 1 chrt -f 40 bash -c 'while :; do :; done' & 
      2. After both processes are running, start stalld to monitor CPU 1:
        stalld -v -b queue_track -c 1 -a 0 
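
Before starting stalld in step 2, it can help to confirm that the SCHED_NORMAL task really is starving. The sketch below is a hedged helper, not something the report itself uses: it reads utime and stime (fields 14 and 15 of /proc/<pid>/stat) and prints their sum in clock ticks. The name cpu_ticks and the NORMAL_PID variable are hypothetical; the PID can be captured with $! right after launching the first command in step 1.

```shell
# Hypothetical helper: print utime + stime (clock ticks) from a
# /proc/<pid>/stat style file.
cpu_ticks() {
    line=$(cat "$1") || return 1
    rest=${line##*) }      # drop "pid (comm) " so spaces in comm cannot shift fields
    set -- $rest
    # After the strip, ${12} is field 14 (utime) and ${13} is field 15 (stime).
    echo $(( ${12} + ${13} ))
}

# Usage sketch (NORMAL_PID is an assumed variable):
#   before=$(cpu_ticks "/proc/$NORMAL_PID/stat"); sleep 5
#   after=$(cpu_ticks "/proc/$NORMAL_PID/stat")
#   echo "ticks gained in 5 s: $((after - before))"
```

A delta at or near zero indicates starvation. If RT throttling (kernel.sched_rt_runtime_us) is enabled, the task may still receive a small share of the CPU, so compare the delta against the elapsed time rather than expecting exactly zero.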

      Expected Result

      stalld should have a mechanism to detect starvation at startup, even in the absence of new scheduling events. It should identify that the SCHED_NORMAL process is being starved and report the condition.
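
As a rough illustration of what a startup scan could look for, the sketch below lists processes that are runnable (state R) and last ran on a given CPU, using only /proc/<pid>/stat (field 3 is the state, field 39 the last CPU). This is an assumption-laden sketch and not how stalld implements or plans to implement the fix: the function name runnable_on_cpu is hypothetical, it looks at processes rather than individual threads, and state R covers both the running task and tasks waiting on the runqueue, so a real check would also need to compare CPU time over an interval.

```shell
# Hypothetical helper: list PIDs that are runnable (state R) and last ran
# on CPU $1. $2 is the proc root (default /proc); it is a parameter only so
# the sketch can be exercised against a fake directory tree.
runnable_on_cpu() {
    cpu=$1
    root=${2:-/proc}
    for f in "$root"/[0-9]*/stat; do
        [ -r "$f" ] || continue
        line=$(cat "$f" 2>/dev/null) || continue
        pid=${line%% *}
        rest=${line##*) }      # drop "pid (comm) " so spaces in comm cannot shift fields
        set -- $rest
        # After the strip, $1 is field 3 (state) and ${37} is field 39 (processor).
        if [ "${1:-}" = "R" ] && [ "${37:-}" = "$cpu" ]; then
            echo "$pid"
        fi
    done
    return 0
}

# Usage sketch: runnable_on_cpu 1
```

With the reproduction steps in place, both spinners pinned to CPU 1 would be expected to appear in the output, which is the initial state the event-driven backend never gets to see.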

      Actual Results

      stalld starts but remains silent. It does not report the starving SCHED_NORMAL process because the BPF backend never receives an event to trigger an update.

              wandercosta Wander Costa
              Chang Yin Chang Yin