RHEL-114506

stalld: Incorrect starvation state after DL server is disabled and re-enabled


    • Bug
    • Resolution: Unresolved
    • rhel-10.1
    • stalld
    • Important
    • rhel-kernel-rts-time
    • CK Parent Issues In Progress
    • All

      Problem Summary

      When the DL server is disabled and then re-enabled on a monitored CPU, stalld fails to update its internal state correctly. It continues to report tasks as starving even after the DL server is active again and should be handling the workload, leading to persistent false-positive starvation reports.

      Steps to Reproduce

      1. Start stalld on CPU 0 to monitor CPU 1:
        stalld -v -b queue_track -c 1 -a 0
      2. Start two CPU-bound tasks on CPU 1: one SCHED_NORMAL (e.g., PID 9368) and one SCHED_FIFO (e.g., PID 9369).
        taskset -c 1 bash -c 'while :; do :; done' &
        taskset -c 1 chrt -f 40 bash -c 'while :; do :; done' &
      3. Observe Initial State: stalld correctly detects that the SCHED_NORMAL task (9368) is starving.
        stalld: found task: bash:9368 ready to run in CPU 1
        single_threaded_main: checking cpu 1 - rt: 1 - starving: 2
      4. Disable the DL server on CPU 1:
        echo 0 > /sys/kernel/debug/sched/fair_server/cpu1/runtime
      5. Observe State with DL Disabled: As expected, stalld starts to report task 9368 as starving.
        stalld: cpu: 1 pid: 9368 ctx: 5459 R
        stalld: cpu: 1 pid: 9369 ctx: 5578
        stalld: found task: bash:9368 starving in CPU 1
        single_threaded_main: checking cpu 1 - rt: 1 - starving: 1
        stalld:          cpu 1: pid: 9369 starving for 10
      6. Re-enable the DL server on CPU 1:
        echo 50000000 > /sys/kernel/debug/sched/fair_server/cpu1/runtime
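
      The disable and re-enable steps can be wrapped in a small script so the toggle is repeatable while watching stalld's output. This is a minimal sketch, not part of stalld: the helper names are hypothetical, the debugfs path and the 50000000 ns value are taken from the steps above, and it assumes root privileges with debugfs mounted at /sys/kernel/debug.

```shell
# Sketch only: helper names are hypothetical; run as root with debugfs mounted.

# Build the fair-server runtime path used in the steps above.
fair_server_runtime_path() {
    printf '/sys/kernel/debug/sched/fair_server/cpu%s/runtime\n' "$1"
}

# Write a runtime value (in ns) to the given file; the report uses 0 to disable.
set_fair_server_runtime() {
    printf '%s\n' "$2" > "$1"
}

# Disable the DL server, wait, then re-enable it.
# $1: runtime file, $2: seconds to wait, $3: runtime to restore in ns
# (defaults to the value used in the report).
toggle_fair_server() {
    set_fair_server_runtime "$1" 0
    sleep "$2"
    set_fair_server_runtime "$1" "${3:-50000000}"
}

# Example (commented out so sourcing the file has no side effects):
# toggle_fair_server "$(fair_server_runtime_path 1)" 30 50000000
```

      Taking the file path as a parameter keeps the toggle logic separate from the debugfs layout, so it can be exercised against a scratch file before touching a real CPU.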

      Expected Result

      After the DL server is re-enabled, it resumes scheduling the starving tasks. stalld should detect this change and stop reporting task 9368 as starving.
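
      To confirm independently of stalld that the SCHED_NORMAL task is being scheduled again, its on-CPU time can be sampled twice and compared. The sketch below is an assumption-laden illustration, not how stalld itself decides: the function names are hypothetical, and it assumes the kernel exposes /proc/<pid>/schedstat, whose first field is the time the task has spent on the CPU in nanoseconds.

```shell
# Sketch only: function names are hypothetical.

# Print the first field of a schedstat-style file (on-CPU time in ns).
oncpu_ns() {
    read -r oncpu _rest < "$1" && printf '%s\n' "$oncpu"
}

# Succeed if on-CPU time advanced between two samples.
# $1: schedstat file, $2: seconds between samples (default 5).
task_is_running() {
    before="$(oncpu_ns "$1")"
    sleep "${2:-5}"
    after="$(oncpu_ns "$1")"
    [ "$after" -gt "$before" ]
}

# Example (commented out; PID 9368 is the example PID from the report):
# task_is_running /proc/9368/schedstat 5 && echo "task 9368 is getting CPU time"
```

      If this check succeeds after the DL server is re-enabled while stalld still reports the task as starving, the report is a false positive of the kind described here.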

      Actual Result

      stalld fails to recognize that the DL server is active again. It gets stuck in its previous state and continues to incorrectly report the SCHED_NORMAL task (9368) as starving indefinitely.

              wandercosta Wander Costa
              Chang Yin Chang Yin