Bug
Resolution: Unresolved
rhel-9.8
Moderate
rhel-kernel-rts-time
Problem Summary
stalld does not properly track task CPU migrations. When a monitored task moves from one CPU to another, the daemon's internal state becomes stale. This leads to incorrect starvation detection: stalld either boosts tasks that are no longer starving or fails to un-boost tasks that have moved.
Root Cause Analysis
The root cause appears to be that stalld does not handle the sched_migrate_task event. When this event fires, the daemon should update its internal per-CPU queues by removing the task from the source CPU's queue and adding it to the destination CPU's queue. This logic is currently missing.
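The queue update described above can be sketched as a small self-contained model. This is not stalld's actual code: the per-CPU queue layout, the names (cpus, queue_add, queue_take, on_sched_migrate_task, cpu_has_starving_task), the fixed limits, and the simplified starvation rule (an RT task shares the CPU with a task that has not context-switched for a threshold number of seconds) are all hypothetical. Only the pid, orig_cpu and dest_cpu values come from the kernel's sched_migrate_task tracepoint.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical limits for this sketch; not taken from stalld. */
#define MAX_CPUS  8
#define MAX_TASKS 16

/* Hypothetical view of one tracked task. */
struct tracked_task {
	int  pid;
	bool rt;            /* runs under SCHED_FIFO or SCHED_RR */
	int  stalled_secs;  /* seconds without a context switch */
};

/* Hypothetical per-CPU queue of tracked tasks. */
struct cpu_queue {
	bool monitored;     /* CPU was selected with -c */
	int  nr;
	struct tracked_task tasks[MAX_TASKS];
};

static struct cpu_queue cpus[MAX_CPUS];

static void queue_add(int cpu, struct tracked_task t)
{
	struct cpu_queue *q = &cpus[cpu];

	assert(q->nr < MAX_TASKS);  /* sketch: fixed capacity, no resizing */
	q->tasks[q->nr++] = t;
}

/* Remove pid from a CPU's queue; copy it to *out and return true if found. */
static bool queue_take(int cpu, int pid, struct tracked_task *out)
{
	struct cpu_queue *q = &cpus[cpu];

	for (int i = 0; i < q->nr; i++) {
		if (q->tasks[i].pid == pid) {
			*out = q->tasks[i];
			q->tasks[i] = q->tasks[--q->nr];  /* swap-remove */
			return true;
		}
	}
	return false;
}

/* Hypothetical handler, called once per sched_migrate_task event. */
static void on_sched_migrate_task(int pid, int orig_cpu, int dest_cpu)
{
	struct tracked_task t;

	if (orig_cpu < 0 || orig_cpu >= MAX_CPUS ||
	    dest_cpu < 0 || dest_cpu >= MAX_CPUS)
		return;
	if (!queue_take(orig_cpu, pid, &t))
		return;             /* task was not tracked on orig_cpu */
	t.stalled_secs = 0;         /* stall time belonged to the old CPU */
	if (cpus[dest_cpu].monitored)
		queue_add(dest_cpu, t);  /* unmonitored destination: just drop it */
}

/*
 * Simplified rule: a CPU needs a boost only if an RT task is queued
 * alongside a non-RT task stalled for at least threshold seconds.
 */
static bool cpu_has_starving_task(int cpu, int threshold, int *pid_out)
{
	const struct cpu_queue *q = &cpus[cpu];
	bool has_rt = false;
	int victim = -1;

	for (int i = 0; i < q->nr; i++) {
		if (q->tasks[i].rt)
			has_rt = true;
		else if (q->tasks[i].stalled_secs >= threshold)
			victim = q->tasks[i].pid;
	}
	if (has_rt && victim >= 0) {
		*pid_out = victim;
		return true;
	}
	return false;
}
```

Replaying Scenario 2 against this model (moving PID 1001 from CPU 1 to CPU 2) leaves CPU 1 with only the SCHED_NORMAL task, so cpu_has_starving_task(1, ...) returns false and no boost would be issued. Resetting stalled_secs on migration is a design choice of the sketch: time spent stalled on the old CPU says nothing about contention on the new one.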
Reproduction Scenarios
Scenario 1: Migrating the Starving (SCHED_NORMAL) Task
Steps to Reproduce:
- Run stalld on CPU 0, monitoring CPUs 1 and 2:
stalld -v -b queue_track -c 1,2 -a 0
- Create two CPU-bound tasks pinned to CPU 1 (the PIDs shown are examples):
taskset -c 1 bash -c 'while :; do :; done' &   # SCHED_NORMAL task (e.g., PID 1000)
taskset -c 1 chrt -f 40 bash -c 'while :; do :; done' &   # SCHED_FIFO task (e.g., PID 1001)
- Migrate the SCHED_NORMAL task (PID 1000) to CPU 2:
taskset -p 04 1000
Expected Result:
Once the SCHED_NORMAL task is moved to CPU 2, it is no longer competing with the SCHED_FIFO task on CPU 1. Therefore, stalld should recognize the migration and stop boosting it.
Actual Result:
stalld fails to detect the migration and continues to incorrectly report the SCHED_NORMAL task as starving on its original CPU (CPU 1), boosting it erroneously.
stalld: found task: bash:1000 starving in CPU 1
single_threaded_main: checking cpu 1 - rt: 1 - starving: 1
stalld: cpu 1: pid: 1000 starving for 98
stalld: boosted pid 1000 (bash) (cpu 1) using SCHED_DEADLINE
Scenario 2: Migrating the High-Priority (SCHED_FIFO) Task
Steps to Reproduce:
- Follow the initial setup (steps 1-2) from Scenario 1.
- Migrate the SCHED_FIFO task (PID 1001) to CPU 2:
taskset -p 04 1001
Expected Result:
With the SCHED_FIFO task no longer on CPU 1, the SCHED_NORMAL task is no longer starving. stalld should detect this change and stop boosting the SCHED_NORMAL task.
Actual Result:
stalld fails to detect that the SCHED_FIFO task has moved. It continues to believe the SCHED_NORMAL task is starving and boosts it indefinitely.
stalld: cpu: 1 pid: 2714 ctx: 10
stalld: cpu: 1 pid: 2713 ctx: 182 R
stalld: found task: bash:2714 starving in CPU 1
single_threaded_main: checking cpu 1 - rt: 1 - starving: 1
stalld: cpu 1: pid: 2714 starving for 361
stalld: boosted pid 2714 (bash) (cpu 1) using SCHED_DEADLINE