Details
Bug
Resolution: Not a Bug
Undefined
None
4.14
None
No
False
2024-04-17: Investigation is ongoing
Description
Description of problem:
I have a testpmd pod running on an isolated core on a system that has workload partitioning enabled, with CPU3 being part of the isolated cores. The packet forwarding thread (rte-worker-3) of the testpmd process (pid 2570750) is running on CPU3 and its thread ID is 2570754. It runs as a busy loop with scheduling policy SCHED_FIFO and scheduling priority 1, so it should not be interrupted on the isolated CPU3.

However, running a function_graph trace on CPU3 shows that the testpmd forwarding thread has been interrupted multiple times by the irq_wor-46 process:

 3)   0.419 us    |  save_fpregs_to_fpstate();
 ------------------------------------------
 3) rte-wor-2570754 => irq_wor-46
 ------------------------------------------
 3)               |  finish_task_switch.isra.0() {
 3)               |    vtime_task_switch_generic() {

Here are the scheduling stats for that thread:

#### /proc/2570750/task/2570754/sched
rte-worker-3 (2570754, #threads: 8)
-------------------------------------------------------------------
se.exec_start : 551543727.283270
se.vruntime : 0.000000
se.sum_exec_runtime : 7103349.966615
se.nr_migrations : 1
nr_switches : 75
nr_voluntary_switches : 2
nr_involuntary_switches : 73
se.load.weight : 1048576
se.avg.load_sum : 47295
se.avg.runnable_sum : 48430080
se.avg.util_sum : 48430080
se.avg.load_avg : 1024
se.avg.runnable_avg : 1024
se.avg.util_avg : 1024
se.avg.last_update_time : 544440981193728
se.avg.util_est.ewma : 1
se.avg.util_est.enqueued : 1
policy : 1
prio : 98
clock-delta : 43
####

We can see that nr_involuntary_switches is 73, out of 75 context switches in total. The irq_work/CPUn thread looks like it was introduced in this upstream patch [1] and in this rhel9 patch [2].

[1]: https://github.com/torvalds/linux/commit/b4c6f86ec2f648b5e6d4b04564fbc6d5351160a8
[2]: https://gitlab.com/redhat/rhel/src/kernel/rhel-9/-/commit/62014d41db107099b22b77b5eb0011d5ba07df1b
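For anyone who wants to check the same counters on another thread, here is a minimal sketch that parses the "key : value" lines of a /proc/<pid>/task/<tid>/sched dump. It is not part of the original investigation. The helper names (parse_sched, summarize) are made up for illustration, and the sample text is an excerpt of the dump quoted in this report. The sketch assumes the policy numbering of the Linux UAPI headers (1 = SCHED_FIFO) and that, for real-time tasks, the "prio" field is 99 minus the real-time priority, which matches the values above (prio 98, priority 1).

```python
import re

# Excerpt of the /proc/2570750/task/2570754/sched dump quoted in this report.
SAMPLE = """\
rte-worker-3 (2570754, #threads: 8)
-------------------------------------------------------------------
se.sum_exec_runtime : 7103349.966615
nr_switches : 75
nr_voluntary_switches : 2
nr_involuntary_switches : 73
policy : 1
prio : 98
"""

# Scheduling policy numbers as defined in include/uapi/linux/sched.h.
POLICIES = {0: "SCHED_OTHER", 1: "SCHED_FIFO", 2: "SCHED_RR"}

# Matches "name : number" lines; the header and separator lines do not match.
FIELD = re.compile(r"^([\w.\-]+)\s*:\s*(-?\d+(?:\.\d+)?)\s*$")


def parse_sched(text):
    """Return the numeric fields of a sched dump as a dict (hypothetical helper)."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            key, value = match.groups()
            fields[key] = float(value) if "." in value else int(value)
    return fields


def summarize(fields):
    """Extract the values relevant to this report (hypothetical helper)."""
    policy = fields["policy"]
    summary = {
        "policy": POLICIES.get(policy, str(policy)),
        "voluntary": fields["nr_voluntary_switches"],
        "involuntary": fields["nr_involuntary_switches"],
    }
    if policy in (1, 2):
        # Assumption: for RT tasks the kernel-internal prio is 99 - rt priority.
        summary["rt_priority"] = 99 - fields["prio"]
    return summary


if __name__ == "__main__":
    print(summarize(parse_sched(SAMPLE)))
```

On a live system the SAMPLE string could be replaced by the contents of the thread's sched file. Sampling it twice and comparing nr_involuntary_switches would show whether the thread is still being preempted.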
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Deploy an SNO cluster with DU profile
2. Run a testpmd pod
3.
Actual results:
Expected results:
Additional info: