Details

Type: Bug
Resolution: Not a Bug
Priority: Undefined
Fix Version/s: None
Affects Version/s: 4.14
Component/s: Performance Addon Operator
Labels:
None

Regression:
No
Blocked:
False
Blocked Reason:

None
Latest Status Summary:
2024-04-17: Investigation is ongoing

SFDC Cases Counter:
SFDC Cases Links:

Description

Description of problem:

I have a testpmd pod running on an isolated core on a system with workload partitioning enabled; CPU3 is one of the isolated cores.
The packet forwarding thread (rte-worker-3, thread ID 2570754) of the testpmd process (PID 2570750) runs on CPU3. It runs as a busy loop with scheduling policy SCHED_FIFO and scheduling priority 1, so it should not be interrupted on the isolated CPU3.
However, a function_graph trace on CPU3 shows that the testpmd forwarding thread has been interrupted multiple times by the irq_wor-46 task:

3)   0.419 us    |          save_fpregs_to_fpstate();
------------------------------------------  
3) rte-wor-2570754 =>   irq_wor-46   
------------------------------------------    
3)               |          finish_task_switch.isra.0() {  
3)               |            vtime_task_switch_generic() {
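
To quantify how often this happens across a longer capture, the task-switch marker lines in the function_graph output can be counted. The following is a minimal sketch, not part of the original report: it assumes the marker lines have the form `CPU) prev-PID => next-PID` as in the excerpt above, and the function name `count_preemptions` and the embedded sample are illustrative only.

```python
import re
from collections import Counter

# Matches function_graph task-switch markers such as:
#   3) rte-wor-2570754 =>   irq_wor-46
# Task names are truncated by the tracer and may themselves contain hyphens.
SWITCH_RE = re.compile(r"^\s*(\d+)\)\s+(\S+)-(\d+)\s+=>\s+(\S+)-(\d+)\s*$")


def count_preemptions(trace_text, cpu, victim_tid):
    """Count, per incoming task, how often victim_tid was switched out on cpu."""
    counts = Counter()
    for line in trace_text.splitlines():
        match = SWITCH_RE.match(line)
        if not match:
            continue  # regular function entry/exit line or separator
        line_cpu, _prev_comm, prev_pid, next_comm, next_pid = match.groups()
        if int(line_cpu) == cpu and int(prev_pid) == victim_tid:
            counts[(next_comm, int(next_pid))] += 1
    return counts


# Excerpt copied from the trace above; a real capture would be read from a file.
TRACE_SAMPLE = """\
3)   0.419 us    |          save_fpregs_to_fpstate();
------------------------------------------
3) rte-wor-2570754 =>   irq_wor-46
------------------------------------------
3)               |          finish_task_switch.isra.0() {
3)               |            vtime_task_switch_generic() {
"""

if __name__ == "__main__":
    result = count_preemptions(TRACE_SAMPLE, cpu=3, victim_tid=2570754)
    for (comm, pid), n in result.items():
        print(f"{comm}-{pid}: {n}")
```

Run against a full trace, this gives a per-task count of which threads displaced the forwarding thread on CPU3.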


Here are the scheduling stats for that thread:

#### /proc/2570750/task/2570754/sched 
rte-worker-3 (2570754, #threads: 8)
-------------------------------------------------------------------
se.exec_start                                :     551543727.283270
se.vruntime                                  :             0.000000
se.sum_exec_runtime                          :       7103349.966615
se.nr_migrations                             :                    1
nr_switches                                  :                   75
nr_voluntary_switches                        :                    2
nr_involuntary_switches                      :                   73
se.load.weight                               :              1048576
se.avg.load_sum                              :                47295
se.avg.runnable_sum                          :             48430080
se.avg.util_sum                              :             48430080
se.avg.load_avg                              :                 1024
se.avg.runnable_avg                          :                 1024
se.avg.util_avg                              :                 1024
se.avg.last_update_time                      :      544440981193728
se.avg.util_est.ewma                         :                    1
se.avg.util_est.enqueued                     :                    1
policy                                       :                    1
prio                                         :                   98
clock-delta                                  :                   43
#### 

We can see that nr_involuntary_switches is 73 (out of 75 total switches), so the thread has been preempted 73 times.
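
The same check can be scripted so that the counters are easy to compare before and after a test run. This is a rough sketch under stated assumptions: the helper names `parse_sched` and `summarize` are hypothetical, the policy numbers follow the Linux scheduler policy constants (1 is SCHED_FIFO), and the conversion `rt_priority = 99 - prio` assumes the kernel's usual mapping for real-time tasks, which matches the priority 1 reported above.

```python
# Linux scheduling policy numbers as exposed in the "policy" field.
POLICY_NAMES = {
    0: "SCHED_OTHER",
    1: "SCHED_FIFO",
    2: "SCHED_RR",
    3: "SCHED_BATCH",
    5: "SCHED_IDLE",
    6: "SCHED_DEADLINE",
}


def parse_sched(text):
    """Parse 'key : value' lines from /proc/<pid>/task/<tid>/sched into a dict."""
    stats = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # separator or path line without a colon
        key, value = key.strip(), value.strip()
        try:
            stats[key] = float(value) if "." in value else int(value)
        except ValueError:
            continue  # header line such as "rte-worker-3 (2570754, #threads: 8)"
    return stats


def summarize(stats):
    """Reduce the parsed stats to the fields relevant for preemption analysis."""
    policy = POLICY_NAMES.get(stats["policy"], "unknown")
    summary = {
        "policy": policy,
        "switches": stats["nr_switches"],
        "voluntary": stats["nr_voluntary_switches"],
        "involuntary": stats["nr_involuntary_switches"],
    }
    if policy in ("SCHED_FIFO", "SCHED_RR"):
        # Assumption: kernel prio for RT tasks is 99 - rt_priority.
        summary["rt_priority"] = 99 - stats["prio"]
    return summary


# Subset of the output shown above; on a live system the text would come
# from reading /proc/2570750/task/2570754/sched.
SCHED_SAMPLE = """\
rte-worker-3 (2570754, #threads: 8)
-------------------------------------------------------------------
se.sum_exec_runtime                          :       7103349.966615
nr_switches                                  :                   75
nr_voluntary_switches                        :                    2
nr_involuntary_switches                      :                   73
policy                                       :                    1
prio                                         :                   98
"""

if __name__ == "__main__":
    print(summarize(parse_sched(SCHED_SAMPLE)))
```

Taking two snapshots and subtracting the `involuntary` values shows how many preemptions occurred during a given interval.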

The irq_work/CPUn thread appears to have been introduced by upstream commit [1] and by the corresponding RHEL 9 commit [2].

[1]: https://github.com/torvalds/linux/commit/b4c6f86ec2f648b5e6d4b04564fbc6d5351160a8
[2]: https://gitlab.com/redhat/rhel/src/kernel/rhel-9/-/commit/62014d41db107099b22b77b5eb0011d5ba07df1b

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:

    1. Deploy an SNO cluster with DU profile
    2. Run a testpmd pod
    3.

Actual results:

Expected results:

Additional info:

Attachments

Activity

People

Assignee: Martin Sivak

Reporter: Dahir Osman

QA Contact: Gowrishankar Rajaiyan

Votes: 0

Watchers: 7

Dates

Created: 2024/04/15 5:53 PM

Updated: 2024/04/25 2:19 PM

Resolved: 2024/04/25 2:19 PM