-
Bug
-
Resolution: Unresolved
-
Major
-
None
-
rhel-8.5.0
-
None
-
Moderate
-
7
-
rhel-sst-kernel-rts
-
ssg_core_kernel
-
13
-
False
-
-
None
-
CK-May-2024, CK-June-2024, CK-July-2024, CK-August-2024, CK-September-2024, CK-October-2024, CK-November-2024
-
None
-
None
-
-
-
Unspecified
-
None
Description of problem:
When CPU isolation is used, such as via the cpu-partitioning tuned profile, isolated CPUs can still be interrupted by kernel IPIs initiated by non-isolated CPUs. This can happen in many different ways; a few have been diagnosed using the rt-trace-bpf tool:
Caused by NetworkManager:
64359.052209596 NetworkManager 0 1405 smp_call_function_many_cond (cpu=0, func=do_kernel_range_flush)
smp_call_function_many_cond+0x1
smp_call_function+0x39
on_each_cpu+0x2a
flush_tlb_kernel_range+0x7b
__purge_vmap_area_lazy+0x70
_vm_unmap_aliases.part.42+0xdf
change_page_attr_set_clr+0x16a
set_memory_ro+0x26
bpf_int_jit_compile+0x2f9
bpf_prog_select_runtime+0xc6
bpf_prepare_filter+0x523
sk_attach_filter+0x13
sock_setsockopt+0x92c
__sys_setsockopt+0x16a
__x64_sys_setsockopt+0x20
do_syscall_64+0x87
entry_SYSCALL_64_after_hwframe+0x65
Caused by the mgag200 kernel module:
238903.096535737 kworker/0:1 0 88579 smp_call_function_many_cond (cpu=0, func=do_flush_tlb_all)
smp_call_function_many_cond+0x1
smp_call_function+0x39
on_each_cpu+0x2a
flush_tlb_kernel_range+0x48
__purge_vmap_area_lazy+0x70
free_vmap_area_noflush+0xf2
remove_vm_area+0x93
__vunmap+0x59
drm_gem_shmem_vunmap+0x6d
mgag200_handle_damage+0x62
mgag200_simple_display_pipe_update+0x69
drm_atomic_helper_commit_planes+0xb3
drm_atomic_helper_commit_tail+0x26
commit_tail+0xc6
drm_atomic_helper_commit+0x103
drm_atomic_helper_dirtyfb+0x20e
drm_fb_helper_damage_work+0x228
process_one_work+0x18f
worker_thread+0x30
kthread+0x15d
ret_from_fork+0x1f
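When many such events are captured, it can help to tally which task and which IPI callback are responsible. The following is a minimal sketch, assuming every event header line has the same shape as the two shown above (timestamp, task name, two numeric fields, probed function, then "(cpu=N, func=NAME)"). The meaning of the two numeric fields is not stated in this report, so the sketch does not interpret them, and the helper name tally_ipi_sources is hypothetical.

```python
import re
from collections import Counter

# Matches an event header line as it appears in this report. The two numeric
# fields after the task name are captured but not interpreted (assumption:
# their meaning is not documented here).
EVENT_RE = re.compile(
    r"^(?P<ts>\d+\.\d+)\s+(?P<comm>\S+)\s+(?P<num1>\d+)\s+(?P<num2>\d+)\s+"
    r"(?P<probe>\S+)\s+\(cpu=(?P<cpu>\d+),\s*func=(?P<func>\w+)\)\s*$"
)


def tally_ipi_sources(lines):
    """Count events per (task name, IPI callback) pair.

    Stack-frame lines such as 'on_each_cpu+0x2a' do not match the header
    pattern and are skipped.
    """
    counts = Counter()
    for line in lines:
        match = EVENT_RE.match(line.strip())
        if match:
            counts[(match.group("comm"), match.group("func"))] += 1
    return counts


if __name__ == "__main__":
    # The two header lines quoted in this report, plus one stack frame.
    sample = [
        "64359.052209596 NetworkManager 0 1405 smp_call_function_many_cond "
        "(cpu=0, func=do_kernel_range_flush)",
        "smp_call_function_many_cond+0x1",
        "238903.096535737 kworker/0:1 0 88579 smp_call_function_many_cond "
        "(cpu=0, func=do_flush_tlb_all)",
    ]
    for (comm, func), n in tally_ipi_sources(sample).most_common():
        print(f"{n:6d}  {comm:20s} {func}")
```

Grouping by the func= value separates kernel-range flushes from full TLB flushes, which are the two callbacks seen in the traces above.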
Tracing on the isolated CPUs shows preemptions such as this:
58118.769286 | 18) <...>-128143 | | smp_call_function_interrupt() {
58118.769286 | 18) <...>-128143 | | irq_enter()
58118.769288 | 18) <...>-128143 | | generic_smp_call_function_single_interrupt() {
58118.769289 | 18) <...>-128143 | | flush_smp_call_function_queue() {
58118.769289 | 18) <...>-128143 | | do_flush_tlb_all()
58118.769292 | 18) <...>-128143 | 2.402 us | }
58118.769292 | 18) <...>-128143 | 3.223 us | }
58118.769292 | 18) <...>-128143 | | irq_exit() {
58118.769293 | 18) <...>-128143 | 0.077 us | preempt_count_sub();
58118.769294 | 18) <...>-128143 | 0.201 us | idle_cpu();
58118.769295 | 18) <...>-128143 | | tick_nohz_irq_exit() {
58118.769295 | 18) <...>-128143 | 0.164 us | ktime_get();
58118.769296 | 18) <...>-128143 | | __tick_nohz_full_update_tick()
58118.769303 | 18) <...>-128143 | 8.124 us | }
58118.769303 | 18) <...>-128143 | + 10.872 us | }
58118.769304 | 18) <...>-128143 | + 17.471 us | }
Version-Release number of selected component (if applicable):
4.18.0-348.12.2.rt7.143.el8_5.x86_64
How reproducible:
Easily
Steps to Reproduce:
1. Boot the system using an RT kernel and the cpu-partitioning tuned profile
2. Run a workload that measures latency, such as oslat, on the isolated CPUs
3. Trace the kernel activity on the isolated CPUs while the workload is running
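As a lighter-weight check alongside step 3, the per-CPU IPI counters in /proc/interrupts can be compared before and after the workload runs. The sketch below is not part of the original reproduction procedure. It assumes the x86 row labels CAL (function call interrupts) and TLB (TLB shootdowns), and the isolated CPU list passed in is whatever the tester configured. The snapshot numbers in the demo are made up for illustration and are not measurements.

```python
def parse_ipi_counts(text, rows=("CAL", "TLB")):
    """Parse /proc/interrupts text into {row_label: {cpu: count}}.

    Only the rows named in `rows` are kept. The first line of the file
    lists the online CPUs (CPU0, CPU1, ...).
    """
    lines = text.splitlines()
    cpus = [int(name[3:]) for name in lines[0].split()]
    result = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        label = label.strip()
        if label in rows:
            fields = rest.split()[:len(cpus)]
            result[label] = dict(zip(cpus, map(int, fields)))
    return result


def ipi_delta(before, after, isolated):
    """Return per-row counter increases on the isolated CPUs only."""
    return {
        label: {cpu: after[label][cpu] - before[label][cpu] for cpu in isolated}
        for label in after
    }


def read_proc_interrupts(path="/proc/interrupts"):
    """Read the live counters (not called by the demo below)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # Illustrative snapshots only; NOT data from the affected system.
    before = (
        "           CPU0       CPU1       CPU2\n"
        "CAL:         10          5          7   Function call interrupts\n"
        "TLB:          3          1          0   TLB shootdowns\n"
    )
    after = (
        "           CPU0       CPU1       CPU2\n"
        "CAL:         12          9          7   Function call interrupts\n"
        "TLB:          3          2          0   TLB shootdowns\n"
    )
    delta = ipi_delta(parse_ipi_counts(before), parse_ipi_counts(after), [1, 2])
    for label, per_cpu in delta.items():
        print(label, per_cpu)
```

On a live system, two calls to read_proc_interrupts() taken around the oslat run would replace the embedded snapshots. A nonzero delta on an isolated CPU only shows that IPIs arrived; identifying the sender still requires tracing as in step 3.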
Actual results:
Latency spikes caused by IPI processing are observed on the isolated CPUs, even though the IPI does not need to be handled at that moment.
Expected results:
No needless IPI processing should occur on the isolated CPUs. For example, for a 100% userspace workload such as oslat, there is no need to enter the kernel and service the IPI until a necessary kernel entry occurs (e.g., a system call or timer interrupt).
Additional info: