RHEL / RHEL-8802

unnecessary kernel IPIs break through cpu isolation


      Description of problem:

When CPU isolation is used, such as via the cpu-partitioning tuned profile, it is possible for isolated CPUs to be interrupted by kernel IPIs initiated on non-isolated CPUs. This can happen in many different ways; a few have been diagnosed using the rt-trace-bpf tool:
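Independent of rt-trace-bpf, the stock ftrace tracepoints in the irq_vectors event group can confirm that function-call IPIs are landing on the isolated CPUs. A minimal sketch, assuming x86, root, tracefs mounted at /sys/kernel/debug/tracing (the RHEL 8 default), and an example isolated-CPU set of 2-23:

```shell
# Sketch: watch function-call IPIs arriving on the isolated CPUs via ftrace.
cd /sys/kernel/debug/tracing

# Restrict tracing to the isolated CPUs; tracing_cpumask takes a hex mask
# (example: CPUs 2-23 on a 24-CPU machine -> 0xfffffc).
echo fffffc > tracing_cpumask

# These events fire on the target CPU when the IPI is serviced.
echo 1 > events/irq_vectors/call_function_entry/enable
echo 1 > events/irq_vectors/call_function_single_entry/enable

# Stream events; any output here is an IPI breaking through the isolation.
cat trace_pipe
```

This only shows that the IPIs arrive; identifying the sender requires a tool like rt-trace-bpf, which captures the originating stack, as in the traces below.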

      caused by NetworkManager:
      64359.052209596    NetworkManager       0    1405     smp_call_function_many_cond (cpu=0, func=do_kernel_range_flush)
              smp_call_function_many_cond+0x1
              smp_call_function+0x39
              on_each_cpu+0x2a
              flush_tlb_kernel_range+0x7b
              __purge_vmap_area_lazy+0x70
              _vm_unmap_aliases.part.42+0xdf
              change_page_attr_set_clr+0x16a
              set_memory_ro+0x26
              bpf_int_jit_compile+0x2f9
              bpf_prog_select_runtime+0xc6
              bpf_prepare_filter+0x523
              sk_attach_filter+0x13
              sock_setsockopt+0x92c
              __sys_setsockopt+0x16a
              __x64_sys_setsockopt+0x20
              do_syscall_64+0x87
              entry_SYSCALL_64_after_hwframe+0x65

      caused by the mgag200 kernel module:
      238903.096535737   kworker/0:1          0    88579    smp_call_function_many_cond (cpu=0, func=do_flush_tlb_all)
              smp_call_function_many_cond+0x1
              smp_call_function+0x39
              on_each_cpu+0x2a
              flush_tlb_kernel_range+0x48
              __purge_vmap_area_lazy+0x70
              free_vmap_area_noflush+0xf2
              remove_vm_area+0x93
              __vunmap+0x59
              drm_gem_shmem_vunmap+0x6d
              mgag200_handle_damage+0x62
              mgag200_simple_display_pipe_update+0x69
              drm_atomic_helper_commit_planes+0xb3
              drm_atomic_helper_commit_tail+0x26
              commit_tail+0xc6
              drm_atomic_helper_commit+0x103
              drm_atomic_helper_dirtyfb+0x20e
              drm_fb_helper_damage_work+0x228
              process_one_work+0x18f
              worker_thread+0x30
              kthread+0x15d
              ret_from_fork+0x1f

      Tracing on the isolated CPUs shows preemptions such as this:

      58118.769286 | 18) <...>-128143 |             | smp_call_function_interrupt() {
      58118.769286 | 18) <...>-128143 |             | irq_enter() {
      58118.769287 | 18) <...>-128143 |   0.101 us  | preempt_count_add();
      58118.769288 | 18) <...>-128143 |   0.968 us  | }
      58118.769288 | 18) <...>-128143 |             | generic_smp_call_function_single_interrupt() {
      58118.769289 | 18) <...>-128143 |             | flush_smp_call_function_queue() {
      58118.769289 | 18) <...>-128143 |             | do_flush_tlb_all() {
      58118.769290 | 18) <...>-128143 |   0.453 us  | native_flush_tlb_global();
      58118.769291 | 18) <...>-128143 |   1.439 us  | }
      58118.769292 | 18) <...>-128143 |   2.402 us  | }
      58118.769292 | 18) <...>-128143 |   3.223 us  | }
      58118.769292 | 18) <...>-128143 |             | irq_exit() {
      58118.769293 | 18) <...>-128143 |   0.077 us  | preempt_count_sub();
      58118.769294 | 18) <...>-128143 |   0.201 us  | idle_cpu();
      58118.769295 | 18) <...>-128143 |             | tick_nohz_irq_exit() {
      58118.769295 | 18) <...>-128143 |   0.164 us  | ktime_get();
      58118.769296 | 18) <...>-128143 |             | __tick_nohz_full_update_tick() {
      58118.769296 | 18) <...>-128143 |   0.079 us  | check_tick_dependency();
      58118.769297 | 18) <...>-128143 |   0.074 us  | check_tick_dependency();
      58118.769298 | 18) <...>-128143 |   0.070 us  | check_tick_dependency();
      58118.769299 | 18) <...>-128143 |   0.101 us  | check_tick_dependency();
      58118.769300 | 18) <...>-128143 |   1.458 us  | tick_nohz_next_event();
      58118.769302 | 18) <...>-128143 |   0.082 us  | tick_nohz_stop_tick();
      58118.769303 | 18) <...>-128143 |   6.229 us  | }
      58118.769303 | 18) <...>-128143 |   8.124 us  | }
      58118.769303 | 18) <...>-128143 | + 10.872 us | }
      58118.769304 | 18) <...>-128143 | + 17.471 us | }
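The excerpt above is ftrace function_graph output. A sketch of one way to collect a similar trace, assuming root, tracefs at /sys/kernel/debug/tracing, and isolated CPU 18 as in the excerpt:

```shell
# Sketch: capture a function_graph trace of IPI servicing on one isolated CPU.
cd /sys/kernel/debug/tracing

echo 0 > tracing_on
echo function_graph > current_tracer

# Trace only CPU 18 (tracing_cpumask is a hex mask: bit 18 -> 0x40000).
echo 40000 > tracing_cpumask

# Start graphing from the IPI entry point so the output matches the excerpt.
echo smp_call_function_interrupt > set_graph_function

echo 1 > tracing_on
sleep 10          # let the workload run while tracing
echo 0 > tracing_on
cat trace
```

The graph makes the cost visible: beyond the flush itself, irq_exit()/tick_nohz_irq_exit() housekeeping accounts for most of the 17 us spent in the interrupt.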

      Version-Release number of selected component (if applicable):

      4.18.0-348.12.2.rt7.143.el8_5.x86_64

      How reproducible:

      Easily

      Steps to Reproduce:
      1. Boot the system using an RT kernel and the cpu-partitioning tuned profile
      2. Run a workload that measures latency, such as oslat, on the isolated CPUs
      3. Trace the kernel activity on the isolated CPUs while the workload is running
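The steps above can be sketched as shell commands. The CPU list is an example, the variables file path is the tuned default on RHEL 8, and the oslat flag names should be checked against `oslat --help` for the installed rt-tests version:

```shell
# 1. Isolate CPUs 2-23 and apply the cpu-partitioning profile, then reboot
#    into the RT kernel so the kernel command-line changes take effect.
echo 'isolated_cores=2-23' >> /etc/tuned/cpu-partitioning-variables.conf
tuned-adm profile cpu-partitioning

# 2. After reboot, run a latency-measuring workload pinned to the isolated
#    CPUs (oslat ships in the rt-tests package).
oslat --cpu-list 2-23 --duration 600 --rtprio 1

# 3. In parallel, trace kernel activity on those CPUs, e.g. with ftrace
#    or the rt-trace-bpf tool.
```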

      Actual results:

      Latency spikes caused by IPI processing will be observed on the isolated CPUs when there is no need to handle the IPI at that moment.

      Expected results:

No needless IPI processing should occur on the isolated CPUs. For a 100% userspace workload such as oslat, there is no need to enter the kernel and service the IPI until a necessary kernel entry occurs anyway (i.e., a system call, timer interrupt, etc.), at which point the deferred work could be performed.

      Additional info:

              rh-ee-vschneid Valentin Schneider
              krister@redhat.com Karl Rister (Inactive)
              Qiao Zhao