RHEL-6872

Silent failure in tuned scheduler plugin when IRQs cannot be affined


    • Bug
    • Resolution: Won't Do
    • rhel-8.6.0
    • tuned
    • Low
    • rhel-net-perf
    • ssg_core_services
    • 57,005

      Description of problem:

      When isolated_cores is specified in the tuned profile, the tuned scheduler plugin attempts to affine all IRQs to the non-isolated cores. However, if there are too many IRQs (I am not sure what the limit is), some IRQs are not affined and remain on the isolated cores. The failure is silent: it is visible only in the tuned logs. For example:

      2022-12-16 10:03:20,446 ERROR tuned.plugins.plugin_scheduler: Failed to set SMP affinity of IRQ 1031 to '00000003,00000003': [Errno 28] No space left on device
      2022-12-16 10:03:20,446 ERROR tuned.plugins.plugin_scheduler: Failed to set SMP affinity of IRQ 1032 to '00000003,00000003': [Errno 28] No space left on device
      2022-12-16 10:03:20,446 ERROR tuned.plugins.plugin_scheduler: Failed to set SMP affinity of IRQ 1033 to '00000003,00000003': [Errno 28] No space left on device
      2022-12-16 10:03:20,446 ERROR tuned.plugins.plugin_scheduler: Failed to set SMP affinity of IRQ 1034 to '00000003,00000003': [Errno 28] No space left on device
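
Since the failure currently surfaces only as log lines like the ones above, a small script can pull the affected IRQ numbers out of the log. This is a minimal sketch, not part of tuned: the regular expression is modelled on the four lines quoted here, the function names are hypothetical, and `/var/log/tuned/tuned.log` is assumed to be the log location.

```python
import re

# Pattern modelled on the log lines quoted in this report; other tuned
# versions may word the message differently (assumption).
FAILED_AFFINITY = re.compile(
    r"Failed to set SMP affinity of IRQ (?P<irq>\d+) to '(?P<mask>[^']*)': "
    r"\[Errno (?P<errno>\d+)\]"
)


def failed_irqs(log_lines):
    """Return (irq, mask, errno) tuples for every affinity failure found."""
    failures = []
    for line in log_lines:
        match = FAILED_AFFINITY.search(line)
        if match:
            failures.append(
                (int(match["irq"]), match["mask"], int(match["errno"]))
            )
    return failures


def scan_log(path="/var/log/tuned/tuned.log"):
    """Scan a tuned log file; the default path is an assumed location."""
    with open(path, encoding="utf-8", errors="replace") as log:
        return failed_irqs(log)
```

Calling `scan_log()` on the affected host should list IRQs 1031 through 1034 for the excerpt above, each with errno 28.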

      Should this failure cause the tuned profile to fail to apply, so that the user is aware of the issue? IRQs that are not affined away from the isolated cores can increase the latency of processes running on those cores. Today the only way to notice the failure is to read the tuned logs.
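
Independently of the tuned logs, the kernel's own view can be checked through `/proc/irq/<N>/smp_affinity_list`. The sketch below reports every IRQ whose allowed CPUs still include an isolated core. The function names are hypothetical, and the isolated set `2-31,34-63` in the usage note is only inferred from the mask `00000003,00000003` in the log (CPUs 0, 1, 32, 33 as housekeeping on an assumed 64-CPU machine).

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-1,32-33' into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        first, _, last = part.partition("-")
        cpus.update(range(int(first), int(last or first) + 1))
    return cpus


def read_affinities(proc_irq="/proc/irq"):
    """Collect {irq: cpu-list string} from the smp_affinity_list files."""
    affinities = {}
    for entry in Path(proc_irq).iterdir():
        if not entry.name.isdigit():
            continue  # skip non-IRQ entries such as default_smp_affinity
        try:
            affinities[int(entry.name)] = (entry / "smp_affinity_list").read_text()
        except OSError:
            continue  # the IRQ may have disappeared while scanning
    return affinities


def irqs_on_isolated(affinities, isolated):
    """Map IRQ -> sorted isolated CPUs still present in its allowed set."""
    offenders = {}
    for irq, cpu_list in sorted(affinities.items()):
        overlap = parse_cpu_list(cpu_list) & isolated
        if overlap:
            offenders[irq] = sorted(overlap)
    return offenders
```

For example, `irqs_on_isolated(read_affinities(), parse_cpu_list("2-31,34-63"))` returns an empty dict when every IRQ has been moved off the isolated cores. Some IRQs can never be moved by design, so a non-empty result needs interpretation.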

      Version-Release number of selected component (if applicable):
      tuned-2.19.0-1.el8.noarch

      How reproducible:

      This is only seen when:
      1. isolated_cores is set to cover most of the CPUs on the server.
      2. There is a large number of IRQs. In my case this was because the server has two Intel E810 NICs.
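
Both conditions raise the number of IRQs that each non-isolated CPU has to accept. The report does not establish the actual limit; one plausible but unconfirmed explanation for `[Errno 28]` is that each CPU can host only a finite number of interrupt vectors. The sketch below merely gauges the load: it decodes an affinity mask and divides the number of numbered IRQs in `/proc/interrupts` by the number of CPUs in the mask. All function names are hypothetical.

```python
import re


def cpus_in_mask(mask):
    """Decode a /proc/irq-style hex mask ('00000003,00000003') into CPU ids."""
    value = int(mask.replace(",", ""), 16)
    return [cpu for cpu in range(value.bit_length()) if value >> cpu & 1]


def count_numbered_irqs(interrupts_text):
    """Count rows of /proc/interrupts whose label is a numeric IRQ."""
    return sum(
        1 for line in interrupts_text.splitlines()
        if re.match(r"\s*\d+:", line)
    )


def irqs_per_housekeeping_cpu(interrupts_text, mask):
    """Average number of IRQs each CPU in `mask` would have to accept."""
    cpus = cpus_in_mask(mask)
    if not cpus:
        raise ValueError("affinity mask selects no CPUs")
    return count_numbered_irqs(interrupts_text) / len(cpus)
```

The mask from the log decodes to four CPUs, so with more than a thousand IRQs each of them would be asked to take several hundred.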

      Steps to Reproduce:
      1. On a server with a large number of IRQs, apply a tuned profile with isolated_cores set to cover most of the CPUs.

      Actual results:
      Some of the IRQs are not affined to the non-isolated cores, but the tuned profile still applies successfully.

      Expected results:
      Either all the IRQs are affined or the tuned profile fails to apply.
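
To illustrate the second option, here is a minimal sketch of a "collect failures, then fail loudly" pattern. It is not tuned's actual code: `IrqAffinityError`, `write_affinity`, and `affine_all` are hypothetical names, and only the write to `/proc/irq/<N>/smp_affinity` is the standard kernel interface.

```python
class IrqAffinityError(RuntimeError):
    """Raised when one or more IRQs could not be moved (hypothetical name)."""


def write_affinity(irq, mask):
    # Standard procfs interface for setting the allowed CPUs of an IRQ.
    with open(f"/proc/irq/{irq}/smp_affinity", "w") as handle:
        handle.write(mask)


def affine_all(irqs, mask, writer=write_affinity):
    """Try every IRQ, then raise if any write failed instead of only logging."""
    failed = {}
    for irq in irqs:
        try:
            writer(irq, mask)
        except OSError as exc:
            failed[irq] = exc
    if failed:
        details = ", ".join(
            f"{irq} ({exc.strerror})" for irq, exc in failed.items()
        )
        raise IrqAffinityError(
            f"could not set affinity {mask} for IRQs: {details}"
        )
```

Attempting every IRQ before raising keeps the successful moves in place and names all failures in one error. A real implementation would probably also need to tolerate IRQs that the kernel never allows to be moved.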

      Additional info: N/A

              jskarvad Jaroslav Škarvada
              bwensley@redhat.com Bart Wensley
              Jaroslav Škarvada
              Robin Hack