OCPBUGS-31936

tuned: tuned breaks dynamic IRQ affinity


    • Type: Bug
    • Resolution: Done-Errata
    • Priority: Undefined
    • None
    • 4.15
    • None
    • +
    • Important
    • No
    • False
    • Release note text:
      * Previously, both tuned and irqbalanced modified the Interrupt Request (IRQ) CPU affinity, which caused conflicts. With this release, only irqbalanced configures interrupt affinity. (link:https://issues.redhat.com/browse/OCPBUGS-31936[*OCPBUGS-31936*])
      ___________________
      Cause:

      Both tuned and irqbalanced modified the IRQ CPU affinity and overwrote each other's settings.

      The irqbalanced configuration was updated correctly when a CPU-pinned workload started, but a late-starting NTO/tuned then reverted the mask.

      Issue:

      After an SNO reboot, some interrupts started interfering with CPU-pinned workloads.

      Fix:

      Only one component, irqbalanced, now configures interrupt affinity.
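
One way to look for the symptom described above is to compare each IRQ's allowed CPUs with the CPUs of the pinned workload. The sketch below is illustrative only and is not part of the fix. It assumes the standard Linux /proc/irq/<N>/smp_affinity_list files; the helper names (parse_cpu_list, irqs_touching) and the example CPU set {4, 5} are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a Linux cpulist string such as "0-3,8" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-", 1)
            cpus.update(range(int(low), int(high) + 1))
        else:
            cpus.add(int(part))
    return cpus


def irqs_touching(pinned_cpus, proc_irq="/proc/irq"):
    """Return {irq_number: overlapping_cpus} for IRQs allowed on pinned CPUs."""
    root = Path(proc_irq)
    if not root.is_dir():
        return {}
    hits = {}
    for irq_dir in root.iterdir():
        # Skip non-IRQ entries such as default_smp_affinity.
        if not irq_dir.name.isdigit():
            continue
        try:
            allowed = parse_cpu_list((irq_dir / "smp_affinity_list").read_text())
        except (OSError, ValueError):
            continue
        overlap = allowed & set(pinned_cpus)
        if overlap:
            hits[int(irq_dir.name)] = overlap
    return hits


if __name__ == "__main__":
    # Hypothetical pinned CPUs; replace with the CPUs of the guaranteed pod.
    for irq, cpus in sorted(irqs_touching({4, 5}).items()):
        print(f"IRQ {irq} may fire on pinned CPUs {sorted(cpus)}")
```

Running this before and after a tuned restart would show whether the set of IRQs allowed on the pinned CPUs changed.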
    • Bug Fix
    • In Progress

      This is a clone of issue OCPBUGS-31844. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-30306. The following is the description of the original issue:

      This is a clone of issue OCPBUGS-26400. The following is the description of the original issue:

      Description of problem:

      If GloballyDisableIrqLoadBalancing is disabled in the performance profile, IRQs should be balanced across all CPUs except those that CRI-O explicitly removes via the pod annotation irq-load-balancing.crio.io: "disable".

      However, the scheduler plugin in tuned attempts to affine all IRQs to the non-isolated cores. "Isolated" here means non-reserved, not truly isolated, cores. This is directly at odds with the user's intent, so tuned ends up fighting with CRI-O/irqbalance, each trying to apply a different affinity.
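
The conflict between the two policies can be illustrated with plain set arithmetic. This is a simplified model, not the actual tuned or irqbalance code; the function names and the 8-CPU layout (CPUs 0-1 reserved, CPUs 4-5 used by an annotated pod) are assumptions made for the example.

```python
def expected_irq_cpus(all_cpus, banned_by_crio):
    """User intent: every CPU except those CRI-O removed for annotated pods."""
    return set(all_cpus) - set(banned_by_crio)


def tuned_scheduler_irq_cpus(all_cpus, isolated_cpus):
    """Behaviour described in this bug: IRQs go to the non-isolated CPUs only."""
    return set(all_cpus) - set(isolated_cpus)


# Hypothetical node layout, for illustration only.
all_cpus = set(range(8))
reserved = {0, 1}
isolated = all_cpus - reserved      # "isolated" means non-reserved here
pinned_by_pod = {4, 5}              # CPUs of a pod annotated with "disable"

wanted = expected_irq_cpus(all_cpus, pinned_by_pod)
applied_by_tuned = tuned_scheduler_irq_cpus(all_cpus, isolated)

print("expected IRQ CPUs:", sorted(wanted))
print("tuned IRQ CPUs:   ", sorted(applied_by_tuned))
print("policies agree:   ", wanted == applied_by_tuned)
```

In this layout the two masks differ, so whichever component writes last decides the effective affinity.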
      
      Scenarios
      - If a pod gets launched with the annotation after tuned has started, whether at runtime or after a reboot - ok
      - On a reboot, if tuned recovers after the guaranteed pod has been launched - broken
      - If tuned restarts at runtime for any reason - broken
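
The three scenarios above come down to which component writes the affinity last. The following sketch replays them under a deliberately simplified "last writer wins" model; the event names, CPU sets, and the assumption that each start event rewrites the whole mask are hypothetical and only meant to mirror the description.

```python
ALL_CPUS = frozenset(range(8))
RESERVED = frozenset({0, 1})   # non-isolated CPUs in the profile (assumed)
PINNED = frozenset({4, 5})     # CPUs of the annotated guaranteed pod (assumed)
EXPECTED = ALL_CPUS - PINNED   # what the user asked for


def replay(events):
    """Apply affinity writers in order; the last writer determines the mask."""
    mask = ALL_CPUS
    for event in events:
        if event == "tuned_start":
            # tuned affines IRQs to the non-isolated CPUs.
            mask = RESERVED
        elif event == "pod_start":
            # CRI-O/irqbalance balance IRQs over everything but the pod's CPUs.
            mask = ALL_CPUS - PINNED
        else:
            raise ValueError(f"unknown event: {event}")
    return mask


def outcome(events):
    """Classify a scenario as "ok" or "broken" against the user's intent."""
    return "ok" if replay(events) == EXPECTED else "broken"


SCENARIOS = {
    "pod launched after tuned started": ["tuned_start", "pod_start"],
    "tuned recovers after pod on reboot": ["pod_start", "tuned_start"],
    "tuned restarts at runtime": ["tuned_start", "pod_start", "tuned_start"],
}

for name, events in SCENARIOS.items():
    print(f"{name}: {outcome(events)}")
```

With a single writer, as in the fix, the ordering of tuned and pod startup no longer matters in this model.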

      Version-Release number of selected component (if applicable):

         4.14 and likely earlier

      How reproducible:

          See description

      Steps to Reproduce:

          1. See description
          

      Actual results:

          

      Expected results:

          

      Additional info:

          

       

              yquinn@redhat.com Yanir Quinn
              openshift-crt-jira-prow OpenShift Prow Bot
              Mallapadi Niranjan Mallapadi Niranjan