  1. OpenShift Bugs
  2. OCPBUGS-26400

tuned: tuned breaks dynamic IRQ affinity


Details

    • +
    • Important
    • No
    • Rejected
    • False
    • None
    • When a node reboots, all pods are restarted in a random order. In this scenario the `tuned` pod can start after the workload pods. The workload pods then start with partial tuning, which can affect performance or even cause the workload to crash. (link:https://issues.redhat.com/browse/OCPBUGS-26400[*OCPBUGS-26400*])
    • Known Issue
    • Proposed
    • 2024-03-12: NTO 4.16 PR under review; the 4.15 PR is also ready to be merged once 4.16 passes validation
      2024-03-05: https://issues.redhat.com/browse/RHEL-21923 is in , upstream NTO PR ready and on track (once merged it will be backported to 4.15 and 4.14)

    Description

      Description of problem:

      If GloballyDisableIrqLoadBalancing is disabled in the performance profile, then IRQs should be balanced across all CPUs minus the CPUs that are explicitly removed by CRI-O via the pod annotation irq-load-balancing.crio.io: "disable"
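
      The intended CPU set can be written down as simple set arithmetic. The following is a minimal sketch, not code from NTO, tuned, or CRI-O: the function names and the example CPU ranges are hypothetical, and it assumes CPU sets are expressed in the kernel's list notation (for example "0-3,8").

```python
def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # tolerate an empty list
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def format_cpu_list(cpus):
    """Format a set of CPU ids as a compact list such as "0-3,8"."""
    ordered = sorted(cpus)
    ranges = []
    i = 0
    while i < len(ordered):
        start = end = ordered[i]
        # Extend the current range while the next CPU id is consecutive.
        while i + 1 < len(ordered) and ordered[i + 1] == end + 1:
            i += 1
            end = ordered[i]
        ranges.append(str(start) if start == end else f"{start}-{end}")
        i += 1
    return ",".join(ranges)


def expected_irq_cpus(all_cpus, annotated_pod_cpus):
    """All CPUs minus the CPUs of pods annotated to disable IRQ load balancing.

    all_cpus: CPU list string for the whole node (hypothetical input).
    annotated_pod_cpus: iterable of CPU list strings, one per annotated pod.
    """
    remaining = parse_cpu_list(all_cpus)
    for pod_cpus in annotated_pod_cpus:
        remaining -= parse_cpu_list(pod_cpus)
    return format_cpu_list(remaining)
```

      For an 8-CPU node with one annotated pod pinned to CPUs 4-5, `expected_irq_cpus("0-7", ["4-5"])` returns `"0-3,6-7"`.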
      
      The issue is that the scheduler plugin in tuned attempts to affine all IRQs to the non-isolated cores. Isolated here means non-reserved, not truly isolated cores. This is directly at odds with the user intent, so tuned ends up fighting with CRI-O/irqbalance, each trying to do something different.
      
      Scenarios
      - If a pod gets launched with the annotation after tuned has started, at runtime or after a reboot - ok
      - On a reboot, if tuned recovers after the guaranteed pod has been launched - broken
      - If tuned restarts at runtime for any reason - broken
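
      One possible way to tell the ok and broken cases apart on a node is to read each IRQ's affinity from /proc/irq/<n>/smp_affinity_list and flag any IRQ that is allowed on CPUs outside the expected set. This is only a sketch and not part of the original report: it assumes the breakage is visible as IRQs permitted on the annotated pod's CPUs, and the helper names and the `proc_irq` parameter are hypothetical.

```python
import os


def parse_cpu_list(text):
    """Parse a kernel-style CPU list such as "0-3,8" into a set of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            lo, hi = part.split("-")
            cpus.update(range(int(lo), int(hi) + 1))
        else:
            cpus.add(int(part))
    return cpus


def irqs_outside(allowed_cpus, proc_irq="/proc/irq"):
    """Return {irq: [offending CPUs]} for IRQs allowed outside allowed_cpus.

    allowed_cpus: set of CPU ids that IRQs are expected to stay on.
    proc_irq: directory to scan; a parameter so the check can be tried
              against a copy of the tree (assumption for illustration).
    """
    offenders = {}
    for name in sorted(os.listdir(proc_irq)):
        path = os.path.join(proc_irq, name, "smp_affinity_list")
        # Skip non-IRQ entries such as default_smp_affinity.
        if not name.isdigit() or not os.path.isfile(path):
            continue
        with open(path) as f:
            affinity = parse_cpu_list(f.read())
        extra = affinity - allowed_cpus
        if extra:
            offenders[int(name)] = sorted(extra)
    return offenders
```

      Running `irqs_outside(parse_cpu_list("0-3,6-7"))` before and after a tuned restart would show whether any IRQ has become eligible for the CPUs of a guaranteed pod pinned to 4-5.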

      Version-Release number of selected component (if applicable):

         4.14 and likely earlier

      How reproducible:

          See description

      Steps to Reproduce:

          1. See description
          

      Actual results:

          

      Expected results:

          

      Additional info:

          

       

            People

              yquinn@redhat.com Yanir Quinn
              browsell@redhat.com Brent Rowsell
              Shereen Haj