-
Bug
-
Resolution: Done-Errata
-
Undefined
-
None
-
4.15
-
None
-
Important
-
No
-
CNF Compute Sprint 251, CNF Compute Sprint 252
-
2
-
False
-
This is a clone of issue OCPBUGS-30980. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-27227. The following is the description of the original issue:
—
This is a clone of issue OCPBUGS-25699. The following is the description of the original issue:
—
Description of problem:
If GloballyDisableIrqLoadBalancing is disabled in the performance profile, IRQs should be balanced across all CPUs except those explicitly removed by crio via the pod annotation irq-load-balancing.crio.io: "disable". We have found a number of issues with this:

1) The script clear-irqbalance-banned-cpus.sh sets an empty value for IRQBALANCE_BANNED_CPUS in /etc/sysconfig/irqbalance. If no value is provided, irqbalance calculates a default. The default excludes all isolated and nohz_full CPUs from the mask, so the IRQs are balanced over the reserved CPUs only, breaking the user intent. If a guaranteed pod with the irq-load-balancing.crio.io: "disable" annotation is launched, irqbalance will heal the system, but if one never is, all IRQs stay affined to the reserved cores. This script needs to set the banned mask to all zeros on startup.

2) The more serious issue: the scheduler plugin in tuned attempts to affine all IRQs to the non-isolated cores. "Isolated" here means non-reserved, not truly isolated cores. This is directly at odds with the user intent, so tuned ends up fighting with crio/irqbalance, each trying to do a different thing.

Scenarios:
- If a pod is launched with the annotation after tuned has started, at runtime or after a reboot: ok
- On a reboot, if tuned recovers after the guaranteed pod has been launched: broken
- If tuned restarts at runtime for any reason: broken

3) Lastly, the crio restore of the irqbalance mask needs to be removed. Disabling this should be part of the crio conf that is installed by the NTO.
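To illustrate issue 1, here is a minimal Python sketch of the difference between leaving IRQBALANCE_BANNED_CPUS empty and setting it to an explicit all-zero mask. It is not the actual script or irqbalance code. The function names, the 8-CPU layout, and the model of the default (banned = isolated plus nohz_full CPUs, as described in this report) are assumptions made for illustration. Masks are rendered in the kernel-style hex cpumask format, with comma-separated 32-bit groups and the most significant group first.

```python
def cpus_to_mask(cpus, ncpus):
    """Render CPU ids as a hex cpumask: 32-bit groups, most significant first."""
    value = 0
    for cpu in cpus:
        if not 0 <= cpu < ncpus:
            raise ValueError(f"cpu {cpu} out of range for {ncpus} CPUs")
        value |= 1 << cpu
    groups = max(1, (ncpus + 31) // 32)
    words = [(value >> (32 * i)) & 0xFFFFFFFF for i in range(groups)]
    return ",".join(f"{w:08x}" for w in reversed(words))


def default_banned_mask(isolated, nohz_full, ncpus):
    """Hypothetical model of the default described in this report:
    with no value set, isolated and nohz_full CPUs end up banned."""
    return cpus_to_mask(set(isolated) | set(nohz_full), ncpus)


def explicit_zero_mask(ncpus):
    """The value the report says should be written at startup: no CPU banned."""
    return cpus_to_mask((), ncpus)


if __name__ == "__main__":
    # Assumed layout: CPUs 0-1 reserved, CPUs 2-7 isolated and nohz_full.
    isolated = range(2, 8)
    print("empty value  ->", default_banned_mask(isolated, isolated, 8))
    print("explicit 0s  ->", explicit_zero_mask(8))
```

With the assumed layout, the empty value bans CPUs 2-7 and leaves only the reserved CPUs 0-1 for IRQs, while the explicit zero mask bans nothing, so IRQs can be balanced across all eight CPUs until crio removes specific ones.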
Version-Release number of selected component (if applicable):
4.14 and likely earlier
How reproducible:
See description
Steps to Reproduce:
1. See description
Actual results:
Expected results:
Additional info:
- blocks
-
OCPBUGS-31442 Dynamic irq load balancing issues (4.12)
- Closed
- clones
-
OCPBUGS-30980 Dynamic irq load balancing issues (4.14)
- Closed
- is blocked by
-
OCPBUGS-30980 Dynamic irq load balancing issues (4.14)
- Closed
- is cloned by
-
OCPBUGS-31442 Dynamic irq load balancing issues (4.12)
- Closed
- links to
-
RHBA-2024:1761 OpenShift Container Platform 4.13.z bug fix update