Bug
Resolution: Done-Errata
Undefined
None
4.15
None
Important
No
CNF Compute Sprint 252, CNF Compute Sprint 253
2
False
Bug Fix
In Progress
This is a clone of issue OCPBUGS-31357, which was cloned from OCPBUGS-30980, which was cloned from OCPBUGS-27227, which was in turn cloned from OCPBUGS-25699. The following is the description of the original issue:
Description of problem:
If GloballyDisableIrqLoadBalancing is disabled in the performance profile, IRQs should be balanced across all CPUs except those explicitly removed by CRI-O through the pod annotation irq-load-balancing.crio.io: "disable". We have found a number of issues with this:

1) The script clear-irqbalance-banned-cpus.sh sets an empty value for IRQBALANCE_BANNED_CPUS in /etc/sysconfig/irqbalance. When no value is provided, irqbalance calculates a default, and that default excludes all isolated and nohz_full CPUs from the mask. As a result, IRQs are balanced over the reserved CPUs only, which breaks the user intent. If a guaranteed pod with the irq-load-balancing.crio.io: "disable" annotation is launched, irqbalance heals the system; if one never is, all IRQs stay affined to the reserved cores. This script needs to set the banned mask to all zeros on startup.

2) The more serious issue: the scheduler plugin in tuned attempts to affine all IRQs to the non-isolated cores. "Isolated" here means non-reserved, not truly isolated, cores. This is directly at odds with the user intent, so tuned ends up fighting with CRI-O/irqbalance, each trying to do something different. Scenarios:
- A pod with the annotation is launched after tuned has started, either at runtime or after a reboot: OK
- On a reboot, tuned recovers after the guaranteed pod has been launched: broken
- tuned restarts at runtime for any reason: broken

3) Lastly, the CRI-O restore of the irqbalance mask needs to be removed. Disabling the restore should be part of the CRI-O configuration that the NTO installs.
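To illustrate issue 1, here is a minimal sketch of writing an explicit all-zeros IRQBALANCE_BANNED_CPUS mask instead of an empty value. It assumes the kernel-style cpumask text format (32-bit words as 8 hex digits, comma-separated, most significant word first) and a plain KEY=value sysconfig file. The helper names are hypothetical, this is not the NTO's clear-irqbalance-banned-cpus.sh, and irq_eligible_cpus is only a simplified model of the irqbalance behaviour described above, not irqbalance source.

```python
import math
import re


def zero_banned_mask(num_cpus: int) -> str:
    """Hypothetical helper: an all-zeros cpumask covering num_cpus CPUs.

    Assumes the kernel-style format: 8 hex digits per 32-bit word,
    words separated by commas, most significant word first.
    """
    if num_cpus <= 0:
        raise ValueError("num_cpus must be positive")
    words = math.ceil(num_cpus / 32)
    return ",".join(["00000000"] * words)


def set_banned_cpus(sysconfig_text: str, mask: str) -> str:
    """Hypothetical helper: rewrite (or append) the IRQBALANCE_BANNED_CPUS
    assignment in the contents of /etc/sysconfig/irqbalance so that the
    value is never left empty. Commented-out lines are left untouched."""
    line = f"IRQBALANCE_BANNED_CPUS={mask}"
    # [ \t]* rather than \s* so the match never swallows preceding blank lines.
    pattern = re.compile(r"^[ \t]*IRQBALANCE_BANNED_CPUS=.*$", re.MULTILINE)
    if pattern.search(sysconfig_text):
        return pattern.sub(line, sysconfig_text)
    # No active assignment: append one, keeping the file newline-terminated.
    if sysconfig_text and not sysconfig_text.endswith("\n"):
        sysconfig_text += "\n"
    return sysconfig_text + line + "\n"


def parse_cpumask(mask: str) -> set[int]:
    """Convert a comma-separated hex cpumask into the set of CPU ids it names."""
    value = int(mask.replace(",", ""), 16)
    return {cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1}


def irq_eligible_cpus(all_cpus, banned_value, isolated, nohz_full) -> set[int]:
    """Simplified model of the behaviour described in this report:
    an empty banned value makes irqbalance derive a default that excludes
    isolated and nohz_full CPUs; an explicit mask bans exactly its CPUs."""
    if banned_value == "":
        return set(all_cpus) - set(isolated) - set(nohz_full)
    return set(all_cpus) - parse_cpumask(banned_value)


if __name__ == "__main__":
    # 8 CPUs: 0-1 reserved, 2-7 isolated and nohz_full.
    cpus, isolated = range(8), range(2, 8)
    print(irq_eligible_cpus(cpus, "", isolated, isolated))
    print(irq_eligible_cpus(cpus, zero_banned_mask(8), isolated, isolated))
```

With an empty value the model leaves only the reserved CPUs 0 and 1 eligible for IRQs, while the explicit zero mask keeps all eight CPUs eligible until CRI-O bans specific ones for annotated pods.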
Version-Release number of selected component (if applicable):
4.14 and likely earlier
How reproducible:
See description
Steps to Reproduce:
1. See description.
Actual results:
Expected results:
Additional info:
- clones: OCPBUGS-31357 Dynamic irq load balancing issues (4.13) (Closed)
- is blocked by: OCPBUGS-31357 Dynamic irq load balancing issues (4.13) (Closed)
- links to: RHBA-2024:2782 OpenShift Container Platform 4.12.z bug fix update