Bug
Resolution: Done-Errata
Critical
CLOSED
High
+++ This bug was initially created as a clone of Bug #2184860 +++
Description of problem:
The node labeller marks the nodes with:
- their exact TSC frequency
- the lowest TSC frequency in the cluster IF they support tsc-scalable
Hypothetical example, on a cluster where the lowest frequency is X:
NAME   TSC-FREQUENCY  TSC-SCALABLE  TSC-FREQUENCY-X  TSC-FREQUENCY-Y
node1  X              true          true
node2  Y              true          true             true
node3  Y              false                          true
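The labelling rules described above can be sketched in a few lines. This is only an illustration of the behaviour as stated in this report, not the node labeller's actual code: the `Node` class, the `frequency_labels` function, and the symbolic frequencies "X" and "Y" are hypothetical, and the label names are the shorthand used in the table rather than real label keys.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    tsc_frequency: str  # symbolic here ("X", "Y"); a real value would be numeric
    tsc_scalable: bool


def frequency_labels(nodes, lowest):
    """Return {node name: set of TSC-FREQUENCY-* labels} per the two rules."""
    labels = {}
    for node in nodes:
        # Rule 1: every node is labelled with its own exact TSC frequency.
        node_labels = {f"TSC-FREQUENCY-{node.tsc_frequency}"}
        # Rule 2: nodes that support TSC scaling also get the lowest
        # frequency found in the cluster.
        if node.tsc_scalable:
            node_labels.add(f"TSC-FREQUENCY-{lowest}")
        labels[node.name] = node_labels
    return labels


cluster = [
    Node("node1", "X", True),
    Node("node2", "Y", True),
    Node("node3", "Y", False),
]
labels = frequency_labels(cluster, lowest="X")
```

With these inputs, node3 ends up with only the Y label, which is why a VM pinned to the X label cannot land there.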
However, the calibrated TSC frequency is not an exact number on many CPU models, which leaves room for small variations.
For example, the same CPU on 2 different systems (or even the same system across 2 reboots) can report refined frequencies that differ by 0.001 MHz (1 kHz):
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2297.345 MHz processor
[ 4.127014] tsc: Refined TSC clocksource calibration: 2297.339 MHz
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2297.449 MHz processor
[ 4.063010] tsc: Refined TSC clocksource calibration: 2297.338 MHz
If we take X = 2297.338 MHz and Y = 2297.339 MHz, then a Windows VM with re-enlightenment will never run on node3, because node3 is missing the TSC-FREQUENCY-X label even though its frequency is only 0.001 MHz off.
The logic treats this as a heterogeneous cluster, but it is not.
The system should be able to schedule VMs on any of those 3 nodes, whether or not they are TSC-SCALABLE, because these are essentially the same frequency.
Lower layers already accept this variance (see BZ1839095).
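To make the difference between exact and tolerant matching concrete, here is a minimal sketch using the two refined frequencies from the logs above. The function name `within_tolerance` is hypothetical, and the 250 ppm default is an illustrative assumption, not a value taken from this report or confirmed for BZ1839095.

```python
def within_tolerance(freq_a_hz, freq_b_hz, tolerance_ppm=250):
    """True if the frequencies differ by at most tolerance_ppm parts per million.

    tolerance_ppm=250 is an assumed value chosen for illustration only.
    """
    reference = max(freq_a_hz, freq_b_hz)
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return abs(freq_a_hz - freq_b_hz) * 1_000_000 <= tolerance_ppm * reference


x_hz = 2_297_338_000  # X = 2297.338 MHz
y_hz = 2_297_339_000  # Y = 2297.339 MHz

exact_match = x_hz == y_hz                    # what an exact label match amounts to
tolerant_match = within_tolerance(x_hz, y_hz)
difference_ppm = abs(x_hz - y_hz) / x_hz * 1_000_000
```

Here the exact comparison fails while the tolerant one succeeds; the two frequencies are less than half a part per million apart.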
Version-Release number of selected component (if applicable):
4.12.10
How reproducible:
Always
Steps to Reproduce:
1. Use systems with the same CPU model and TSC-SCALABLE = false, and reboot them until they report different TSC frequencies.
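When comparing reboots, it can help to extract the refined frequency from the kernel log rather than reading it by eye. The following is a rough sketch that parses the log format shown in the description; the helper name `refined_tsc_khz` is hypothetical, and how the log text is collected from each node is left out.

```python
import re

# Matches the kernel message shown in the description above.
REFINED = re.compile(r"Refined TSC clocksource calibration: (\d+)\.(\d+) MHz")


def refined_tsc_khz(dmesg_text):
    """Return the refined TSC frequency in kHz as an int, or None if absent."""
    match = REFINED.search(dmesg_text)
    if match is None:
        return None
    whole, frac = match.groups()
    # Build the kHz value from the digits to avoid float rounding.
    return int(whole) * 1000 + int(frac.ljust(3, "0")[:3])


boot_a = "[ 4.127014] tsc: Refined TSC clocksource calibration: 2297.339 MHz"
boot_b = "[ 4.063010] tsc: Refined TSC clocksource calibration: 2297.338 MHz"

khz_a = refined_tsc_khz(boot_a)
khz_b = refined_tsc_khz(boot_b)
frequencies_differ = khz_a != khz_b
```

Once two boots (or two nodes with the same CPU) yield different values, the scheduling failure described below should be reproducible.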
Actual results:
- VMs fail to schedule on nodes with same CPU
Expected results:
- VMs scheduled
— Additional comment from Germano Veit Michel on 2023-04-06 01:57:56 UTC —
— Additional comment from Fabian Deutsch on 2023-04-11 12:51:27 UTC —
Adjusting priority because it relates to a customer case.
Urgent, because it will impact GS.
— Additional comment from on 2023-04-12 12:03:59 UTC —
- blocks
  CNV-28388 [2189960] NodeSelector for tsc frequency does not tolerate small TSC variations (Closed)
- is blocked by
  CNV-27907 [2184860] NodeSelector for tsc frequency does not tolerate small TSC variations (Release Pending)
- is duplicated by
  CNV-28022 [2186208] [cnv-4.12] NodeSelector for tsc frequency does not tolerate small TSC variations (Closed)