- Bug
- Resolution: Done-Errata
- Critical
- CNV v4.13.8
- None
- 5
- False
- False
- virt-launcher-rhel9-container-v4.16.0-249
- CNV Virtualization Sprint 249, CNV Virtualization Sprint 250, CNV Virtualization Sprint 251, CNV Virtualization Sprint 252
- Urgent
- No
Description of problem:
The node-labeller labels a node with Skylake, Cascadelake or Icelake CPU models only if the node CPU has the 'mpx' feature. This is incorrect: not every CPU of those models has this feature, and MPX has long been deprecated and removed, even from the Linux kernel and QEMU. See also https://www.phoronix.com/news/Intel-MPX-Is-Dead. The code doing this is [1]. It expects every feature listed in /usr/share/libvirt/cpu_map/[CPU Model].xml to be present on the node CPU before it labels the node. That assumption is wrong, because some features, such as MPX, may be missing. According to the libvirt team in [2], kubevirt should not even be reading /usr/share/libvirt/cpu_map/[CPU Model].xml, so this logic should not exist and a better way to determine the supported CPU models should be used. Flags can also go missing for other reasons, for example when a microcode update disables a feature because of a security flaw. MPX is not the first such edge case and likely won't be the last.
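To illustrate the behaviour described above, here is a minimal sketch of an "all listed features must be present" check. It is not the KubeVirt code from [1]: the function names, the shortened feature lists and the host feature set are assumptions made up for this example.

```python
# Hypothetical sketch, not KubeVirt code: names and data are illustrative only.

def strict_supported(models, host_features):
    """Return the models whose listed features are ALL present on the host.

    This mirrors the reported behaviour: one missing feature rejects the model.
    """
    return sorted(m for m, required in models.items() if required <= host_features)


def missing_features(models, host_features):
    """Map each rejected model to the features that caused the rejection."""
    return {
        m: sorted(required - host_features)
        for m, required in models.items()
        if not required <= host_features
    }


# Assumed, heavily shortened feature lists; the real cpu_map XML files list many more.
MODELS = {
    "Skylake-Server-noTSX-IBRS": {"avx2", "avx512f", "mpx"},
    "Haswell-noTSX": {"avx2"},
}
HOST = {"avx2", "avx512f"}  # assumed host feature set without mpx

print(strict_supported(MODELS, HOST))
print(missing_features(MODELS, HOST))
```

With these assumed inputs the Skylake model is rejected solely because of mpx, which is the pattern seen in the debug log in step 5.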
Version-Release number of selected component (if applicable):
4.14, 4.13 (probably all)
How reproducible:
Always
Steps to Reproduce:
1. Get a node with a recent CPU model (e.g. Cascadelake or Icelake)
2. Confirm it doesn't have mpx
$ ssh core@blue.shift.home.arpa cat /proc/cpuinfo | grep mpx
$
$ ssh core@blue.shift.home.arpa cat /proc/cpuinfo | grep name | head -n 1
model name : 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
3. Confirm it is missing the Skylake and Cascadelake labels
$ oc get node blue.shift.home.arpa -o yaml | grep cpu-model.node
cpu-model.node.kubevirt.io/Broadwell-noTSX: "true"
cpu-model.node.kubevirt.io/Broadwell-noTSX-IBRS: "true"
cpu-model.node.kubevirt.io/Haswell-noTSX: "true"
cpu-model.node.kubevirt.io/Haswell-noTSX-IBRS: "true"
cpu-model.node.kubevirt.io/IvyBridge: "true"
cpu-model.node.kubevirt.io/IvyBridge-IBRS: "true"
cpu-model.node.kubevirt.io/Nehalem: "true"
cpu-model.node.kubevirt.io/Nehalem-IBRS: "true"
cpu-model.node.kubevirt.io/Opteron_G1: "true"
cpu-model.node.kubevirt.io/Penryn: "true"
cpu-model.node.kubevirt.io/SandyBridge: "true"
cpu-model.node.kubevirt.io/SandyBridge-IBRS: "true"
cpu-model.node.kubevirt.io/Westmere: "true"
cpu-model.node.kubevirt.io/Westmere-IBRS: "true"
4. But it can run VMs with those CPU models
$ oc rsh virt-launcher-rhvm-mfd6f virsh domcapabilities | grep -E 'Cascade|Sky' | grep yes
<model usable='yes' vendor='Intel'>Skylake-Server-noTSX-IBRS</model>
<model usable='yes' vendor='Intel'>Skylake-Client-noTSX-IBRS</model>
<model usable='yes' vendor='Intel'>Cascadelake-Server-noTSX</model>
5. Add debug (see my patch in [3]) and confirm it's due to missing MPX feature.
{"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Server-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332857Z"}
{"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Client-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332885Z"}
{"component":"virt-handler","level":"warning","msg":"CPU model Opteron_G2 is missing required feature svm","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332923Z"}
{"component":"virt-handler","level":"warning","msg":"CPU model Cascadelake-Server-noTSX is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332973Z"}
{"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Server-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.337136Z"}
{"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Client-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.337173Z"}
{"component":"virt-handler","level":"warning","msg":"CPU model Opteron_G2 is missing required feature svm","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.337210Z"}
{"component":"virt-handler","level":"warning","msg":"CPU model Cascadelake-Server-noTSX is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.337262Z"}
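When virt-handler emits many of these warnings, it can help to group the rejected models by the feature that caused the rejection. The sketch below is one possible way to do that. It assumes the message format produced by the debug patch in [3] exactly as shown in step 5; the function name and the embedded sample (three lines copied from that output) are chosen for illustration.

```python
import json
import re
from collections import defaultdict

# Message format of the debug warning as it appears in the log output of step 5.
MISSING = re.compile(r"^CPU model (\S+) is missing required feature (\S+)$")


def summarize(log_text):
    """Group rejected CPU models by the missing feature named in each warning."""
    by_feature = defaultdict(set)
    for line in log_text.splitlines():
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # ignore lines that are not JSON log entries
        if not isinstance(entry, dict):
            continue
        match = MISSING.match(entry.get("msg", ""))
        if match:
            model, feature = match.groups()
            by_feature[feature].add(model)
    return {feature: sorted(models) for feature, models in by_feature.items()}


# Three lines copied from the log output of step 5.
SAMPLE = """\
{"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Server-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332857Z"}
{"component":"virt-handler","level":"warning","msg":"CPU model Opteron_G2 is missing required feature svm","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332923Z"}
{"component":"virt-handler","level":"warning","msg":"CPU model Cascadelake-Server-noTSX is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332973Z"}
"""

print(summarize(SAMPLE))
```

Duplicate warnings, such as the repeated entries in the log above, collapse into a single model name per feature because the models are collected in a set.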
Actual results:
The node is missing the Skylake, Cascadelake and Icelake labels, so the customer cannot expose these models to VMs.
Expected results:
Each node is correctly labeled with the CPU models it supports.
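One way to check for this expected result is to compare the node's cpu-model labels (step 3) with the models that virsh domcapabilities reports as usable (step 4). The sketch below assumes the labels are already available as a dictionary and the domcapabilities output as an XML string. The helper names are hypothetical and the sample data is a trimmed, made-up excerpt.

```python
import xml.etree.ElementTree as ET

# Label prefix as it appears in the `oc get node` output of step 3.
LABEL_PREFIX = "cpu-model.node.kubevirt.io/"


def labelled_models(labels):
    """CPU models the node is labelled with."""
    return {
        key[len(LABEL_PREFIX):]
        for key, value in labels.items()
        if key.startswith(LABEL_PREFIX) and value == "true"
    }


def usable_models(domcaps_xml):
    """CPU models marked usable='yes' in the domcapabilities XML."""
    root = ET.fromstring(domcaps_xml)
    return {m.text for m in root.iter("model") if m.get("usable") == "yes"}


def unlabelled_but_usable(labels, domcaps_xml):
    """Models libvirt can run that the node is not labelled with."""
    return sorted(usable_models(domcaps_xml) - labelled_models(labels))


# Assumed, trimmed sample data for illustration.
DOMCAPS = """<domainCapabilities><cpu><mode name='custom' supported='yes'>
<model usable='yes' vendor='Intel'>Skylake-Server-noTSX-IBRS</model>
<model usable='yes' vendor='Intel'>Haswell-noTSX</model>
<model usable='no' vendor='AMD'>Opteron_G2</model>
</mode></cpu></domainCapabilities>"""
LABELS = {
    "cpu-model.node.kubevirt.io/Haswell-noTSX": "true",
    "kubernetes.io/hostname": "blue.shift.home.arpa",
}

print(unlabelled_but_usable(LABELS, DOMCAPS))
```

An empty list would mean every usable model is labelled; any entry points at a model that is usable but not exposed through a label, as with the Skylake and Cascadelake models in this report.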
Additional info:
[1] https://github.com/kubevirt/kubevirt/blob/474d8d377c1fe36e03777bc36891fc2d4ab09afb/pkg/virt-handler/node-labeller/node_labeller.go#L413
[2] https://issues.redhat.com/browse/RHEL-19692
[3] https://github.com/germanovm/kubevirt/commit/5c7f3d37e727169b2583f9885e743a6d171ca459
- is blocked by
  - CNV-38581 KubeVirt should filter CPU models based on vendor (New)
  - CNV-38582 KubeVirt should aggressively squash the mpx flag (New)
- links to
  - RHEA-2023:122979 OpenShift Virtualization 4.16.0 Images