  1. OpenShift Virtualization
  2. CNV-37685

node-labeller does not add Skylake, Cascadelake and Icelake labels if node is missing mpx.


    • Bug
    • Resolution: Done-Errata
    • Critical
    • CNV v4.16.0
    • CNV v4.13.8
    • CNV Virtualization
    • None
    • 5
    • False
    • False
    • virt-launcher-rhel9-container-v4.16.0-249
    • CNV Virtualization Sprint 249, CNV Virtualization Sprint 250, CNV Virtualization Sprint 251, CNV Virtualization Sprint 252
    • Urgent
    • No

      Description of problem:

      
      The node-labeller labels a node with Skylake, Cascadelake or Icelake CPU models only if the node CPU has the 'mpx' feature. This is incorrect: not every CPU of those generations has this feature, and MPX has long been deprecated and removed, even from the Linux kernel and QEMU. Also see https://www.phoronix.com/news/Intel-MPX-Is-Dead.
      
      The code responsible is [1]. It expects every feature listed in /usr/share/libvirt/cpu_map/[CPU Model].xml to be present on the node CPU before it labels the node with that model. This is incorrect, because some features, such as MPX, may be missing.
      
      According to the libvirt team in [2], kubevirt should not be reading /usr/share/libvirt/cpu_map/[CPU Model].xml at all, so this logic shouldn't exist and a better way to determine the supported CPU models should be used. Flags can also go missing for other reasons, for example when a microcode update disables a feature because of a security flaw. MPX is not the first such edge case and likely won't be the last.
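
The all-features-required check described above can be sketched in a few lines of Python. This is only an illustration of the logic, not the KubeVirt Go code in [1]: the function names are hypothetical and the feature sets are made up and heavily shortened, whereas the real lists come from the libvirt cpu_map XML files.

```python
def strict_usable(model_features, host_features):
    """Return True only if every feature listed for the model is on the host."""
    return set(model_features) <= set(host_features)


def missing_features(model_features, host_features):
    """List the model features the host does not report, in sorted order."""
    return sorted(set(model_features) - set(host_features))


# Hypothetical, shortened feature sets used purely for illustration.
model_features = {"aes", "avx2", "mpx", "xsavec"}
host_features = {"aes", "avx2", "xsavec", "avx512f"}

print(strict_usable(model_features, host_features))
print(missing_features(model_features, host_features))
```

With these sample sets the host fails the strict check solely because of 'mpx', which mirrors the behaviour reported here.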
      
      

      Version-Release number of selected component (if applicable):

      4.14, 4.13 (probably all)
      
      

      How reproducible:

      Always
      

      Steps to Reproduce:
      1. Get a node with a recent CPU model (e.g. Cascadelake or Icelake)
      2. Confirm it doesn't have mpx

      $ ssh core@blue.shift.home.arpa cat /proc/cpuinfo | grep mpx
      $ 
      $ ssh core@blue.shift.home.arpa cat /proc/cpuinfo | grep name | head -n 1
      model name	: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
      

      3. Confirm it's missing the Skylake and Cascadelake labels

      $ oc get node blue.shift.home.arpa -o yaml | grep cpu-model.node
          cpu-model.node.kubevirt.io/Broadwell-noTSX: "true"
          cpu-model.node.kubevirt.io/Broadwell-noTSX-IBRS: "true"
          cpu-model.node.kubevirt.io/Haswell-noTSX: "true"
          cpu-model.node.kubevirt.io/Haswell-noTSX-IBRS: "true"
          cpu-model.node.kubevirt.io/IvyBridge: "true"
          cpu-model.node.kubevirt.io/IvyBridge-IBRS: "true"
          cpu-model.node.kubevirt.io/Nehalem: "true"
          cpu-model.node.kubevirt.io/Nehalem-IBRS: "true"
          cpu-model.node.kubevirt.io/Opteron_G1: "true"
          cpu-model.node.kubevirt.io/Penryn: "true"
          cpu-model.node.kubevirt.io/SandyBridge: "true"
          cpu-model.node.kubevirt.io/SandyBridge-IBRS: "true"
          cpu-model.node.kubevirt.io/Westmere: "true"
          cpu-model.node.kubevirt.io/Westmere-IBRS: "true"
      

      4. But it can run VMs with those CPUs

      $ oc rsh virt-launcher-rhvm-mfd6f virsh domcapabilities | grep -E 'Cascade|Sky' | grep yes
            <model usable='yes' vendor='Intel'>Skylake-Server-noTSX-IBRS</model>
            <model usable='yes' vendor='Intel'>Skylake-Client-noTSX-IBRS</model>
            <model usable='yes' vendor='Intel'>Cascadelake-Server-noTSX</model>
      

      5. Add debug logging (see my patch in [3]) and confirm it's due to the missing MPX feature.

      {"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Server-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332857Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Client-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332885Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Opteron_G2 is missing required feature svm","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332923Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Cascadelake-Server-noTSX is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332973Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Server-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.337136Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Client-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.337173Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Opteron_G2 is missing required feature svm","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.337210Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Cascadelake-Server-noTSX is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.337262Z"}
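
For longer virt-handler logs, the warnings above can be grouped by missing feature. The following is a minimal Python sketch that assumes the message format produced by the debug patch in [3], exactly as it appears in the log lines above. The helper name `summarize` is hypothetical.

```python
import json
import re
from collections import defaultdict

# Message format taken from the log lines shown above (debug patch in [3]).
PATTERN = re.compile(r"CPU model (\S+) is missing required feature (\S+)")


def summarize(lines):
    """Map each missing feature to the sorted CPU models that were rejected for it."""
    missing = defaultdict(set)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON log entries
        if not isinstance(entry, dict):
            continue
        match = PATTERN.search(entry.get("msg", ""))
        if match:
            model, feature = match.groups()
            missing[feature].add(model)
    return {feature: sorted(models) for feature, models in missing.items()}


# Three of the log lines from this report, with unrelated fields omitted.
sample = [
    '{"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Server-noTSX-IBRS is missing required feature mpx"}',
    '{"component":"virt-handler","level":"warning","msg":"CPU model Opteron_G2 is missing required feature svm"}',
    '{"component":"virt-handler","level":"warning","msg":"CPU model Cascadelake-Server-noTSX is missing required feature mpx"}',
]

print(summarize(sample))
```

Repeated warnings for the same model collapse into a single entry, so the duplicated lines in the log above do not inflate the summary.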
      

      Actual results:

      The node is missing the Skylake, Cascadelake and Icelake labels.
      The customer cannot expose these models to VMs.
      

      Expected results:

      Each node is correctly labeled with the CPU models it supports.
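
As a rough cross-check of which labels one would expect, the models that libvirt itself reports as usable (step 4) can be extracted from the `virsh domcapabilities` XML. The sketch below is Python and is not how KubeVirt computes labels. The helper name is hypothetical, the sample XML is trimmed to the relevant elements, and the `usable='no'` Icelake-Server entry is invented for illustration.

```python
import xml.etree.ElementTree as ET

LABEL_PREFIX = "cpu-model.node.kubevirt.io/"  # prefix as seen in step 3


def usable_models(domcaps_xml):
    """Return the custom-mode CPU models that libvirt marks usable='yes'."""
    root = ET.fromstring(domcaps_xml)
    models = root.findall("./cpu/mode[@name='custom']/model")
    return [m.text for m in models if m.get("usable") == "yes" and m.text]


# Trimmed sample; the usable='no' entry is a made-up example.
sample_xml = """<domainCapabilities>
  <cpu>
    <mode name='custom' supported='yes'>
      <model usable='yes' vendor='Intel'>Skylake-Server-noTSX-IBRS</model>
      <model usable='no' vendor='Intel'>Icelake-Server</model>
      <model usable='yes' vendor='Intel'>Cascadelake-Server-noTSX</model>
    </mode>
  </cpu>
</domainCapabilities>"""

for model in usable_models(sample_xml):
    print(LABEL_PREFIX + model)
```

Comparing this list with the `oc get node` output from step 3 shows which models are usable according to libvirt but have no label on the node.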
      
      

      Additional info:
      [1] https://github.com/kubevirt/kubevirt/blob/474d8d377c1fe36e03777bc36891fc2d4ab09afb/pkg/virt-handler/node-labeller/node_labeller.go#L413
      [2] https://issues.redhat.com/browse/RHEL-19692
      [3] https://github.com/germanovm/kubevirt/commit/5c7f3d37e727169b2583f9885e743a6d171ca459

            bmordeha@redhat.com Barak Mordehai
            rhn-support-gveitmic Germano Veit Michel
            Akriti gupta Akriti gupta