[CNV-37685] node-labeller does not add Skylake, Cascadelake and Icelake labels if node is missing mpx.

    • Bug
    • Resolution: Done-Errata
    • Critical
    • CNV v4.16.0
    • CNV v4.13.8
    • CNV Virtualization
    • None
    • 5
    • False
    • False
    • virt-launcher-rhel9-container-v4.16.0-249
    • ---
    • ---
    • CNV Virtualization Sprint 249, CNV Virtualization Sprint 250, CNV Virtualization Sprint 251, CNV Virtualization Sprint 252
    • Critical
    • No

      Description of problem:

      The node-labeller labels a node with the Skylake, Cascadelake or Icelake CPU models only if the node CPU has the 'mpx' feature. This is incorrect: not every CPU of those generations has this feature, and MPX has long been deprecated and removed, even from the Linux kernel and QEMU. Also see https://www.phoronix.com/news/Intel-MPX-Is-Dead.
      
      The code that does this is [1]. It expects all features from /usr/share/libvirt/cpu_map/[CPU Model].xml to be present on the node CPU in order to label it. This is incorrect, because some features, such as MPX, may be missing.
      
      According to the libvirt team, in [2], kubevirt should not even be reading /usr/share/libvirt/cpu_map/[CPU Model].xml, so this logic shouldn't exist and better ways to determine the CPU model should be used. Also, some flags may be missing because microcode updates sometimes disable features due to security flaws and other reasons. MPX is not the first and likely won't be the last edge case.
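To make the difference concrete, here is a minimal Python sketch (not the KubeVirt Go code in [1]) of the strict check described above next to a more tolerant one. The function names, the `OBSOLETE_FEATURES` set and the toy feature lists are hypothetical and exist only for illustration; real feature lists come from libvirt.

```python
# Hypothetical illustration, not the actual node-labeller implementation.

# Assumption for this sketch: features considered obsolete and safe to ignore.
OBSOLETE_FEATURES = frozenset({"mpx"})


def strict_supported(model_required, host_features):
    """Behaviour described in this bug: every listed feature must be on the host."""
    return set(model_required) <= set(host_features)


def tolerant_supported(model_required, host_features, obsolete=OBSOLETE_FEATURES):
    """Same check, but features in `obsolete` never block the label."""
    return (set(model_required) - set(obsolete)) <= set(host_features)


if __name__ == "__main__":
    # Toy data: a shortened, made-up feature list for a Skylake-like model.
    skylake_like = {"avx2", "aes", "mpx"}
    host = {"avx2", "aes", "sha-ni"}
    print(strict_supported(skylake_like, host))    # False: mpx is missing
    print(tolerant_supported(skylake_like, host))  # True: mpx is ignored
```

A static ignore list is only one possible design. The libvirt team's suggestion in [2] points toward not reading the cpu_map files at all.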
      
      

      Version-Release number of selected component (if applicable):

      4.14, 4.13 (probably all)
      
      

      How reproducible:

      Always
      

      Steps to Reproduce:
      1. Get a node with a recent CPU model (e.g. Cascadelake or Icelake)
      2. Confirm it doesn't have mpx

      $ ssh core@blue.shift.home.arpa cat /proc/cpuinfo | grep mpx
      $ 
      $ ssh core@blue.shift.home.arpa cat /proc/cpuinfo | grep name | head -n 1
      model name	: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
      

      3. Confirm it is missing the Skylake and Cascadelake labels

      $ oc get node blue.shift.home.arpa -o yaml | grep cpu-model.node
          cpu-model.node.kubevirt.io/Broadwell-noTSX: "true"
          cpu-model.node.kubevirt.io/Broadwell-noTSX-IBRS: "true"
          cpu-model.node.kubevirt.io/Haswell-noTSX: "true"
          cpu-model.node.kubevirt.io/Haswell-noTSX-IBRS: "true"
          cpu-model.node.kubevirt.io/IvyBridge: "true"
          cpu-model.node.kubevirt.io/IvyBridge-IBRS: "true"
          cpu-model.node.kubevirt.io/Nehalem: "true"
          cpu-model.node.kubevirt.io/Nehalem-IBRS: "true"
          cpu-model.node.kubevirt.io/Opteron_G1: "true"
          cpu-model.node.kubevirt.io/Penryn: "true"
          cpu-model.node.kubevirt.io/SandyBridge: "true"
          cpu-model.node.kubevirt.io/SandyBridge-IBRS: "true"
          cpu-model.node.kubevirt.io/Westmere: "true"
          cpu-model.node.kubevirt.io/Westmere-IBRS: "true"
      

      4. But it can run VMs with those CPUs

      $ oc rsh virt-launcher-rhvm-mfd6f virsh domcapabilities | grep -E 'Cascade|Sky' | grep yes
            <model usable='yes' vendor='Intel'>Skylake-Server-noTSX-IBRS</model>
            <model usable='yes' vendor='Intel'>Skylake-Client-noTSX-IBRS</model>
            <model usable='yes' vendor='Intel'>Cascadelake-Server-noTSX</model>
      

      5. Add debug (see my patch in [3]) and confirm it's due to missing MPX feature.

      {"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Server-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332857Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Client-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332885Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Opteron_G2 is missing required feature svm","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332923Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Cascadelake-Server-noTSX is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332973Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Server-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.337136Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Client-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.337173Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Opteron_G2 is missing required feature svm","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.337210Z"}
      {"component":"virt-handler","level":"warning","msg":"CPU model Cascadelake-Server-noTSX is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.337262Z"}
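With many nodes, warnings like the ones above are easier to read when grouped by missing feature. The following Python sketch does that. It assumes the message format produced by the debug patch in [3] ("CPU model X is missing required feature Y"); a stock virt-handler may not emit these lines, and the helper name `blocked_models` is made up for this example.

```python
import json
import re
from collections import defaultdict

# Message format as seen in the debug output above (comes from the patch in [3]).
MISSING_RE = re.compile(r"CPU model (\S+) is missing required feature (\S+)")


def blocked_models(log_lines):
    """Map each missing feature to the set of CPU models it blocks."""
    blocked = defaultdict(set)
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not JSON
        if not isinstance(entry, dict):
            continue
        match = MISSING_RE.search(str(entry.get("msg", "")))
        if match:
            model, feature = match.groups()
            blocked[feature].add(model)
    return dict(blocked)


if __name__ == "__main__":
    sample = [
        '{"component":"virt-handler","level":"warning","msg":"CPU model Skylake-Server-noTSX-IBRS is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332857Z"}',
        '{"component":"virt-handler","level":"warning","msg":"CPU model Opteron_G2 is missing required feature svm","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332923Z"}',
        '{"component":"virt-handler","level":"warning","msg":"CPU model Cascadelake-Server-noTSX is missing required feature mpx","pos":"node_labeller.go:414","timestamp":"2024-01-19T00:24:35.332973Z"}',
    ]
    for feature, models in sorted(blocked_models(sample).items()):
        print(feature, sorted(models))
```

Repeated lines, as in the output above, collapse naturally because models are collected in sets.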
      

      Actual results:

      The node is missing the Skylake, Cascadelake and Icelake labels.
      The customer cannot expose these models to VMs.
      

      Expected results:

      Nodes are correctly labeled with the CPU models they support.
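One way to check a node against this expectation is to compare the models libvirt reports as usable (step 4) with the cpu-model labels on the node (step 3). The Python sketch below does that comparison offline. It assumes the `virsh domcapabilities` XML and the node labels have already been fetched, and the function names and the trimmed XML fragment are illustrative only.

```python
import xml.etree.ElementTree as ET

LABEL_PREFIX = "cpu-model.node.kubevirt.io/"


def usable_models(domcaps_xml):
    """Return model names marked usable='yes' in `virsh domcapabilities` XML."""
    root = ET.fromstring(domcaps_xml.strip())
    return {
        model.text.strip()
        for model in root.iter("model")
        if model.get("usable") == "yes" and model.text
    }


def labelled_models(node_labels):
    """Return CPU models advertised through cpu-model labels set to "true"."""
    return {
        key[len(LABEL_PREFIX):]
        for key, value in node_labels.items()
        if key.startswith(LABEL_PREFIX) and value == "true"
    }


def usable_but_unlabelled(domcaps_xml, node_labels):
    """Models the host can run according to libvirt but that carry no label."""
    return usable_models(domcaps_xml) - labelled_models(node_labels)


if __name__ == "__main__":
    # Trimmed, hand-written fragment in the shape of `virsh domcapabilities` output.
    sample_xml = """
    <domainCapabilities>
      <cpu>
        <mode name='custom' supported='yes'>
          <model usable='yes' vendor='Intel'>Skylake-Client-noTSX-IBRS</model>
          <model usable='yes' vendor='Intel'>Nehalem</model>
          <model usable='no' vendor='AMD'>Opteron_G2</model>
        </mode>
      </cpu>
    </domainCapabilities>
    """
    labels = {"cpu-model.node.kubevirt.io/Nehalem": "true"}
    print(sorted(usable_but_unlabelled(sample_xml, labels)))
```

On the node from this report, such a comparison would be expected to list the Skylake and Cascadelake models shown in step 4.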
      
      

      Additional info:
      [1] https://github.com/kubevirt/kubevirt/blob/474d8d377c1fe36e03777bc36891fc2d4ab09afb/pkg/virt-handler/node-labeller/node_labeller.go#L413
      [2] https://issues.redhat.com/browse/RHEL-19692
      [3] https://github.com/germanovm/kubevirt/commit/5c7f3d37e727169b2583f9885e743a6d171ca459

            Errata Tool added a comment - Since the problem described in this issue should be resolved in a recent advisory, it has been closed. For information on the advisory (Moderate: OpenShift Virtualization 4.16.0 Images security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:4455

            Barak Mordehai added a comment - acardace@redhat.com waiting for reviews: https://github.com/kubevirt/kubevirt/pull/11407  

            Antonio Cardace added a comment - bmordeha@redhat.com any update on this one?

            bmordeha@redhat.com thank you for the details.

            However, this part:

            We would only disable these features if the user does not need them explicitly. This would make our system compatible with more CPU Models, when svm or mpx are not on the node.

            Don't you mean the opposite? That is, always disable these features unless the user explicitly needs them. Currently MPX is disabled in QEMU, so unless you write in the XML that you require MPX, it will be off. See:

            $ oc get vm centos-9 -o yaml | yq '.spec.template.spec.domain.cpu'
            cores: 4
            features:
              - name: vmx
                policy: require
              - name: mpx
                policy: require
            sockets: 1
            threads: 1
            $ ssh centos-9 "grep mpx /proc/cpuinfo | wc -l"
            4
            
            $ oc get vm centos-9 -o yaml | yq '.spec.template.spec.domain.cpu'
            cores: 4
            features:
              - name: vmx
                policy: require
            sockets: 1
            threads: 1
            $ ssh centos-9 "grep mpx /proc/cpuinfo | wc -l"
            0
            
            $ oc get vm centos-9 -o yaml | yq '.spec.template.spec.domain.cpu'
            cores: 4
            features:
              - name: vmx
                policy: require
            model: Skylake-Client-noTSX-IBRS
            sockets: 1
            threads: 1
            $ ssh centos-9 "grep mpx /proc/cpuinfo | wc -l"
            0
            

            However, this would also require us to maintain a map of [cpu_models] -> [features_to_disable].

            Yes, this is exactly what our other layered products, such as RHV, do. RHV forces MPX off even if the node supports it, and VMware does the same. RHV also does the same for the hle and rtm flags (TSX); years ago it had issues similar to what you are having now, but with TSX (another instruction set that was broken and dropped).
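As an illustration of such a [cpu_models] -> [features_to_disable] map, here is a small Python sketch. The table entries, the function name and the toy feature list are hypothetical; this is not how RHV or KubeVirt implement it, only the shape of the idea discussed here.

```python
# Hypothetical table: CPU model -> features to switch off by default.
# Real entries would have to be curated and maintained per model.
FEATURES_TO_DISABLE = {
    "Skylake-Client-noTSX-IBRS": {"mpx"},
    "Skylake-Server-noTSX-IBRS": {"mpx"},
    "Cascadelake-Server-noTSX": {"mpx"},
}


def effective_features(model, model_features, explicitly_required=()):
    """Features to expose for `model`.

    Features listed in the table are dropped unless the user asked for
    them explicitly, which matches "always disable, unless explicitly needed".
    """
    to_disable = FEATURES_TO_DISABLE.get(model, set()) - set(explicitly_required)
    return set(model_features) - to_disable


if __name__ == "__main__":
    toy_features = {"avx2", "aes", "mpx"}  # shortened, made-up list
    print(sorted(effective_features("Skylake-Client-noTSX-IBRS", toy_features)))
    print(sorted(effective_features("Skylake-Client-noTSX-IBRS", toy_features,
                                    explicitly_required={"mpx"})))
```

With a table like this, the node only needs the features that remain after the defaults are removed, so a host without mpx could still be labelled for these models.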

            svm for Opteron_G2, to avoid this bug: https://bugzilla.redhat.com/show_bug.cgi?id=2122283

            I don't think KubeVirt needs to look at this and try to work around that bug. This CPU model has been deprecated for a long time; 2nd Gen Opteron is 22 years old. Even RHV already dropped it years ago (along with G1 and G3) in RHV 4.3, released in May 2019. No customers would be using this: there is no point in using dual-core CPUs on bare-metal OCP worker nodes, which probably can't even run the basic system pods properly. You could just plan to drop support for this to simplify things.

            We can ask customers to restart their VMs without the mpx feature if they are using Skylake, Cascadelake, or Icelake models. This will help them to live migrate to nodes that do not support mpx.

            According to my tests above, and also the customer's tests in the support ticket, they don't have to restart unless MPX was explicitly enabled. It's mostly a labelling issue; the VMs would not be using MPX, as it's disabled in QEMU.
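The "restart only if MPX was explicitly enabled" rule can be expressed as a check on the `.spec.template.spec.domain.cpu` section shown in the yq output above. The Python sketch below works on that section after it has been parsed into a dict. The function name is made up, and treating a missing policy as "require" and counting "force" as explicit are assumptions of this sketch.

```python
def explicitly_requires(cpu_spec, feature="mpx"):
    """True if the VM's cpu section explicitly asks for `feature`.

    `cpu_spec` is the parsed `.spec.template.spec.domain.cpu` mapping.
    Assumption: a missing policy means "require", and "force" also counts.
    """
    for entry in (cpu_spec or {}).get("features") or []:
        if entry.get("name") != feature:
            continue
        if entry.get("policy", "require") in ("require", "force"):
            return True
    return False


if __name__ == "__main__":
    # The three cpu sections from the tests above, as Python dicts.
    with_mpx = {
        "cores": 4,
        "features": [
            {"name": "vmx", "policy": "require"},
            {"name": "mpx", "policy": "require"},
        ],
        "sockets": 1,
        "threads": 1,
    }
    without_mpx = {
        "cores": 4,
        "features": [{"name": "vmx", "policy": "require"}],
        "sockets": 1,
        "threads": 1,
    }
    skylake_model = dict(without_mpx, model="Skylake-Client-noTSX-IBRS")

    for spec in (with_mpx, without_mpx, skylake_model):
        print(explicitly_requires(spec))
```

Only the first spec would be flagged, which lines up with the guest showing mpx in /proc/cpuinfo only in the first test.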

            Germano Veit Michel added the comment above.

            Germano Veit Michel added a comment - Add KCS link

              bmordeha@redhat.com Barak Mordehai
              rhn-support-gveitmic Germano Veit Michel
              Akriti gupta Akriti gupta