OpenShift Virtualization / CNV-27577

[2182939] topologyHints with tscFrequency is set on the VMI when starting a VM from cold, preventing scheduling on some nodes due to NodeSelector



      Description of problem:

1. Prepare a cluster whose worker nodes have different TSC frequencies
2. Create a Windows VM with re-enlightenment enabled (see the sketch after these steps)
3. Start the VM
4. Note it was scheduled on node N and picked up that node's TSC frequency label:
scheduling.node.kubevirt.io/tsc-frequency-2419200000
5. Stop the VM
6. So far, everything works as expected.
7. Make all nodes with TSC frequency 2419200000 unschedulable
8. Start the VM again
9. The virt-launcher pod fails to schedule
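
For step 2, a minimal sketch of such a VM, assuming the upstream KubeVirt API; the VM name, disk layout, and image are hypothetical placeholders:

apiVersion: kubevirt.io/v1
kind: VirtualMachine
metadata:
  name: win-reenlightenment          # hypothetical name
spec:
  running: false
  template:
    spec:
      domain:
        features:
          hyperv:
            reenlightenment: {}      # re-enlightenment enabled; this is what triggers the TSC topology hint
        devices:
          disks:
            - name: rootdisk
              disk:
                bus: sata
        resources:
          requests:
            memory: 8Gi
      volumes:
        - name: rootdisk
          containerDisk:
            image: quay.io/example/windows2019   # placeholder image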

Because the VMI has:

topologyHints:
  tscFrequency: 2419200000

which translates into a nodeSelector on the virt-launcher pod:

nodeSelector:
  hyperv.node.kubevirt.io/frequencies: "true"
  hyperv.node.kubevirt.io/ipi: "true"
  hyperv.node.kubevirt.io/reenlightenment: "true"
  hyperv.node.kubevirt.io/reset: "true"
  hyperv.node.kubevirt.io/runtime: "true"
  hyperv.node.kubevirt.io/synic: "true"
  hyperv.node.kubevirt.io/synictimer: "true"
  hyperv.node.kubevirt.io/tlbflush: "true"
  hyperv.node.kubevirt.io/vpindex: "true"
  kubevirt.io/schedulable: "true"
  scheduling.node.kubevirt.io/tsc-frequency-2419200000: "true" <--------
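
To confirm the selector on a live cluster, one can dump it straight from the launcher pod (a sketch; the pod name is a placeholder, the actual suffix will differ):

% oc get pod virt-launcher-win-reenlightenment-abcde -o jsonpath='{.spec.nodeSelector}'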

      The only node available has a different TSC:

% oc get nodes
NAME                STATUS                     ROLES                         AGE   VERSION
black.toca.local    Ready,SchedulingDisabled   worker                        9d    v1.25.7+eab9cc9
blue.toca.local     Ready,SchedulingDisabled   control-plane,master,worker   10d   v1.25.7+eab9cc9
green.toca.local    Ready,SchedulingDisabled   control-plane,master,worker   10d   v1.25.7+eab9cc9
indigo.toca.local   Ready,SchedulingDisabled   worker                        10d   v1.25.7+eab9cc9
red.toca.local      Ready,SchedulingDisabled   control-plane,master,worker   10d   v1.25.7+eab9cc9
violet.toca.local   Ready,SchedulingDisabled   worker                        10d   v1.25.7+eab9cc9
white.toca.local    Ready                      worker                        10d   v1.25.7+eab9cc9
yellow.toca.local   Ready,SchedulingDisabled   worker                        10d   v1.25.7+eab9cc9

      % oc get nodes white.toca.local -o yaml | grep tsc-frequency
      cpu-timer.node.kubevirt.io/tsc-frequency: "2592000000"
      scheduling.node.kubevirt.io/tsc-frequency-2592000000: "true"

And scheduling fails:

      message: '0/8 nodes are available: 1 node(s) didn''t match Pod''s node affinity/selector,
      7 node(s) were unschedulable. preemption: 0/8 nodes are available: 8 Preemption
      is not helpful for scheduling.'

When starting from cold, this hint should not be necessary. Apparently it comes from the TopologyHinter, and it effectively pins fresh VM starts to hosts with exactly the same TSC frequency as the hosts the VMs previously ran on.
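
One way to spot the stale hint on the restarted VMI, mirroring the grep above (a sketch; the VMI name is a placeholder, and this assumes the hint is surfaced under the VMI's topologyHints field as shown earlier):

% oc get vmi win-reenlightenment -o yaml | grep -A1 topologyHints
  topologyHints:
    tscFrequency: 2419200000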

      Version-Release number of selected component (if applicable):
      OCP 4.12.9
      CNV 4.12.2

      How reproducible:
      Always

      Steps to Reproduce:
      As above

      Actual results:

• The VM's virt-launcher pod cannot be scheduled on any node

      Expected results:

• A fresh (cold) VM start can be scheduled on any schedulable node

Assignee: Stuart Gott (sgott@redhat.com)
Reporter: Germano Veit Michel (rhn-support-gveitmic)