CNV-26301 (OpenShift Virtualization)

[2174229] Requested TSC frequency outside tolerance range & TSC scaling not supported - Release Note

    • Important
    • None

      +++ This bug was initially created as a clone of Bug #2151169 +++

      +++ This bug was initially created as a clone of Bug #2139896 +++

      Description of problem:
      We see this issue while creating a Windows10 VM on cnv2-engineering

      We see the status message: server error. command SyncVMI failed: "LibvirtError(Code=67, Domain=10,
      Message=''unsupported configuration: Requested TSC frequency 1699998000 Hz is
      outside tolerance range ([2099473001, 2100522999] Hz) around host frequency
      2099998000 Hz and TSC scaling is not supported by the host CPU'')
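
The range quoted in the error is consistent with a tolerance of 250 parts per million around the host frequency, computed with integer arithmetic. The sketch below reproduces the numbers from the message. The 250 ppm constant and the function names are assumptions inferred from the quoted range, not taken from the libvirt source.

```python
# Assumed tolerance in parts per million, inferred from the range in the error message.
TOLERANCE_PPM = 250


def tsc_tolerance_range(host_hz: int) -> tuple[int, int]:
    """Return the (low, high) frequency window around the host TSC frequency."""
    tolerance = host_hz * TOLERANCE_PPM // 1_000_000  # integer division, truncates
    return host_hz - tolerance, host_hz + tolerance


def is_within_tolerance(requested_hz: int, host_hz: int) -> bool:
    """True if the requested frequency needs no TSC scaling on this host."""
    low, high = tsc_tolerance_range(host_hz)
    return low <= requested_hz <= high


host = 2099998000       # host frequency from the error message
requested = 1699998000  # requested frequency from the error message

print(tsc_tolerance_range(host))             # (2099473001, 2100522999)
print(is_within_tolerance(requested, host))  # False
```

With these inputs the requested frequency falls far below the window, so a host without TSC scaling cannot run the guest at that frequency.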

      Version-Release number of selected component (if applicable):

      How reproducible:

      Steps to Reproduce:
      1. Create a Windows 10 VM on cnv2-engineering.

      Actual results:
      VM fails to start with message:

      "LibvirtError(Code=67, Domain=10,
      Message=''unsupported configuration: Requested TSC frequency 1699998000 Hz is outside tolerance range ([2099473001, 2100522999] Hz) around host frequency
      2099998000 Hz and TSC scaling is not supported by the host CPU'')

      Expected results:

      VM starts successfully.

      Additional info:

      According to https://gitlab.com/libvirt/libvirt/-/issues/188, this should already have been fixed.

      Surprisingly, we still see this issue with 4.11.1.

      — Additional comment from Dominik Holler on 2022-11-11 08:16:43 UTC —

      Disabling reenlightenment seems to be a temporary workaround:

      kind: VirtualMachine
      spec:
        template:
          spec:
            domain:
              features:
                hyperv:
                  # reenlightenment: {}

      — Additional comment from Fabian Deutsch on 2022-11-21 08:44:12 UTC —

      This bug is in POST, should it be pulled into 4.12? (Only if it can make it)

      — Additional comment from on 2022-11-21 14:30:50 UTC —

      Blockers only freeze was the 17th, so this BZ needs to be considered a blocker in order to qualify. (It might well be).

      — Additional comment from on 2022-11-21 14:34:04 UTC —

      Actually, this is already backported to 4.12.

      — Additional comment from on 2022-11-21 14:36:17 UTC —

      Marking this as a blocker for 4.12 based on QE recommendation.

      If re-enlightenment is completely disabled, we encounter https://bugzilla.redhat.com/show_bug.cgi?id=2141954.

      — Additional comment from Kedar Bidarkar on 2022-11-21 14:42:44 UTC —

      We will fix this bug in 4.12.0: https://bugzilla.redhat.com/show_bug.cgi?id=2139896

      — Additional comment from Red Hat Bugzilla on 2022-12-15 08:28:52 UTC —

      Account disabled by LDAP Audit for extended failure

      — Additional comment from Antonio Cardace on 2023-01-12 14:34:40 UTC —

      @ffossemo@redhat.com will take care of the backport for 4.11.3 as the automatic cherry-pick failed.

      — Additional comment from on 2023-01-13 10:16:22 UTC —

      The original PR is already backported https://github.com/kubevirt/kubevirt/pull/8996

      — Additional comment from on 2023-01-13 10:21:52 UTC —

      The build at http://cnv-version-explorer.apps.cnv2.engineering.redhat.com/BundleDetails?ver=v4.11.3-2 contains the fix. Is it possible to check whether this still happens? Thanks!

      — Additional comment from Denys Shchedrivyi on 2023-01-24 15:47:18 UTC —

      I verified on CNV v4.11.3-8.

      A VM with the reenlightenment flag tries to run only on nodes with an appropriate tsc-frequency or on nodes with the tsc-scalable=true label.

      My only concern: on a heterogeneous cluster, a VM with the reenlightenment flag may never run on specific nodes, even if I set nodeSelector explicitly.

      For example, we have a cluster with these nodes:

      > name: node01
      > cpu-timer.node.kubevirt.io/tsc-frequency: '2099998000'
      > cpu-timer.node.kubevirt.io/tsc-scalable: 'false'

      > name: node03
      > cpu-timer.node.kubevirt.io/tsc-frequency: '1699998000'
      > cpu-timer.node.kubevirt.io/tsc-scalable: 'false'

      > name: node04
      > cpu-timer.node.kubevirt.io/tsc-frequency: '2095078000'
      > cpu-timer.node.kubevirt.io/tsc-scalable: 'true'

      The virt-controller finds the lowest frequency and adds it to VMs; in my case it is `tsc-frequency: '1699998000'`. But since node01 is tsc-scalable=false, the VM will never try to run there.
      When I select this node with a node selector, the pod gets stuck in Pending state with the message:

      > 0/10 nodes are available: 10 node(s) didn't match Pod's node
      > affinity/selector. preemption: 0/10 nodes are available: 10 Preemption
      > is not helpful for scheduling.

      @iholder@redhat.com I suppose this is expected behavior: if TSC is not scalable on the node, skip this node.
      But what if I have a cluster where all 3 nodes are non-scalable and have different tsc-freq values? Will a VM with reenlightenment (or with invtsc) run only on the one node with the lowest frequency?
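
The behavior described in this comment can be modeled with a short sketch. It is a simplified illustration, not KubeVirt's scheduling code: the `Node` class and `schedulable_nodes` function are hypothetical names, and the rule "exact frequency match or tsc-scalable=true" is an assumption based on the observations above.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    tsc_frequency: int  # value of cpu-timer.node.kubevirt.io/tsc-frequency
    tsc_scalable: bool  # value of cpu-timer.node.kubevirt.io/tsc-scalable


def schedulable_nodes(nodes: list[Node]) -> tuple[int, list[str]]:
    """Return the frequency assigned to the VM and the nodes that can run it.

    Assumed rule: the VM gets the lowest frequency in the cluster, and a node
    qualifies if it has exactly that frequency or supports TSC scaling.
    """
    requested = min(node.tsc_frequency for node in nodes)
    eligible = [
        node.name
        for node in nodes
        if node.tsc_scalable or node.tsc_frequency == requested
    ]
    return requested, eligible


# The three nodes listed in the comment above.
cluster = [
    Node("node01", 2099998000, False),
    Node("node03", 1699998000, False),
    Node("node04", 2095078000, True),
]
print(schedulable_nodes(cluster))  # (1699998000, ['node03', 'node04'])

# Same frequencies, but no node supports TSC scaling.
all_fixed = [Node(n.name, n.tsc_frequency, False) for n in cluster]
print(schedulable_nodes(all_fixed))  # (1699998000, ['node03'])
```

Under this model node01 is never eligible, and in a cluster where no node is scalable only the lowest-frequency node remains, which is the scenario the question above asks about.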

      — Additional comment from Denys Shchedrivyi on 2023-02-01 19:11:49 UTC —

      Maybe we can improve this logic somehow? Or at least we should document it as a known limitation of VMs with the reenlightenment (or cpu/invtsc) flags on a cluster with non-scalable nodes.

      — Additional comment from Itamar Holder on 2023-02-06 17:10:25 UTC —

      Hey Denys,

      > Maybe we can improve this logic somehow?

      QEMU broke backward compatibility and introduced a limitation [1] that forces us to pass an explicit TSC frequency for HyperV Reenlightenment VMs.
      Therefore, I don't see a clear way to improve the logic we have. Perhaps the right thing to do is to sync with the QEMU developers and try to come up with a better solution.
      In any case, I will sync with Vladik Romanovsky about this to see what can be done.

      > Or at least we should document it as a known limitation of VMs with reenlightenment

      Documenting it clearly is always good, especially since this is a corner case and it doesn't seem that QEMU will remove this limitation anytime soon, if ever.
      Having nodes with scalable TSC will solve this problem.
      In a mixed cluster, HyperV Reenlightenment VMs cannot be scheduled on nodes that don't support scalable TSC and have a frequency higher than the lowest frequency on the cluster.

      [1] https://gitlab.com/qemu-project/qemu/-/commit/561dbb41b1d752098249128d8462aaadc56fd15d

      — Additional comment from Denys Shchedrivyi on 2023-02-06 17:43:51 UTC —

      Moving this BZ to Verified. As discussed, we should document this as a known limitation of mixed clusters with non-scalable nodes.

      — Additional comment from Kedar Bidarkar on 2023-02-09 13:40:15 UTC —

      In a mixed cluster, HyperV Reenlightenment VMs cannot be scheduled on nodes that don't support scalable TSC and have a frequency higher than the lowest frequency on the cluster.

              sjhala@redhat.com Shikha Jhala
              ctomasko Catherine Tomasko
              Kedar Bidarkar Kedar Bidarkar