- Bug
- Resolution: Done
- Major
- None
- None
- Quality / Stability / Reliability
- False
- False
- CLOSED
- Known Issue
- Done
- Important
- None
+++ This bug was initially created as a clone of Bug #2151169 +++
+++ This bug was initially created as a clone of Bug #2139896 +++
Description of problem:
We see this issue while creating a Windows10 VM on cnv2-engineering.
The VM reports the status message:
server error. command SyncVMI failed: "LibvirtError(Code=67, Domain=10, Message=''unsupported configuration: Requested TSC frequency 1699998000 Hz is outside tolerance range ([2099473001, 2100522999] Hz) around host frequency 2099998000 Hz and TSC scaling is not supported by the host CPU'')
Version-Release number of selected component (if applicable):
How reproducible:
Steps to Reproduce:
1. Windows10 VM on cnv2-engineering
2.
3.
Actual results:
VM fails to start with message:
"LibvirtError(Code=67, Domain=10, Message=''unsupported configuration: Requested TSC frequency 1699998000 Hz is outside tolerance range ([2099473001, 2100522999] Hz) around host frequency 2099998000 Hz and TSC scaling is not supported by the host CPU'')
Expected results:
VM starts successfully.
Additional info:
According to https://gitlab.com/libvirt/libvirt/-/issues/188, this should already have been fixed.
Surprisingly, we still see this issue with 4.11.1.
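The tolerance range quoted in the error message can be reproduced with a little arithmetic. The sketch below assumes a window of 250 parts per million around the host frequency, computed with integer division. That figure is inferred from the numbers in the message, not stated in this report, and the function names are hypothetical.

```python
# Sketch only: reproduces the range from the error message under the
# ASSUMPTION of a +/-250 ppm window with integer division.
def tsc_tolerance_range(host_hz, ppm=250):
    """Return (low, high) bounds in Hz around the host TSC frequency."""
    tolerance = host_hz * ppm // 1_000_000
    return host_hz - tolerance, host_hz + tolerance


def fits_without_scaling(requested_hz, host_hz):
    """True if the requested frequency lies inside the host's window."""
    low, high = tsc_tolerance_range(host_hz)
    return low <= requested_hz <= high


host_hz = 2099998000       # host frequency from the error message
requested_hz = 1699998000  # requested frequency from the error message

print(tsc_tolerance_range(host_hz))                # (2099473001, 2100522999)
print(fits_without_scaling(requested_hz, host_hz)) # False
```

The requested 1699998000 Hz is roughly 400 MHz below the host frequency, far outside a window that is only about 0.5 MHz wide on each side, so the VM can start on this host only if the CPU supports TSC scaling.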
— Additional comment from Dominik Holler on 2022-11-11 08:16:43 UTC —
Disabling reenlightenment seems to be a temporary workaround:
kind: VirtualMachine
spec:
  template:
    spec:
      domain:
        features:
          hyperv:
            - reenlightenment: {}
— Additional comment from Fabian Deutsch on 2022-11-21 08:44:12 UTC —
This bug is in POST. Should it be pulled into 4.12? (Only if it can make it.)
— Additional comment from on 2022-11-21 14:30:50 UTC —
Blockers only freeze was the 17th, so this BZ needs to be considered a blocker in order to qualify. (It might well be).
— Additional comment from on 2022-11-21 14:34:04 UTC —
Actually, this is already backported to 4.12.
— Additional comment from on 2022-11-21 14:36:17 UTC —
Marking this as a blocker for 4.12 based on QE recommendation.
We encounter https://bugzilla.redhat.com/show_bug.cgi?id=2141954 if re-enlightenment is completely disabled.
— Additional comment from Kedar Bidarkar on 2022-11-21 14:42:44 UTC —
We will fix this bug in 4.12.0: https://bugzilla.redhat.com/show_bug.cgi?id=2139896
— Additional comment from Antonio Cardace on 2023-01-12 14:34:40 UTC —
@ffossemo@redhat.com will take care of the backport for 4.11.3 as the automatic cherry-pick failed.
— Additional comment from on 2023-01-13 10:16:22 UTC —
The original PR is already backported: https://github.com/kubevirt/kubevirt/pull/8996
— Additional comment from on 2023-01-13 10:21:52 UTC —
The build at http://cnv-version-explorer.apps.cnv2.engineering.redhat.com/BundleDetails?ver=v4.11.3-2 contains the fix. Is it possible to check whether this still happens? Thanks!
— Additional comment from Denys Shchedrivyi on 2023-01-24 15:47:18 UTC —
I verified this on CNV v4.11.3-8.
A VM with the reenlightenment flag is scheduled only on nodes with a matching tsc-frequency or on nodes with the tsc-scalable=true label.
My only concern: on a heterogeneous cluster, a VM with the reenlightenment flag may never run on specific nodes, even if I set nodeSelector explicitly.
For example, we have a cluster with these nodes:
> name: node01
>   cpu-timer.node.kubevirt.io/tsc-frequency: '2099998000'
>   cpu-timer.node.kubevirt.io/tsc-scalable: 'false'
> name: node03
>   cpu-timer.node.kubevirt.io/tsc-frequency: '1699998000'
>   cpu-timer.node.kubevirt.io/tsc-scalable: 'false'
> name: node04
>   cpu-timer.node.kubevirt.io/tsc-frequency: '2095078000'
>   cpu-timer.node.kubevirt.io/tsc-scalable: 'true'
The virt-controller finds the lowest frequency in the cluster and adds it to VMs; in my case it is `tsc-frequency: '1699998000'`. Since node01 has tsc-scalable=false and a different frequency, the VM will never try to run there.
When I select this node with a nodeSelector, the pod is stuck in the Pending state with the message:
> 0/10 nodes are available: 10 node(s) didn't match Pod's node
> affinity/selector. preemption: 0/10 nodes are available: 10 Preemption
> is not helpful for scheduling.
@iholder@redhat.com I suppose this is expected behavior: if TSC is not scalable on a node, that node is skipped.
But what if I have a cluster where all 3 nodes are non-scalable and have different TSC frequencies? Will a VM with reenlightenment (or with invtsc) run only on the one node with the lowest frequency?
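The scheduling behavior described in this comment can be sketched in a few lines. This is an illustrative model only, not KubeVirt code: the function and field names are hypothetical, and requiring an exact frequency match is a simplifying assumption (the real check allows a small tolerance). The node data is taken from the example cluster above.

```python
# Illustrative model of the behavior described above, NOT KubeVirt code.
# ASSUMPTIONS: exact frequency match; hypothetical names throughout.
NODES = {
    "node01": {"tsc_frequency": 2099998000, "tsc_scalable": False},
    "node03": {"tsc_frequency": 1699998000, "tsc_scalable": False},
    "node04": {"tsc_frequency": 2095078000, "tsc_scalable": True},
}


def requested_frequency(nodes):
    """The VM is given the lowest TSC frequency found in the cluster."""
    return min(info["tsc_frequency"] for info in nodes.values())


def eligible_nodes(nodes, node_selector=None):
    """Nodes that can host the VM: scalable TSC or a matching frequency."""
    wanted = requested_frequency(nodes)
    result = []
    for name, info in nodes.items():
        if node_selector is not None and name != node_selector:
            continue  # excluded by the explicit node selector
        if info["tsc_scalable"] or info["tsc_frequency"] == wanted:
            result.append(name)
    return result


print(eligible_nodes(NODES))                          # ['node03', 'node04']
print(eligible_nodes(NODES, node_selector="node01"))  # []
```

An empty result corresponds to the pod staying in Pending. If node04 were also non-scalable, only node03 (the lowest frequency) would remain eligible, which is the scenario raised in the question above.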
— Additional comment from Denys Shchedrivyi on 2023-02-01 19:11:49 UTC —
Maybe we can improve this logic somehow? Or at least we should document it as a known limitation of VMs with the reenlightenment (or cpu/invtsc) flags on a cluster with non-scalable nodes.
— Additional comment from Itamar Holder on 2023-02-06 17:10:25 UTC —
Hey Denys,
> May be we can improve this logic somehow?
QEMU broke backward compatibility and introduced a limitation [1] that forces us to pass an explicit TSC frequency for HyperV Reenlightenment VMs.
Therefore, I don't see a clear way to improve the logic we have. Perhaps the right thing to do is to sync with the QEMU developers and try to come up with a better solution.
In any case, I will sync with Vladik Romanovsky about this to see what can be done.
> Or at least we should document it as a known limitation of VMs with reenlightenment
Documenting it clearly is always good, especially since this is a corner case and it doesn't seem that QEMU will remove this limitation anytime soon, if ever.
Having nodes with scalable TSC will solve this problem.
In a mixed cluster, HyperV Reenlightenment VMs cannot be scheduled on nodes that don't support scalable TSC and whose frequency is higher than the lowest frequency in the cluster.
[1] https://gitlab.com/qemu-project/qemu/-/commit/561dbb41b1d752098249128d8462aaadc56fd15d
— Additional comment from Denys Shchedrivyi on 2023-02-06 17:43:51 UTC —
Moving this BZ to Verified. As discussed, we should document that this is a known limitation of mixed clusters with non-scalable nodes.
— Additional comment from Kedar Bidarkar on 2023-02-09 13:40:15 UTC —
In a mixed cluster, HyperV Reenlightenment VMs cannot be scheduled on nodes that don't support scalable TSC and whose frequency is higher than the lowest frequency in the cluster.
- is blocked by: CNV-22254 [2139896] Requested TSC frequency outside tolerance range & TSC scaling not supported (Closed)