
Type: Bug
Resolution: Done
Priority: Major
Fix Version/s: CNV v4.13.0
Affects Version/s: None
Component/s: CNV Documentation
Labels:
None

Activity Type:
Quality / Stability / Reliability
Blocked:
False
Blocked Reason:


None

Ready:
False
Epic Link:
CNV-18512
BZ Status:
CLOSED
BZ URL:
https://bugzilla.redhat.com/show_bug.cgi?id=2174229
Bugzilla Bug:
RHBZ: 2174229
Release Note Type:
Known Issue
Release Note Status:
Done
Intelligence Requested:
Market:

Severity:
Important

Regression:
None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

+++ This bug was initially created as a clone of Bug #2151169 +++

+++ This bug was initially created as a clone of Bug #2139896 +++

Description of problem:
We see this issue when creating a Windows 10 VM on cnv2-engineering.

We see the status message: server error. command SyncVMI failed: "LibvirtError(Code=67, Domain=10,
Message=''unsupported configuration: Requested TSC frequency 1699998000 Hz is
outside tolerance range ([2099473001, 2100522999] Hz) around host frequency
2099998000 Hz and TSC scaling is not supported by the host CPU'')
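
For reference, the tolerance range in the message above is consistent with a window of about ±250 ppm around the host frequency (2099998000 × 250 / 1000000 ≈ 524999 Hz on each side). The sketch below reproduces that arithmetic. The 250 ppm constant is an assumption inferred from the numbers in the error message, and the function names are hypothetical; this is not libvirt's code.

```python
from typing import Tuple

# Assumption: 250 ppm is inferred from the range printed in the error message.
TOLERANCE_PPM = 250


def tsc_tolerance_range(host_hz: int, ppm: int = TOLERANCE_PPM) -> Tuple[int, int]:
    """Return the inclusive (low, high) range accepted around host_hz."""
    delta = host_hz * ppm // 1_000_000  # integer Hz on each side
    return host_hz - delta, host_hz + delta


def tsc_request_ok(requested_hz: int, host_hz: int, host_scalable: bool) -> bool:
    """A request passes if the host can scale TSC or the value is in range."""
    low, high = tsc_tolerance_range(host_hz)
    return host_scalable or low <= requested_hz <= high


host_hz = 2_099_998_000
print(tsc_tolerance_range(host_hz))  # (2099473001, 2100522999)
print(tsc_request_ok(1_699_998_000, host_hz, host_scalable=False))  # False
```

With the values from this report, the requested 1699998000 Hz falls far below the accepted range, and the host cannot scale TSC, so the request is rejected.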

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create a Windows 10 VM on cnv2-engineering
2.
3.

Actual results:
VM fails to start with message:

"LibvirtError(Code=67, Domain=10,
Message=''unsupported configuration: Requested TSC frequency 1699998000 Hz is outside tolerance range ([2099473001, 2100522999] Hz) around host frequency
2099998000 Hz and TSC scaling is not supported by the host CPU'')

Expected results:

VM starts successfully.

Additional info:

According to https://gitlab.com/libvirt/libvirt/-/issues/188, this should already have been fixed.

Surprisingly, we still see this issue with 4.11.1.

— Additional comment from Dominik Holler on 2022-11-11 08:16:43 UTC —

Disabling reenlightenment seems to be a temporary workaround:

kind: VirtualMachine
spec:
  template:
    spec:
      domain:
        features:
          hyperv:
            reenlightenment: {}

— Additional comment from Fabian Deutsch on 2022-11-21 08:44:12 UTC —

This bug is in POST, should it be pulled into 4.12? (Only if it can make it)

— Additional comment from on 2022-11-21 14:30:50 UTC —

The blockers-only freeze was on the 17th, so this BZ needs to be considered a blocker in order to qualify. (It might well be.)

— Additional comment from on 2022-11-21 14:34:04 UTC —

Actually, this is already backported to 4.12.

— Additional comment from on 2022-11-21 14:36:17 UTC —

Marking this as a blocker for 4.12 based on QE recommendation.

We encounter https://bugzilla.redhat.com/show_bug.cgi?id=2141954 if re-enlightenment is completely disabled.

— Additional comment from Kedar Bidarkar on 2022-11-21 14:42:44 UTC —

We will fix this bug in 4.12.0: https://bugzilla.redhat.com/show_bug.cgi?id=2139896

— Additional comment from Red Hat Bugzilla on 2022-12-15 08:28:52 UTC —

Account disabled by LDAP Audit for extended failure

— Additional comment from Antonio Cardace on 2023-01-12 14:34:40 UTC —

@ffossemo@redhat.com will take care of the backport for 4.11.3 as the automatic cherry-pick failed.

— Additional comment from on 2023-01-13 10:16:22 UTC —

The original PR is already backported: https://github.com/kubevirt/kubevirt/pull/8996

— Additional comment from on 2023-01-13 10:21:52 UTC —

The build at http://cnv-version-explorer.apps.cnv2.engineering.redhat.com/BundleDetails?ver=v4.11.3-2 contains the fix. Is it possible to check whether this still happens? Thanks!

— Additional comment from Denys Shchedrivyi on 2023-01-24 15:47:18 UTC —

I verified on CNV v4.11.3-8

A VM with the reenlightenment flag tries to run only on nodes with a matching tsc-frequency or on nodes with the tsc-scalable=true label.

My only concern: on a heterogeneous cluster, a VM with the reenlightenment flag may never run on specific nodes, even if I set nodeSelector explicitly.

For example, we have a cluster with these nodes:

> name: node01
> cpu-timer.node.kubevirt.io/tsc-frequency: '2099998000'
> cpu-timer.node.kubevirt.io/tsc-scalable: 'false'

> name: node03
> cpu-timer.node.kubevirt.io/tsc-frequency: '1699998000'
> cpu-timer.node.kubevirt.io/tsc-scalable: 'false'

> name: node04
> cpu-timer.node.kubevirt.io/tsc-frequency: '2095078000'
> cpu-timer.node.kubevirt.io/tsc-scalable: 'true'

The virt-controller finds the lowest frequency in the cluster and adds it to the VM. In my case that is `tsc-frequency: '1699998000'`, but since node01 is tsc-scalable=false, the VM will never try to run there.
When I select this node with a nodeSelector, the pod is stuck in the Pending state with the message:

> 0/10 nodes are available: 10 node(s) didn't match Pod's node
> affinity/selector. preemption: 0/10 nodes are available: 10 Preemption
> is not helpful for scheduling.

@iholder@redhat.com I suppose this is expected behavior: if TSC is not scalable on a node, that node is skipped.
But what if I have a cluster where all 3 nodes are non-scalable and have different tsc-frequency values? Will a VM with reenlightenment (or with invtsc) run only on the one node with the lowest frequency?
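
To make the behavior described in this comment concrete, here is a minimal sketch of the node filtering as observed during verification. It assumes the requested frequency is the lowest tsc-frequency in the cluster and that a node is eligible only if it is tsc-scalable or its frequency matches exactly. The `Node` class and both function names are hypothetical, and this is an illustration of the observed outcome, not the virt-controller implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Node:
    name: str
    tsc_frequency: int  # value of cpu-timer.node.kubevirt.io/tsc-frequency
    tsc_scalable: bool  # value of cpu-timer.node.kubevirt.io/tsc-scalable


def requested_frequency(nodes: List[Node]) -> int:
    """Assumption: the VM gets the lowest TSC frequency found in the cluster."""
    return min(n.tsc_frequency for n in nodes)


def schedulable_nodes(nodes: List[Node]) -> List[str]:
    """Names of nodes that are scalable or already run at the requested frequency."""
    wanted = requested_frequency(nodes)
    return [n.name for n in nodes if n.tsc_scalable or n.tsc_frequency == wanted]


# The three nodes listed in this comment.
cluster = [
    Node("node01", 2099998000, False),
    Node("node03", 1699998000, False),
    Node("node04", 2095078000, True),
]
print(schedulable_nodes(cluster))  # ['node03', 'node04']
```

Under these assumptions node01 is never eligible, which matches the Pending pod above. If all three nodes were non-scalable with different frequencies, only the node with the lowest frequency would remain.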

— Additional comment from Denys Shchedrivyi on 2023-02-01 19:11:49 UTC —

Maybe we can improve this logic somehow? Or at least we should document it as a known limitation of VMs with the reenlightenment (or cpu/invtsc) flag on a cluster with non-scalable nodes.

— Additional comment from Itamar Holder on 2023-02-06 17:10:25 UTC —

Hey Denys,

> May be we can improve this logic somehow?

QEMU broke backward compatibility and introduced a limitation [1] that forces us to pass an explicit TSC frequency for HyperV Reenlightenment VMs.
Therefore, I don't see a clear way to improve the logic we have. Perhaps the right thing to do is to sync with the QEMU devs and try to come up with a better solution.
In any case, I will sync with Vladik Romanovsky to think about what can be done.

> Or at least we should document it as a known limitation of VMs with reenlightenment

Documenting it clearly is always good, especially since this is a corner case and it doesn't seem that QEMU will remove this limitation anytime soon, if ever.
Having nodes with scalable TSC solves this problem.
In a mixed cluster, HyperV Reenlightenment VMs cannot be scheduled on nodes that don't support scalable TSC and have a frequency higher than the lowest frequency in the cluster.

[1] https://gitlab.com/qemu-project/qemu/-/commit/561dbb41b1d752098249128d8462aaadc56fd15d

— Additional comment from Denys Shchedrivyi on 2023-02-06 17:43:51 UTC —

Moving this BZ to Verified. As discussed, we should document this as a known limitation of mixed clusters with non-scalable nodes.

— Additional comment from Kedar Bidarkar on 2023-02-09 13:40:15 UTC —

In a mixed cluster, HyperV Reenlightenment VMs cannot be scheduled on nodes that don't support scalable TSC and have a frequency higher than the lowest frequency in the cluster.

is blocked by

CNV-22254 [2139896] Requested TSC frequency outside tolerance range & TSC scaling not supported

Closed

external trackers

Red Hat Issue Tracker CNV-26301
