
Type: Bug
Resolution: Done-Errata
Priority: Undefined
Fix Version/s: CNV v4.16.6
Affects Version/s: CNV v4.16.4
Component/s: CNV Install, Upgrade and Operators
Labels:
- cnv-observability

Activity Type:
Incidents & Support
Story Points:
0.42
Blocked:
False
Blocked Reason:
None
Ready:
False
Component Fix Version(s):
CNV v4.18.0.rhel9-450, CNV v4.16.6.rhel9-66
Git Pull Request:
https://github.com/kubevirt/hyperconverged-cluster-operator/pull/3186, https://github.com/kubevirt/hyperconverged-cluster-operator/pull/3202
Market:

Sprint:
CNV I/U Operators Sprint 262, CNV I/U Operators Sprint 263, CNV I/U Operators Sprint 264, CNV I/U Operators Sprint 265, CNV I/U Operators Sprint 266, CNV I/U Operators Sprint 267
Severity:
Moderate

Regression:
None

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Description of problem:

Take a single-node cluster: the HCO sees that the control plane is not highly available and sets the KubeVirt object with "infra: replicas: 1", as per [1]. This results in single-replica deployments for virt-api, virt-controller, etc., as per [2].
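The HCO side of the problem can be modeled as a minimal sketch, assuming the topology values "HighlyAvailable" and "SingleReplica" as inputs. The names TopologyMode, desired_infra_replicas and kubevirt_infra_spec are hypothetical and do not come from the code in [1]; returning None is this sketch's stand-in for "this rule does not set infra.replicas".

```python
from enum import Enum
from typing import Optional


class TopologyMode(Enum):
    """Control-plane topology values assumed as input for this sketch."""
    HIGHLY_AVAILABLE = "HighlyAvailable"
    SINGLE_REPLICA = "SingleReplica"


def desired_infra_replicas(control_plane: TopologyMode) -> Optional[int]:
    """Hypothetical model of the HCO rule referenced in [1].

    Returns 1 when the control plane is not highly available. None means
    this rule leaves infra.replicas unset (an assumption of this sketch).
    """
    if control_plane is TopologyMode.SINGLE_REPLICA:
        return 1
    return None


def kubevirt_infra_spec(control_plane: TopologyMode) -> dict:
    """Build the fragment of the KubeVirt spec this rule would write."""
    replicas = desired_infra_replicas(control_plane)
    return {} if replicas is None else {"infra": {"replicas": replicas}}


if __name__ == "__main__":
    for mode in TopologyMode:
        print(mode.value, kubevirt_infra_spec(mode))
```

Note that the only input is the control-plane topology; the number of nodes never enters the decision.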

Now look at the alerting logic in [3]. It does not consider the control plane availability of the cluster; instead, it looks at the number of worker nodes and fires the alerts if a component has only 1 replica while the cluster has 2+ worker nodes.
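The alerting side can be sketched the same way. This is a simplified model of the condition described above, not the actual rule expression in [3], whose metric names and exact thresholds may differ. The alert names are the ones reported under "Actual results"; the helper names low_replica_alert_fires and firing_alerts are hypothetical.

```python
from typing import Dict, List

# Alert names as reported in this issue; the component-to-alert mapping
# is assumed for the sketch.
ALERT_BY_COMPONENT = {
    "virt-api": "LowVirtAPICount",
    "virt-controller": "LowVirtControllersCount",
}


def low_replica_alert_fires(ready_replicas: int, worker_nodes: int) -> bool:
    """Simplified model of the condition described for [3].

    Only the node count matters: with 2+ worker nodes, fewer than 2
    replicas is flagged, regardless of control-plane topology.
    """
    return worker_nodes >= 2 and ready_replicas < 2


def firing_alerts(ready_replicas: Dict[str, int], worker_nodes: int) -> List[str]:
    """Return the names of the low-replica alerts that would fire."""
    return sorted(
        ALERT_BY_COMPONENT[component]
        for component, count in ready_replicas.items()
        if component in ALERT_BY_COMPONENT
        and low_replica_alert_fires(count, worker_nodes)
    )


if __name__ == "__main__":
    print(firing_alerts({"virt-api": 1, "virt-controller": 1}, worker_nodes=2))
```

Here the only inputs are the replica count and the node count; the topology that drove the HCO decision is invisible to this check.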

This is conflicting logic: one side decides whether to scale based on X (control plane high availability), while the other decides whether to alert based on Y (number of nodes).

Take an SNO cluster and add a single worker node, and this discrepancy shows up: the control plane is not highly available, but there are 2+ worker nodes.
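Putting the two rules side by side makes the X/Y mismatch concrete. This sketch assumes that an HA cluster ends up with at least 2 replicas (2 is used as a stand-in for virt-operator's own default) and that the SNO node itself counts as a worker node. The Cluster type and the scenario list are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    control_plane_ha: bool  # input X: what the HCO looks at
    worker_nodes: int       # input Y: what the alerts look at (includes the SNO node)


def infra_replicas(cluster: Cluster) -> int:
    # Scaling side keyed on X. Assumption: an HA cluster runs 2 replicas.
    return 2 if cluster.control_plane_ha else 1


def alert_fires(cluster: Cluster, replicas: int) -> bool:
    # Alerting side keyed on Y.
    return cluster.worker_nodes >= 2 and replicas < 2


def is_consistent(cluster: Cluster) -> bool:
    """True when the replica count chosen from X does not trip the alert on Y."""
    return not alert_fires(cluster, infra_replicas(cluster))


SCENARIOS = [
    Cluster("SNO", control_plane_ha=False, worker_nodes=1),
    Cluster("SNO + 1 worker", control_plane_ha=False, worker_nodes=2),
    Cluster("HA control plane + 3 workers", control_plane_ha=True, worker_nodes=3),
]

if __name__ == "__main__":
    for c in SCENARIOS:
        replicas = infra_replicas(c)
        print(f"{c.name}: replicas={replicas} alert={alert_fires(c, replicas)}")
```

Under these assumptions only the "SNO + 1 worker" scenario is inconsistent: X yields 1 replica, while Y sees 2 nodes and treats 1 replica as too few.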

Please investigate the correct action here: either these alerts should not fire, or the components should start scaling.

[1] https://github.com/kubevirt/hyperconverged-cluster-operator/blob/cf24cf1dcc5fadf5ee53ed0fee7c005e42d6e66c/controllers/operands/kubevirt.go#L809
[2] https://github.com/kubevirt/kubevirt/blob/133430eab48bb567535f8556356e3b071b275388/pkg/virt-operator/resource/apply/apps.go#L56
[3] https://github.com/kubevirt/kubevirt/blob/15090de920b2345df8f56eb70a3d6ecc64d60992/pkg/monitoring/rules/alerts/virt-controller.go#L64

Version-Release number of selected component (if applicable):

4.16.4

How reproducible:

Always

Steps to Reproduce:

1. Install an SNO cluster.
2. Add one extra worker node.

Actual results:

Alerts such as LowVirtAPICount and LowVirtControllersCount fire.

Expected results:

Either the components start scaling, or no alerts fire about them not scaling.
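One way to reach the "start scaling" outcome is to make the scaling decision read the same signal as the alert. The merge requests linked to this issue are titled "Don't rely on InfraStructureTopology for infra HA", which suggests a change in that direction; the sketch below only illustrates that idea under the assumption that infra HA is derived from the number of nodes that can host the infra pods. It is not the code from #3186, and infra_replicas_from_nodes and consistent_for_all are hypothetical names.

```python
def infra_replicas_from_nodes(schedulable_nodes: int) -> int:
    # Hypothetical fix direction: decide infra HA from the number of nodes
    # that can host the infra pods (the same input the alert uses),
    # instead of a cluster topology field.
    return 2 if schedulable_nodes >= 2 else 1


def alert_fires(worker_nodes: int, replicas: int) -> bool:
    # Same simplified alert condition as described for [3].
    return worker_nodes >= 2 and replicas < 2


def consistent_for_all(max_nodes: int = 10) -> bool:
    """Check that scaling and alerting agree for 1..max_nodes nodes."""
    return all(
        not alert_fires(n, infra_replicas_from_nodes(n))
        for n in range(1, max_nodes + 1)
    )


if __name__ == "__main__":
    print(consistent_for_all())
```

The other outcome, silencing the alert, would instead make alert_fires take the same topology input that the HCO uses. Either way, the point is that both sides read the same signal.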

links to

[KCS] Getting LowVirtAPICount and LowVirtControllersCount on Single Node cluster

mentioned on

Merge request - Updated US source to: 7b4e937 Don't rely on InfraStructureTopology for infra HA (#3186)

Merge request - Updated US source to: 271a689 [release-1.12] Don't rely on InfraStructureTopology for infra HA (#3196) (#3202)

Merge request - Updated US source to: c9bc8dc Change default value for completionTimeoutPerGiB to 150s (#3195)