Type: Story
Resolution: Done
Priority: Undefined
Fix Version/s: CNV v4.14.0
Affects Version/s: None
Component/s: CNV Install, Upgrade and Operators
Labels:
- cnv-observability

Story Points:
3
Blocked:
False
Blocked Reason:

None

Ready:
False
Epic Link:
kubevirt-metrics-code-refactoring-design
Acceptance Criteria:

Create the design, split the effort into the action items it will require, and provide an effort estimate for each one.

Feature Link:
CNV-8094 - CNV Observability
[QE] How to address?:
---
[QE] Why QE missed?:
---

Sprint:
CNV I/U Operators Sprint 229, CNV I/U Operators Sprint 230

Regression:
None

SFDC Cases Links:
SFDC Cases Counter:
SFDC Cases Open:

The KubeVirt metrics code is currently embedded deep in the operator code.
This hurts code readability and maintainability and adds complexity.

There are two suggestions for improving the monitoring code in KubeVirt:
1. Do KubeVirt monitoring externally
This option is about redesigning our monitoring components so that they are developed and deployed externally to KubeVirt. In other words, monitoring components would be developed in a different repository and deployed separately (similarly to CDI, for example).

This has many advantages, for example:

- Enhanced development speed
- Decoupling monitoring code from operator code
- Enhanced security: monitor publicly available data only
- Becoming more modular and generic, and so more resilient to future changes

Note that this approach is generic: it could be applied to any operator, not only to KubeVirt. In addition, it can be used to export data not only to Prometheus, but to other tools as well.

For more information on the motivation, goals and architecture design, see:

https://github.com/kubevirt/community/pull/189

(At the time of writing, the design proposal is still a draft. Many changes are expected, and feedback is much appreciated.)

2. Create a monitoring directory in each operator repository that holds all the monitoring logic (metrics, alerts, runbooks).

More details can be found here: https://docs.google.com/document/d/1L2lcri3SogFhjaIutbVdvnSkFNMrXeBQ7L0Zs_izJzM/edit?usp=sharing
An example is here: https://github.com/operator-framework/operator-sdk/pull/5996
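
To make the second suggestion more concrete, here is a minimal, language-agnostic sketch of what a single monitoring module could look like. It is written in Python with the standard library only and is not taken from the linked document or pull request; `MetricSpec`, `MonitoringRegistry` and the metric names are hypothetical. The idea it illustrates is that operator code only calls into one module, which owns every metric declaration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetricSpec:
    """Declaration of one metric; all names used here are hypothetical."""
    name: str
    help: str
    kind: str  # "counter" or "gauge"


class MonitoringRegistry:
    """Single place where an operator declares and updates its metrics."""

    def __init__(self):
        self._specs = {}
        self._values = {}

    def register(self, spec):
        # Reject duplicates so two code paths cannot define the same metric.
        if spec.name in self._specs:
            raise ValueError(f"metric {spec.name!r} is already registered")
        self._specs[spec.name] = spec
        self._values[spec.name] = 0.0

    def inc(self, name, amount=1.0):
        # Counters only go up; gauges must use set_value instead.
        if self._specs[name].kind != "counter":
            raise TypeError(f"{name!r} is not a counter")
        self._values[name] += amount

    def set_value(self, name, value):
        if self._specs[name].kind != "gauge":
            raise TypeError(f"{name!r} is not a gauge")
        self._values[name] = float(value)

    def value(self, name):
        return self._values[name]

    def describe(self):
        # Sorted listing, usable for generated docs or a naming linter.
        specs = sorted(self._specs.values(), key=lambda s: s.name)
        return [(s.name, s.kind, s.help) for s in specs]
```

Because every metric is declared through `register`, a listing such as `describe()` could feed documentation or naming checks without touching the operator's core code.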

In this spike we need to determine which of the implementations is better for KubeVirt and whether it can also be considered a best practice for other operators that want to have monitoring.

Things to consider when evaluating the alternatives:
1. Be able to keep all the labels that are taken from the environment, such as namespace, pod, container, instance, job and endpoint.
2. Be able to collect and report both metrics that are driven by changes to resources in the environment and metrics that should be collected periodically, such as CPU and memory.
3. Have a way to catch breaking changes in the operators' core code so that they do not turn into bugs on the monitoring side.
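
The three considerations above can be sketched together in a few lines. This is an illustrative Python sketch under stated assumptions, not a proposed implementation: `Collector`, `missing_metrics` and all metric names are hypothetical. It shows environment labels being merged into every sample, an event-driven update path next to a periodic polling path, and a simple check that flags metrics the monitoring side expects but the code no longer produces.

```python
class Collector:
    """Hypothetical collector that mixes event-driven and periodic metrics."""

    def __init__(self, env_labels):
        # Labels taken from the environment (namespace, pod, ...).
        self.env_labels = dict(env_labels)
        self.samples = {}  # (metric name, sorted label pairs) -> value
        self._pollers = []

    def _key(self, name, labels):
        # Every sample carries the environment labels plus its own.
        merged = {**self.env_labels, **labels}
        return (name, tuple(sorted(merged.items())))

    def on_event(self, name, labels=None, delta=1):
        # Event-driven path: called when a resource changes.
        key = self._key(name, labels or {})
        self.samples[key] = self.samples.get(key, 0) + delta

    def add_poller(self, name, read_value):
        # Periodic path: read_value is sampled on every poll_once() call.
        self._pollers.append((name, read_value))

    def poll_once(self):
        for name, read_value in self._pollers:
            self.samples[self._key(name, {})] = read_value()


def missing_metrics(collector, expected_names):
    """Return expected metric names that the collector did not produce."""
    present = {name for name, _labels in collector.samples}
    return sorted(set(expected_names) - present)
```

A check like `missing_metrics` could run in the operator's own test suite, so that a core-code change which drops or renames a metric fails there instead of surfacing later as a broken alert.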

split to

CNV-24647 [contd] [spike] Plan the monitoring code refactoring

Closed

links to

Design document

External Monitoring Design-Proposal

proposal: Monitoring code refactor

Assignee: Shirly Radco

Reporter: Itamar Holder

Votes: 0

Watchers: 4

Created: 2022/10/20 8:18 AM

Updated: 2024/01/31 12:41 PM

Resolved: 2023/01/23 1:44 PM
