-
Feature
-
Resolution: Unresolved
-
0% To Do, 0% In Progress, 100% Done
Background
The aim of this feature is to provide a list of platforms, covering Virtual Machines and their underlying environments, on which power monitoring for Red Hat OpenShift has been tested, and to compare the results against other references such as a power meter or Kepler itself running on bare metal.
Motivation
Kepler, like other power monitoring projects, has limitations regarding the accuracy of the metrics it exports in Virtual Machines (see https://www.cncf.io/blog/2023/10/11/exploring-keplers-potentials-unveiling-cloud-application-power-consumption/).
In Virtual Machines, Kepler does not normally have access to the same information as in bare metal setups. In those cases, Kepler applies the available pre-trained models. End users must have a clear idea of the validity and accuracy of the numbers Kepler reports.
Note that these limitations stem not only from the need to train models on many platforms, but also from the fact that the data varies with several factors, including hardware overcommitment and the number of VMs sharing the underlying hardware.
Requirements
Upon the completion of this ticket:
- A process shall be in place to validate Kepler numbers and to report error and accuracy information for existing models
- Models shall be trained for VMs running on a list of platforms (consider the platforms already covered by the perf and scale team and those outlined in OBSDA-627 for AWS, GCP, etc.)
- The provided models shall be validated
- Error bars and accuracy information shall be available for the tested platforms
- The objective is for users to understand what data validity to expect on the tested platforms
This exercise applies only to workloads running on Virtual Machines. For Bare Metal, see OBSDA-650.
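As an illustration of the kind of error and accuracy information the requirements ask for, the following is a minimal Python sketch that compares Kepler power estimates against a reference measurement (for example a power meter, or Kepler on bare metal) for one platform. It is not part of Kepler or of any existing validation tooling: the names `AccuracyReport` and `accuracy_report`, the choice of metrics (mean absolute error, mean absolute percentage error, and the standard deviation of the error as a simple error bar), and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean, stdev


@dataclass
class AccuracyReport:
    """Hypothetical per-platform summary; not a Kepler data structure."""
    platform: str
    mae_watts: float        # mean absolute error
    mape_percent: float     # mean absolute percentage error
    error_std_watts: float  # sample std dev of the signed error (error bar)


def accuracy_report(platform, kepler_watts, reference_watts):
    """Compare paired power samples (in watts) taken at the same timestamps."""
    if len(kepler_watts) != len(reference_watts):
        raise ValueError("sample series must have the same length")
    if len(kepler_watts) < 2:
        raise ValueError("at least two paired samples are required")
    if any(r <= 0 for r in reference_watts):
        raise ValueError("reference readings must be positive")

    # Signed error of each Kepler estimate against the reference reading.
    errors = [k - r for k, r in zip(kepler_watts, reference_watts)]
    mae = mean(abs(e) for e in errors)
    mape = 100 * mean(abs(e) / r for e, r in zip(errors, reference_watts))
    return AccuracyReport(platform, mae, mape, stdev(errors))


# Made-up sample values, for illustration only.
report = accuracy_report(
    "example-platform",
    kepler_watts=[110.0, 90.0, 105.0, 95.0],
    reference_watts=[100.0, 100.0, 100.0, 100.0],
)
print(report)
```

A report of this shape, produced per tested platform and per model, would be one way to publish the error bars and accuracy information. Which metrics and sampling intervals to use is left open by this ticket.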
To clarify
- How will this effort be sustained over time?
- Will the exercise be repeated for every OCP release?