  Observability and Data Analysis Program / OBSDA-731

Power monitoring GA Release Tracker

    • Outcome
    • Resolution: Unresolved
    • PM Power-monitoring
    • 50% To Do, 29% In Progress, 21% Done

      Background

      This release tracker was created because it was brought to our attention that a single Jira tracker makes progress easier to follow for other parties within Red Hat.

      It contains all Features in the OBSDA project that are needed to reach GA (whether through links or a parent-child relationship is still to be decided).

      The rationale and discussion can be found in the document Power monitoring for OpenShift - Kepler GA. The content was agreed and signed off at (needed link)

      Note: The document was written by the PM in week 8 of 2024, but management was still reviewing it when this Jira was written.

      Disclaimer: This is just a placeholder. Refer to Power monitoring for OpenShift - Kepler GA to understand the limitations, discussion, and context.

      Scope of the product

      The goal of releasing power monitoring for Red Hat OpenShift in GA is to provide an easy way to install, run, and integrate the Kepler project with Red Hat OpenShift and the Observability platform. The main deliverable is a component that lets customers enable their sustainability use cases.

      There are some common misconceptions about Kepler, such as “being able to produce CO2 emissions data” (as advertised at the ’23 Summit) or “being footprint agnostic”. Hence, the scope of power monitoring is clarified as follows:

      1. Observability scope: power monitoring provides granular power-consumption metrics to OpenShift users, so they can understand the power consumption of their workloads at a fine level of detail.
      2. Enable sustainability initiatives: other projects can potentially take actions, either automated or manual, based on the metrics provided by power monitoring. This includes:
         1. Scaling/optimizations: autoscaler integration, or anything that can be triggered by a Kepler metric, such as emerging projects like PEAKS (Power Efficiency Aware Kubernetes Scheduler).
         2. Reporting: providing cluster-level, node-level, and project-level information on CO2 footprint.
         This is out of the scope of the power monitoring product itself.
      3. Integrations: power monitoring metrics are available in the OpenShift platform and can also be exported to other systems in which more data is aggregated. It is in the scope of power monitoring to facilitate these integrations so that metrics can be exported in a standard way.

      Kepler Technology status and usability

      Bare metal nodes provide power metrics directly from their hardware components, such as the CPU and DRAM, for instance through RAPL or ACPI on x86 machines. In contrast, VMs do not expose power metrics. The primary reason is the absence of mechanisms to make these metrics available.

      Kepler makes use of different energy data sources, such as Running Average Power Limit (RAPL), Advanced Configuration and Power Interface (ACPI), Redfish-based BMC, and NVIDIA Management Library (NVML) when available, or pre-trained machine learning models when those APIs are not available. From these sources, Kepler performs power ratio modeling to compute contributions to the overall power consumption.

      The most typical situations in which Kepler does not have access to the power APIs mentioned above are:
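The idea behind power ratio modeling can be illustrated with a minimal sketch. This is not Kepler's implementation: the function name `attribute_power`, the single usage counter per workload, and the choice to return zero when no usage is recorded are all assumptions made for illustration. The sketch splits a measured node power reading across workloads in proportion to each workload's share of a resource-usage counter.

```python
def attribute_power(node_power_watts, usage_by_workload):
    """Split a node-level power reading across workloads.

    Hypothetical illustration of ratio-based attribution: each workload
    receives a share of the measured power proportional to its share of
    a resource-usage counter (for example, CPU time in the interval).
    """
    total_usage = sum(usage_by_workload.values())
    if total_usage == 0:
        # Assumption: with no recorded usage, nothing is attributed.
        return {name: 0.0 for name in usage_by_workload}
    return {
        name: node_power_watts * usage / total_usage
        for name, usage in usage_by_workload.items()
    }


# Example: a node drawing 100 W, with two workloads using 30 and 10 units.
shares = attribute_power(100.0, {"workload-a": 30, "workload-b": 10})
print(shares)
```

When no hardware power reading is available (for example on a VM), the `node_power_watts` input would have to come from an estimate such as a pre-trained model rather than from a measurement.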

      1. Virtualized environments
      2. Environments where hardware counters are not generated by the hardware itself. This can potentially apply to some non-Intel CPUs (AMD, ARM, IBM).

      OpenShift customers run their workloads in the hybrid cloud, and the vast majority do not run them on bare metal clusters. Running power monitoring outside bare metal has several limitations, as recently published.

      GA list

      The product team is committed to providing a great product that helps our customers, with the support needed for General Availability. Issues identified as must-have for GA can be viewed by using the Filter. The product team will continue releasing more versions when needed, with an estimated cadence of one release per quarter. When this list is finished, GA will be released.

      In summary, the features that will continue landing in the product fall into the following categories:

      1. Pure productization activities, including non-functional requirements and downstream work/testing (productization filter)
      2. Addressing supportability (supportability filter)
      3. Responding to customer feedback (customer-feedback filter)
      4. Enabling support for VMs, i.e., models (VM filter)

       

            rh-ee-rfloren Roger Florén