
    • OBSDA-731 Power monitoring GA Release Tracker
    • 67% To Do, 0% In Progress, 33% Done

      Background

      The aim of this feature is to ensure that Kepler metrics produced on Bare Metal are accurate, by comparing their values against other tools such as node-exporter and process-exporter.

      Motivation

      The motivation for this work is varied. Not all hardware provides the same APIs for power monitoring: RAPL is available in nearly all modern Intel processors and in some AMD processors, while ACPI also plays a significant role in platform power reporting. In addition, Redfish has shown better results when compared against power meter readings (see OBSDA-645).

      Requirements

      All Available Metrics

      • kepler_<level>_bpf_block_irq_total
      • kepler_<level>_bpf_cpu_time_ms_total
      • kepler_<level>_bpf_net_rx_irq_total
      • kepler_<level>_bpf_net_tx_irq_total
      • kepler_<level>_bpf_page_cache_hit_total
      • kepler_<level>_cache_miss_total
      • kepler_<level>_cpu_cycles_total
      • kepler_<level>_cpu_instructions_total
      • kepler_<level>_cpu_ref_cycles_total
      • kepler_<level>_package_joules_total
      • kepler_<level>_platform_joules_total
      • kepler_<level>_uncore_joules_total
      • kepler_<level>_core_joules_total
      • kepler_<level>_dram_joules_total
      • kepler_<level>_joules_total
      • kepler_<level>_other_joules_total

      Metrics Validated

      • <level>_joules_total
      • Component metrics for each level (node, process, vm, container): package, core, dram, uncore, other
      • Platform metrics for each level (node, process, vm, container): ACPI, Redfish
      • kepler_<level>_joules_total: <needs to be well defined>
      • kepler_<level>_package_joules_total
      • kepler_<level>_platform_joules_total
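      Assuming the joule metrics above are exposed as cumulative Prometheus counters (as the _total suffix suggests), comparing them with tools that report power, such as a power meter or Redfish, requires converting each pair of scraped samples into average watts over the interval. The sketch below is a simplified counterpart of PromQL's rate() without its extrapolation; the function name and the (timestamp, joules) sample format are illustrative assumptions.

```python
from typing import List, Tuple


def average_watts(samples: List[Tuple[float, float]]) -> List[float]:
    """Convert cumulative joule counter samples into average power per interval.

    samples: (timestamp_seconds, cumulative_joules) pairs in time order,
    e.g. successive scrapes of kepler_node_package_joules_total.
    Returns one average-watts value per consecutive pair of samples.
    """
    watts = []
    for (t0, j0), (t1, j1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # A decreasing counter is treated as a reset (e.g. exporter restart):
        # assume it restarted from 0, so the new value is the energy used.
        delta = j1 - j0 if j1 >= j0 else j1
        watts.append(delta / dt)
    return watts


if __name__ == "__main__":
    # 1500 J consumed in each 15 s interval corresponds to 100 W.
    print(average_watts([(0, 100.0), (15, 1600.0), (30, 3100.0)]))
```

      Working in average watts per interval keeps the comparison independent of the absolute counter value, which differs between tools that started counting at different times.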

      Validated at a Node Level

      • kepler_node_core_joules_total
      • kepler_node_uncore_joules_total
      • kepler_node_dram_joules_total
      • kepler_node_other_joules_total

      Metrics not covered in GA

      • kepler_<level>_bpf_block_irq_total
      • kepler_<level>_bpf_cpu_time_ms_total
      • kepler_<level>_bpf_net_rx_irq_total
      • kepler_<level>_bpf_net_tx_irq_total
      • kepler_<level>_bpf_page_cache_hit_total
      • kepler_<level>_cache_miss_total
      • kepler_<level>_cpu_cycles_total
      • kepler_<level>_cpu_instructions_total
      • kepler_<level>_cpu_ref_cycles_total
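      As a consistency check on the lists above, the GA scope can be derived as the set difference between all available metrics and the metrics not covered in GA; what remains should be the seven energy counters listed under Metrics Validated. The level names come from the component list above, and the helper names are hypothetical.

```python
# Metric templates copied from the "All Available Metrics" list.
ALL_METRICS = {
    "kepler_<level>_bpf_block_irq_total",
    "kepler_<level>_bpf_cpu_time_ms_total",
    "kepler_<level>_bpf_net_rx_irq_total",
    "kepler_<level>_bpf_net_tx_irq_total",
    "kepler_<level>_bpf_page_cache_hit_total",
    "kepler_<level>_cache_miss_total",
    "kepler_<level>_cpu_cycles_total",
    "kepler_<level>_cpu_instructions_total",
    "kepler_<level>_cpu_ref_cycles_total",
    "kepler_<level>_package_joules_total",
    "kepler_<level>_platform_joules_total",
    "kepler_<level>_uncore_joules_total",
    "kepler_<level>_core_joules_total",
    "kepler_<level>_dram_joules_total",
    "kepler_<level>_joules_total",
    "kepler_<level>_other_joules_total",
}

# Metric templates copied from the "Metrics not covered in GA" list.
NOT_COVERED_IN_GA = {
    "kepler_<level>_bpf_block_irq_total",
    "kepler_<level>_bpf_cpu_time_ms_total",
    "kepler_<level>_bpf_net_rx_irq_total",
    "kepler_<level>_bpf_net_tx_irq_total",
    "kepler_<level>_bpf_page_cache_hit_total",
    "kepler_<level>_cache_miss_total",
    "kepler_<level>_cpu_cycles_total",
    "kepler_<level>_cpu_instructions_total",
    "kepler_<level>_cpu_ref_cycles_total",
}

# Levels named in the "Metrics Validated" section.
LEVELS = ("node", "process", "vm", "container")


def ga_scope() -> list:
    """Templates that remain once the non-GA metrics are removed, sorted."""
    return sorted(ALL_METRICS - NOT_COVERED_IN_GA)


def expand(template: str, levels=LEVELS) -> list:
    """Substitute each level into a kepler_<level>_... template."""
    return [template.replace("<level>", level) for level in levels]


if __name__ == "__main__":
    for template in ga_scope():
        print(", ".join(expand(template)))
```

      Running it prints the core, dram, joules, other, package, platform and uncore counters, each expanded for the four levels.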

      Upon completion of this ticket, power monitoring users shall:

      • Have a list of metrics that are validated against other tools on Bare Metal
      • Have test results reporting the MAE (mean absolute error) and MAPE (mean absolute percentage error) between Kepler's measurements and those of the tools used for comparison
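      The two error measures can be computed as in the minimal sketch below. It assumes the Kepler and reference readings have already been aligned to the same timestamps and converted to the same unit (for example watts); the function names are illustrative, and taking MAPE relative to the reference tool is a choice this ticket does not specify.

```python
from typing import Sequence


def _check(kepler: Sequence[float], reference: Sequence[float]) -> None:
    """Require two non-empty series of equal length."""
    if len(kepler) != len(reference) or len(kepler) == 0:
        raise ValueError("series must be non-empty and of equal length")


def mae(kepler: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute error, in the unit of the inputs (e.g. watts)."""
    _check(kepler, reference)
    return sum(abs(k - r) for k, r in zip(kepler, reference)) / len(kepler)


def mape(kepler: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute percentage error relative to the reference tool, in percent."""
    _check(kepler, reference)
    if any(r == 0 for r in reference):
        raise ValueError("MAPE is undefined when a reference value is zero")
    return 100.0 * sum(abs((k - r) / r) for k, r in zip(kepler, reference)) / len(kepler)


if __name__ == "__main__":
    kepler_watts = [110.0, 95.0, 100.0]
    meter_watts = [100.0, 100.0, 100.0]
    print(mae(kepler_watts, meter_watts), mape(kepler_watts, meter_watts))
```

      In this example the per-sample errors are 10 W, 5 W and 0 W, so MAE is 5 W and MAPE is 5 %. MAE is reported in the original unit, while MAPE makes results comparable across nodes with very different power draw.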

       

              rh-ee-rfloren Roger Florén