Type: Feature
Resolution: Unresolved
Priority: Undefined
Fix Version/s: PowerMon GA 1.0
Affects Version/s: None
Component/s: PM Power-monitoring
Labels:
None

Blocked:
False
Blocked Reason:

None
Ready:
False
Color Status:
Not Selected
PM Score:
0

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

PX Impact Score:

Intelligence Requested:
Market:

Feature Overview (mandatory - Complete while in New status)
Introduce granular per-process GPU power metrics into Power Monitoring (Kepler). This is critical because AI workloads rely on GPUs, which draw significant power. Measuring this consumption per process lets users pinpoint waste and make data-driven workload placement decisions that maximize efficiency.

Goals (mandatory - Complete while in New status)
Deliver per-process GPU power observability, functionally similar to existing CPU metrics, to enable optimization of GPU-intensive workloads.
What is the difference between today's state and a world with this Feature?

Current State: Kepler has no GPU power support, so the component that is likely the largest power draw remains a blind spot.
Future State: Users can monitor real-time GPU energy usage down to the process level, enabling informed, data-driven decisions.

Requirements (mandatory - Complete while in Refinement status):

Requirement	Notes	isMVP?
GPU Power Metrics must be gathered at the Process level.	Must mirror CPU Process Metrics functionality (e.g., `kepler_process_cpu_watts`).	Yes
Metrics support for multi-instance GPUs.		?
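
One way to read the per-process requirement above is as a proportional split, similar in spirit to ratio-based CPU attribution. The following is a minimal sketch, assuming the GPU exposes a device-level power reading and a per-process utilization fraction for the same interval. `ProcessSample`, `attribute_gpu_watts`, and the idle-power handling are hypothetical names and choices for illustration, not Kepler's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessSample:
    """Hypothetical per-process reading for one sampling interval."""
    pid: int
    gpu_util: float  # fraction of GPU compute time used, 0.0 to 1.0


def attribute_gpu_watts(
    device_watts: float,
    idle_watts: float,
    samples: list[ProcessSample],
) -> dict[int, float]:
    """Split the active (above-idle) device power by utilization share."""
    # Assumption: only power above the idle baseline is attributed to processes.
    active_watts = max(device_watts - idle_watts, 0.0)
    total_util = sum(s.gpu_util for s in samples)
    if total_util <= 0.0:
        # No process used the GPU in this interval.
        return {s.pid: 0.0 for s in samples}
    return {s.pid: active_watts * s.gpu_util / total_util for s in samples}
```

For a 250 W reading with a 50 W idle baseline and two processes at 0.75 and 0.25 utilization, this split yields 150 W and 50 W. How idle power should be attributed is left open here.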

Done - Acceptance Criteria (mandatory - Complete while in Refinement status):

GPU energy consumption metrics are successfully collected and exposed by Kepler at the container, pod, and process granularity.
Users can visualize and track GPU power metrics within the OpenShift Console dashboards.
The new GPU metrics demonstrate accurate measurement for workloads utilizing multi-instance GPUs.
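
The container and pod granularity in the first criterion can be derived by summing per-process values. A minimal sketch, assuming a process-to-owner mapping is already available; the `roll_up` helper and the `unattributed` bucket are hypothetical and not part of Kepler:

```python
from collections import defaultdict


def roll_up(process_watts: dict[int, float], owner_of: dict[int, str]) -> dict[str, float]:
    """Sum per-process GPU watts by owning container or pod."""
    totals: dict[str, float] = defaultdict(float)
    for pid, watts in process_watts.items():
        # Processes with no known owner are kept visible rather than dropped.
        totals[owner_of.get(pid, "unattributed")] += watts
    return dict(totals)
```

Keeping an explicit bucket for unowned processes means the per-owner totals still sum to the per-process total, which makes the aggregation easy to check.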

Out of Scope (Initial completion while in Refinement status):

GPU AI or performance metrics (focus is purely on power/energy attribution).
Any UI/dashboard development beyond displaying the new Kepler metric data.

Assignee: Simon Herlofsson

Reporter: Simon Herlofsson

Votes: 0

Watchers: 1

Created: 2025/12/10 11:34 AM

Updated: 2026/01/27 2:20 PM
