Feature
Resolution: Unresolved
Product / Portfolio Work
Feature Overview
RFE: https://issues.redhat.com/browse/RFE-8626
The goal is Developer Preview (DP) in 2.16. This depends on the add-on landing in OCM 1.2.0 (https://github.com/orgs/open-cluster-management-io/projects/2/views/28).
DP support will be based on the upstream release; no downstream work is planned for DP.
This feature aims to productize the upstream Dynamic Scoring Framework add-on for Open Cluster Management (OCM) into Red Hat Advanced Cluster Management (RHACM). It introduces a mechanism to make Placement decisions based on custom, external logic (Scoring API) and real-time time-series data, moving beyond static labels or simple resource claims.
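For context, OCM's existing extensible scheduling API already lets a Placement weight clusters by scores published through AddOnPlacementScore resources; the Dynamic Scoring Framework is expected to populate those scores from the external Scoring API. A minimal hub-side sketch follows; the resourceName `dynamic-scoring` and scoreName `gpuPowerEfficiency` are illustrative placeholders, not confirmed names from the add-on.

```yaml
# Sketch only: a Placement that weights clusters by a dynamically computed score.
# The AddOn scoreCoordinate type is existing OCM API; the resourceName/scoreName
# values are hypothetical placeholders.
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: gpu-efficient-placement
  namespace: ai-workloads
spec:
  numberOfClusters: 1
  prioritizerPolicy:
    mode: Exact
    configurations:
      - scoreCoordinate:
          type: AddOn
          addOn:
            resourceName: dynamic-scoring   # placeholder score resource name
            scoreName: gpuPowerEfficiency   # placeholder score published by the Scoring API
        weight: 3
```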
Goals (GA)
- Intelligent Resource Efficiency: Place workloads based on real-time utilization (e.g., GPU/CPU power load) rather than just static capacity.
- Predictive Orchestration: Enable "Smart Scheduling" by integrating with AI/LLMs and predictive agents to target clusters based on forecasted utilization.
- Ecosystem Integration: Allow placement decisions to be informed by Red Hat-specific data sources, such as Red Hat Lightspeed (Insights) or OpenShift AI risk and compliance scores.
- Performance at Scale: Maintain high data freshness for scheduling without overwhelming the Hub cluster's performance by processing metrics locally on managed clusters (a sketch of the per-cluster score resource follows this list).
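As a sketch of how locally processed metrics could reach the hub without bulk metric traffic, the add-on agent on each managed cluster can publish only a per-cluster AddOnPlacementScore (existing OCM v1alpha1 API) into that cluster's namespace on the hub; `validUntil` bounds how stale a score may be. The score name and values here are illustrative, not confirmed output of the Dynamic Scoring Framework.

```yaml
# Sketch only: a score published for managed cluster "cluster1".
# Score values are integers in [-100, 100]; validUntil limits data staleness.
apiVersion: cluster.open-cluster-management.io/v1alpha1
kind: AddOnPlacementScore
metadata:
  name: dynamic-scoring            # placeholder; must match the Placement's resourceName
  namespace: cluster1              # the managed cluster's namespace on the hub
status:
  validUntil: "2025-01-01T00:10:00Z"   # illustrative; the score expires if not refreshed
  scores:
    - name: gpuPowerEfficiency     # placeholder score computed locally from raw metrics
      value: 82
```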
Requirements
This Section: A list of specific needs or objectives that a Feature must deliver to be satisfied. Some requirements will be flagged as MVP. If an MVP requirement slips, the feature shifts; if a non-MVP requirement slips, it does not shift the feature.
| Requirement | Notes | isMvp? |
|---|---|---|
| CI - MUST be running successfully with test automation | This is a requirement for ALL features. | YES |
| Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
Some Use Cases (for GA, not DP)
- AI Workload Bursting: A customer uses a custom Scoring API to monitor real-time GPU power load predictions; the framework automatically relocates AI workloads to clusters forecasted to have the highest efficiency.
- Compliance-Driven Placement: The Scoring API fetches a "Risk Score" from Red Hat Insights. OCM then restricts sensitive workloads to clusters currently meeting the highest compliance threshold.
- Cost-Aware Scheduling: Placement scores are dynamically adjusted based on external factors such as variable electricity rates or operational hardware efficiency (see the sketch after this list).
- (Stretch) Natural Language Querying: An administrator uses a chatbot integrated via an MCP server to ask, "Which cluster is currently the most cost-effective for a high-memory job?" and the system provides a recommendation based on live dynamic scores.
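As an illustration of the cost-aware case above, several dynamic scores could be combined with different weights in one Placement. The score names `electricityCost` and `hardwareEfficiency` and the resource name `dynamic-scoring` are hypothetical; only the weighting mechanism (existing OCM prioritizerPolicy API, weights in the range [-10, 10]) is established.

```yaml
# Sketch only: cost-aware scheduling combining two placeholder dynamic scores,
# penalizing high electricity cost (negative weight) and favoring hardware efficiency.
apiVersion: cluster.open-cluster-management.io/v1beta1
kind: Placement
metadata:
  name: cost-aware-placement
  namespace: batch-jobs
spec:
  numberOfClusters: 2
  prioritizerPolicy:
    mode: Exact
    configurations:
      - scoreCoordinate:
          type: AddOn
          addOn:
            resourceName: dynamic-scoring   # placeholder
            scoreName: electricityCost      # placeholder; higher value = more expensive
        weight: -2                          # negative weight penalizes costly clusters
      - scoreCoordinate:
          type: AddOn
          addOn:
            resourceName: dynamic-scoring   # placeholder
            scoreName: hardwareEfficiency   # placeholder
        weight: 3
```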
Questions to answer
- ...
Out of Scope
- The customer brings their own scoring engine; we will not create one at this time.
- Connections to external tools.
Background and strategic fit
This Section: What does the person writing code, testing, or documenting need to know? What context can be provided to frame this feature?
Assumptions
- ...
Customer Considerations
- ...
Documentation Considerations
Questions to be addressed:
- What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc.)?
- Does this feature have a doc impact?
  - New Content, Updates to existing content, Release Note, or No Doc Impact
  - If unsure and no Technical Writer is available, please contact Content Strategy.
- What concepts do customers need to understand to be successful in [action]?
- How do we expect customers will use the feature? For what purpose(s)?
- What reference material might a customer want/need to complete [action]?
- Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
- What is the doc impact (New Content, Updates to existing content, or Release Note)?