Feature Overview
tl;dr: Build a background agent that simplifies how customers interact with ACM.
Users are already starting to interact with technology differently. The advent of concierge apps on mobile devices and in the home has changed user-interaction patterns over the past decade. Software is a key enabler here, providing interactions with AI chatbots, AI search, AI assistants, and more.
Agent-based user interactions are not new: ServiceNow and other IT ticketing systems have long used machine learning to give users in-context support for the task at hand, whether that is an insurance claim, a support case, a travel itinerary change, etc. There is nothing novel or provocative in saying that a support agent - ML, AI, or otherwise - can be warranted within IT operations teams.
So it is natural that such interactions will be expected from Red Hat Advanced Cluster Management (ACM), a tool designed to lower the overall operational costs of platform engineering (IT spend reduction) and speed the delivery of systems to end users (10x the DevEx).
There are three axes along which we can analyze this:
Axis 1
Agentic technology has typically been used to produce:
- Q&A systems or Chatbots
- Personal assistants that can also do things - more sophisticated than just chatbots
- Daemon-like agents running in the background, silently doing things
Initial goals: 2 and 3 are the ones we should target.
Axis 2
Agentic technology can:
- read data from databases and APIs, which could be Kubernetes or non-Kubernetes (think edge)
- produce outputs that bring a human (user) into the loop - e.g., generate Git PRs or events requiring approval, either of these before running a command
- mutate the state of the system - e.g., delete a pod, shut down a cluster, enforce a policy
Practically speaking, we are not targeting 3; we are keeping focus on 1 and 2.
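The human-in-the-loop output from Axis 2, point 2 can be sketched as a simple approval gate: read-only actions run immediately, while anything that would mutate state is queued until a human explicitly approves it. This is a minimal illustrative sketch - the class names, in-memory queue, and lambda actions are assumptions for demonstration, not an ACM API.

```python
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class ProposedAction:
    description: str            # e.g. "delete pod foo in namespace bar"
    mutates_state: bool         # Axis 2, point 3: requires approval if True
    execute: Callable[[], str]  # the action itself, deferred until allowed

@dataclass
class ApprovalGate:
    """Read-only actions run directly; mutations wait for a human."""
    pending: List[ProposedAction] = field(default_factory=list)

    def submit(self, action: ProposedAction) -> str:
        if not action.mutates_state:
            return action.execute()      # reads are safe to run immediately
        self.pending.append(action)      # mutations are parked for approval
        return f"PENDING APPROVAL: {action.description}"

    def approve(self, index: int) -> str:
        # A human has explicitly signed off; now the mutation may run.
        return self.pending.pop(index).execute()

gate = ApprovalGate()
print(gate.submit(ProposedAction("list idle VMs", False, lambda: "3 idle VMs")))
print(gate.submit(ProposedAction("delete pod foo", True, lambda: "pod deleted")))
print(gate.approve(0))  # runs only after the explicit approval
```

In a real agent the "approval" step would surface as a Git PR or an event requiring sign-off, per the requirement that AI never mutates state without explicit human permission.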
Axis 3
AI/ML/Analytics can be used to:
- Generally interrogate system states
- Do problem determination
- Preempt problems
3 can be the most valuable because it prevents problems in the first place. This is what our partner was doing with their AI. But it usually relies on the AI to mutate the system state in some way - back to Axis 2, point 3 - so we have to approach this in a balanced way.
Goals
This Section: Provide a high-level goal statement, providing user context
and expected user outcome(s) for this feature. Before we execute an epic, we must pick one goal.
- Produce an ACM Personal Assistant that makes customer interaction with ACM delightful. For example:
  - create a policy, generate a PR, and interact with the PR
  - guide the user through creating a placement (including a managed ClusterSet, etc.), generate a PR, and interact with the PR
  - do deep searches to answer a question by looking at data from Search, the hub Kube API, or a managed cluster's Kube API, and/or metric data as the question demands. Typical questions: which of my VMs have sat idle for the last month across a set of labels? Why is policy foo flip-flopping?
  - perhaps have the knowledge to check current state - why is my Search collector crashing? Why is my add-on not connecting?
  - other?!?!
- Produce an ACM agent running in the background that can prevent problems from happening, or open Jira tickets with context. For example:
  - adding a managed cluster to a hub cluster that is already loaded
  - placing a workload on a managed cluster that is already loaded
  - etc. The point is to catch things before they go bad. Repairing things after they have gone bad is what we do today; the goal is to prevent things from going bad in the first place - reducing cost for the customer and for Red Hat
  - open a Jira if certificates will expire within x days
  - etc.
- It is assumed that the Personal Assistant may integrate with OpenShift Lightspeed down the line. But our immediate focus is not how to integrate - that will inevitably happen - but what to integrate: the niche functionality the customer needs that only ACM know-how can produce.
- We will have to get early versions of this into the demo systems and enhance it as we iterate.
  - This will allow it to be demoed.
  - This will prioritize the work to gather feedback on how the agent is performing, along with capturing user feedback.
  - This will also allow us to run it in a customer-like environment and force us to consider upfront where to run the inference server (LLM), etc.
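The background-agent goal of opening a Jira when certificates will expire within x days can be sketched as a periodic check. In this sketch the certificate inventory is a plain dict and the "open Jira" step is a stand-in string; a real version would read cluster secrets and call a ticketing API, and all names here are hypothetical.

```python
from datetime import datetime, timedelta

def expiring_certs(certs: dict, now: datetime, within_days: int) -> list:
    """Return names of certs whose notAfter falls inside the warning window."""
    cutoff = now + timedelta(days=within_days)
    return sorted(name for name, not_after in certs.items() if not_after <= cutoff)

def run_check(certs: dict, now: datetime, within_days: int = 30) -> list:
    """One pass of the background agent: one ticket per expiring cert."""
    cutoff = now + timedelta(days=within_days)
    tickets = []
    for name in expiring_certs(certs, now, within_days):
        # Stand-in for a Jira API call, carrying context for the operator.
        tickets.append(f"JIRA: cert '{name}' expires before {cutoff:%Y-%m-%d}")
    return tickets

now = datetime(2024, 6, 1)
certs = {
    "hub-api": now + timedelta(days=10),      # inside the 30-day window
    "addon-agent": now + timedelta(days=90),  # safe for now
}
print(run_check(certs, now))
```

This is the "catch things before they go bad" pattern: the agent only reads state and files a ticket, leaving any remediation to a human, consistent with the approval requirement below.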
Requirements
This Section: A list of specific needs or objectives that a Feature must
deliver to satisfy the Feature. Some requirements will be flagged as MVP.
If an MVP requirement slips, the feature shifts. If a non-MVP requirement
slips, it does not shift the feature.
| Requirement | Notes | isMvp? |
|---|---|---|
| Protection must be in place so that the AI cannot mutate state without explicit human permission | | YES |
| We must not bypass the GitOps mantra; AI is not meant to bypass the best principles of managing a fleet | | YES |
| Feedback collection: we must keep a log of the questions asked, the answers given, and user feedback, and be able to examine it offline (perhaps ask the customer for a data dump) to improve | | YES |
| CI must be running successfully with test automation | This is a requirement for ALL features. | YES |
| Release Technical Enablement | Provide necessary release enablement details and documents. | YES |
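The feedback-collection requirement could be as simple as an append-only JSONL log of question/answer/feedback records that a customer can dump for offline review. The format and class below are an assumption for illustration, not a mandated ACM interface.

```python
import io
import json

class FeedbackLog:
    """Append-only log of agent interactions, one JSON object per line."""

    def __init__(self, stream):
        self.stream = stream  # any writable text stream (file, StringIO, ...)

    def record(self, question: str, answer: str, feedback: str = ""):
        entry = {"question": question, "answer": answer, "feedback": feedback}
        self.stream.write(json.dumps(entry) + "\n")  # JSONL: easy to dump and grep

buf = io.StringIO()
log = FeedbackLog(buf)
log.record("why is policy foo flip-flopping?",
           "policy foo toggles because two placements contend for it",
           "helpful")
entries = [json.loads(line) for line in buf.getvalue().splitlines()]
print(entries[0]["feedback"])
```

Because each line is an independent JSON object, the log can be examined offline with standard tooling when a customer provides a data dump.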
(Optional) Use Cases
This Section:
- Main success scenarios - high-level user stories
- Alternate flow/scenarios - high-level user stories
- ...
Questions to answer
- Do we need to fine-tune models, or can we just use prompts and agentic technology to solve the problems? We will learn as we go deeper.
- As we iterate, we will see an overlap of tools (not necessarily agents) being created by different teams, and will need to adjust accordingly.
- We will need RAG. Will our RAG be done at the OpenShift Lightspeed level automatically? Can we get a handle to that agent?
Out of Scope
- Whether these agents logically belong at the Global Hub level or the ACM level.
- This is separate from the effort to use ACM to deploy AI workloads such as Federated Learning, Multi Kueue, etc.
Background and strategic fit
This Section: What does the person writing code, testing, documenting
need to know? What context can be provided to frame this feature?
Assumptions
- The contents of this feature should not be affected by the choice of underlying frameworks, etc.
Customer Considerations
- Customers should be able to use ACM without these agents - as is the case today.
- When customers use agents, they will have the technical option to run the LLM:
  - outside the cluster in a RH/IBM-hosted environment
  - outside the cluster in their own environment (data does not leave the customer periphery)
  - in the ACM hub cluster itself
Documentation Considerations
Questions to be addressed:
- What educational or reference material (docs) is required to support this product feature? For users/admins? Other functions (security officers, etc.)?
- Does this feature have a doc impact? New Content, Updates to existing content, Release Note, or No Doc Impact?
- If unsure and no Technical Writer is available, please contact Content Strategy.
- What concepts do customers need to understand to be successful in [action]?
- How do we expect customers will use the feature? For what purpose(s)?
- What reference material might a customer want/need to complete [action]?
- Is there source material that can be used as reference for the Technical Writer in writing the content? If yes, please link if available.
- What is the doc impact (New Content, Updates to existing content, or Release Note)?