COST-5115: Cost Management through ACM


    • Type: Feature
    • Resolution: Unresolved
    • Priority: Critical
    • Linked issue: COST-7106 Cost Management for Sovereign Cloud
    • Labels: subs-cost

      Leverage ACM to gather managed cluster metrics, instead of installing the operator on each cluster.

      The main reason for this is that if the user has admin permissions on the cluster, we cannot trust that they will keep the CMMO installed and configured the way the service provider wants.

      This is a hard requirement for Cost Management on-prem, and so far an OPTIONAL requirement for Cost Management SaaS.

       

      There are two approaches I can think of (but maybe there are more; I'm open to hearing others!)

      Context

      • Cost Management is a SaaS today and doesn't run without console.redhat.com dependencies (RBAC, authentication, etc.) and AWS dependencies (e.g. AWS Glue, the big data database).
      • Cost Management gathers data from the customer clusters using the Cost Management Metrics Operator. The CMMO is the only thing that Cost Management users install on the clusters. Kruize runs on the server side. Cost Management does not require the Cluster Monitoring Operator, telemetry or anything like that. Only the CMMO.
      • ACM Thanos will receive any and all metrics Cost Management requires, for all of the managed clusters and even for the ACM cluster itself (see the query sketch after this list). This is already agreed upon with sberens@redhat.com, so that should not be a concern at all.
      • No need to support OpenShift clusters not managed by ACM
      • For now, no need to support third-party Kubernetes (but if that ever becomes a requirement, ACM will provide all the additional components xKS might need)
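
      A minimal query sketch (Python) for the Thanos point above: pull one CMMO-style metric for a single managed cluster straight from the ACM observability query endpoint. The query-proxy URL, the token handling and the exact "cluster" label name are illustrative assumptions, not confirmed details of the ACM deployment.

        # Sketch: query ACM Thanos for one managed cluster's CPU usage samples.
        # THANOS_QUERY_URL, TOKEN and the "cluster" label are assumptions.
        import requests
        from datetime import datetime, timedelta, timezone

        THANOS_QUERY_URL = "https://rbac-query-proxy.example/api/v1/query_range"  # assumed endpoint
        TOKEN = "REPLACE_ME"  # assumed: a token with access to ACM observability

        def cluster_cpu_usage(cluster_name: str, hours: int = 1):
            """Return raw CPU usage series for one managed cluster."""
            end = datetime.now(timezone.utc)
            start = end - timedelta(hours=hours)
            params = {
                # ACM adds a cluster label, so per-cluster data stays distinguishable.
                "query": f'container_cpu_usage_seconds_total{{cluster="{cluster_name}"}}',
                "start": start.timestamp(),
                "end": end.timestamp(),
                "step": "300s",
            }
            resp = requests.get(
                THANOS_QUERY_URL,
                params=params,
                headers={"Authorization": f"Bearer {TOKEN}"},
                timeout=30,
            )
            resp.raise_for_status()
            return resp.json()["data"]["result"]

        if __name__ == "__main__":
            for series in cluster_cpu_usage("managed-cluster-1"):
                print(series["metric"].get("namespace"), len(series["values"]), "samples")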

       

      APPROACH A: MULTICLUSTER-AWARE CMMO

      • Install the Cost Management Metrics Operator on ACM
      • CMMO will be multicluster-aware, i.e. from ACM Thanos it will be able to collect metrics for all the managed clusters (see the collection sketch after this list)
      • One payload (set of CSVs) per cluster, all of them "sent" to the Cost Management ingress one at a time? A single payload with all the data for all clusters? Something else? TBD
      • Pro: looks easier to implement (especially if 1 payload/cluster), doesn't seem to break the backend at all (from the Koku POV, it's like multiple CMMOs running on multiple clusters are uploading payloads)
      • Con: creates data duplication? (data is already in Thanos, we are "converting" it to CSV, then Parquet, then processing the data)
      • Con: does not work in real time? (i.e. since generating the payloads requires some processing, we might do this frequently but not all of the time)
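
      A rough sketch (Python) of the "1 payload/cluster" variant from the list above: discover managed clusters from the cluster label in Thanos, dump each cluster's metrics to CSVs, pack them into a tarball and POST it to the ingress. The Thanos URL, metric list, ingress endpoint, content type and auth are illustrative assumptions, not the final design.

        # Sketch of Approach A: one payload per managed cluster.
        import csv
        import io
        import tarfile

        import requests

        THANOS_API = "https://rbac-query-proxy.example/api/v1"  # assumed ACM query endpoint
        INGRESS_URL = "https://console.redhat.com/api/ingress/v1/upload"
        CONTENT_TYPE = "application/vnd.redhat.hccm.tar+tgz"    # assumed payload content type
        METRICS = ["container_cpu_usage_seconds_total", "container_memory_usage_bytes"]

        def managed_clusters(session):
            """Discover managed clusters from the values of the cluster label."""
            resp = session.get(f"{THANOS_API}/label/cluster/values")
            resp.raise_for_status()
            return resp.json()["data"]

        def cluster_csvs(session, cluster):
            """Yield one (filename, bytes) CSV per metric for a single cluster."""
            for metric in METRICS:
                resp = session.get(f"{THANOS_API}/query",
                                   params={"query": f'{metric}{{cluster="{cluster}"}}'})
                resp.raise_for_status()
                out = io.StringIO()
                writer = csv.writer(out)
                writer.writerow(["metric", "labels", "timestamp", "value"])
                for series in resp.json()["data"]["result"]:
                    ts, value = series["value"]
                    writer.writerow([metric, series["metric"], ts, value])
                yield f"{cluster}-{metric}.csv", out.getvalue().encode()

        def build_payload(session, cluster):
            """Pack the cluster's CSVs into an in-memory .tar.gz payload."""
            buf = io.BytesIO()
            with tarfile.open(fileobj=buf, mode="w:gz") as tar:
                for name, data in cluster_csvs(session, cluster):
                    info = tarfile.TarInfo(name=name)
                    info.size = len(data)
                    tar.addfile(info, io.BytesIO(data))
            return buf.getvalue()

        def upload(session, cluster, payload):
            """Send one payload per cluster to the Cost Management ingress."""
            files = {"file": (f"{cluster}.tar.gz", payload, CONTENT_TYPE)}
            session.post(INGRESS_URL, files=files).raise_for_status()

        if __name__ == "__main__":
            with requests.Session() as s:
                s.headers["Authorization"] = "Bearer REPLACE_ME"  # assumed service account auth
                for cluster in managed_clusters(s):
                    upload(s, cluster, build_payload(s, cluster))

      From the Koku point of view each upload would look like a regular CMMO payload, which is why this variant should not require backend changes.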

       

      APPROACH B: NO CMMO. KOKU INGESTS THANOS DIRECTLY

      • No need for a Cost Management Metrics Operator anywhere
      • Cost Management reads directly from Thanos, processes the data and writes reports to Postgres (how? TBD; a rough ingestion sketch follows this list)
      • Pro: does not duplicate data: we'll use whatever is in Thanos and that's it, whether that's 2 weeks of data or 2 years of data
      • Pro: depending on how we implement this (and how fast data processing is), it would allow for real-time use cases, e.g. throttling workloads once they hit some budget, generating real-time spending alerts, etc.
      • Con: looks like a big effort that fundamentally changes how we read and ingest data. Is it even worth it? Considering the payloads will be generated locally, should we take approach A, increase the data processing frequency and delete duplicate data?
      • Con: how do we do the OCP on cloud cases?
        • Combined cloud-data + OCP-from-Thanos read and ingestion? Is that even possible?
        • Write the cloud data to Thanos? (do we even need Postgres then, or can it be fully replaced with Thanos for everything?)
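
      A rough ingestion sketch (Python) for this approach: let Thanos do the aggregation per cluster and namespace, and write the resulting daily summary rows straight to Postgres, with no CSV or Parquet step in between. The Thanos URL, the summarizing query and the reporting_ocpusage_daily table are hypothetical, only meant to show the shape of the flow.

        # Sketch of Approach B: read aggregated usage from Thanos, store it in Postgres.
        import psycopg2
        import requests

        THANOS_URL = "https://rbac-query-proxy.example/api/v1/query"  # assumed endpoint
        DSN = "dbname=koku user=koku host=localhost"                  # assumed connection string

        # CPU core-seconds per cluster/namespace over the last 24h, aggregated by Thanos.
        QUERY = 'sum by (cluster, namespace) (increase(container_cpu_usage_seconds_total[24h]))'

        def fetch_daily_usage():
            """Yield (cluster, namespace, cpu_core_seconds) rows from Thanos."""
            resp = requests.get(THANOS_URL, params={"query": QUERY}, timeout=60)
            resp.raise_for_status()
            for series in resp.json()["data"]["result"]:
                labels = series["metric"]
                _, value = series["value"]
                yield labels.get("cluster"), labels.get("namespace"), float(value)

        def store(rows):
            """Write the summary rows to a hypothetical daily reporting table."""
            with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
                cur.executemany(
                    "INSERT INTO reporting_ocpusage_daily (cluster, namespace, cpu_core_seconds)"
                    " VALUES (%s, %s, %s)",
                    list(rows),
                )

        if __name__ == "__main__":
            store(fetch_daily_usage())

      How much of this could run continuously (and whether Postgres stays in the picture at all) is exactly the open question in the cons above.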

              Assignee: Unassigned
              Reporter: pgarciaq@redhat.com (Pau Garcia Quiles)
              Votes: 0
              Watchers: 18