COST-5115: Cost Management through ACM


    • Type: Feature
    • Resolution: Unresolved
    • Priority: Critical
    • Linked issue: COST-7106 Cost Management for Sovereign Cloud
    • Labels: subs-cost

      Leverage ACM to gather managed cluster metrics, instead of installing the operator on each cluster.

      The main reason for this is that if the user has admin permissions on the cluster, we cannot trust that they will keep the CMMO installed and configured the way the service provider wants.

      This is a hard requirement for Cost Management on-prem, and so far an OPTIONAL requirement for Cost Management SaaS.

       

      There are two approaches I can think of (but maybe there are more; I'm open to hearing others!)

      Context

      • Cost Management is a SaaS today and doesn't run without console.redhat.com dependencies (RBAC, authentication, etc.) and AWS dependencies (e.g. AWS Glue, the big data database).
      • Cost Management gathers data from the customer clusters using the Cost Management Metrics Operator. The CMMO is the only thing that Cost Management users install on the clusters. Kruize runs on the server side. Cost Management does not require the Cluster Monitoring Operator, telemetry or anything like that. Only the CMMO.
      • ACM Thanos will receive any and all metrics Cost Management requires, for all of the managed clusters and even for the ACM cluster itself (see the query sketch after this list). This is already agreed upon with sberens@redhat.com, so that should not be a concern at all.
      • No need to support OpenShift clusters not managed by ACM
      • For now, no need to support third-party Kubernetes (but if that ever becomes a requirement, ACM will provide all the additional components xKS might need)
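
      A minimal query sketch (Python) for the Thanos point above: pull one CMMO-style metric for a single managed cluster straight from the ACM observability query endpoint. The query-proxy URL, the token handling and the exact "cluster" label name are illustrative assumptions, not confirmed details of the ACM deployment.

        # Sketch: query ACM Thanos for one managed cluster's CPU usage samples.
        # THANOS_QUERY_URL, TOKEN and the "cluster" label are assumptions.
        import requests
        from datetime import datetime, timedelta, timezone

        THANOS_QUERY_URL = "https://rbac-query-proxy.example/api/v1/query_range"  # assumed endpoint
        TOKEN = "REPLACE_ME"  # assumed: a token with access to ACM observability

        def cluster_cpu_usage(cluster_name: str, hours: int = 1):
            """Return raw CPU usage series for one managed cluster."""
            end = datetime.now(timezone.utc)
            start = end - timedelta(hours=hours)
            params = {
                # ACM adds a cluster label, so per-cluster data stays distinguishable.
                "query": f'container_cpu_usage_seconds_total{{cluster="{cluster_name}"}}',
                "start": start.timestamp(),
                "end": end.timestamp(),
                "step": "300s",
            }
            resp = requests.get(
                THANOS_QUERY_URL,
                params=params,
                headers={"Authorization": f"Bearer {TOKEN}"},
                timeout=30,
            )
            resp.raise_for_status()
            return resp.json()["data"]["result"]

        if __name__ == "__main__":
            for series in cluster_cpu_usage("managed-cluster-1"):
                print(series["metric"].get("namespace"), len(series["values"]), "samples")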

       

      APPROACH A: MULTICLUSTER-AWARE CMMO

      • Install the Cost Management Metrics Operator on ACM
      • CMMO will be multicluster-aware, i.e. from ACM Thanos it will be able to collect metrics for all the managed clusters (see the collection sketch after this list)
      • One payload (set of CSVs) per cluster, all of them "sent" to the Cost Management ingress one at a time? A single payload with all the data for all clusters? Something else? TBD
      • Pro: looks easier to implement (especially if 1 payload/cluster), doesn't seem to break the backend at all (from the Koku POV, it's like multiple CMMOs running on multiple clusters are uploading payloads)
      • Con: creates data duplication? (data is already in Thanos, we are "converting" it to CSV, then Parquet, then processing the data)
      • Con: does not work in real time? (i.e. since generating the payloads requires some processing, we might do this frequently but not all of the time)
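
      A rough sketch (Python) of the "1 payload/cluster" variant from the list above: discover managed clusters from the cluster label in Thanos, dump each cluster's metrics to CSVs, pack them into a tarball and POST it to the ingress. The Thanos URL, metric list, ingress endpoint, content type and auth are illustrative assumptions, not the final design.

        # Sketch of Approach A: one payload per managed cluster.
        import csv
        import io
        import tarfile

        import requests

        THANOS_API = "https://rbac-query-proxy.example/api/v1"  # assumed ACM query endpoint
        INGRESS_URL = "https://console.redhat.com/api/ingress/v1/upload"
        CONTENT_TYPE = "application/vnd.redhat.hccm.tar+tgz"    # assumed payload content type
        METRICS = ["container_cpu_usage_seconds_total", "container_memory_usage_bytes"]

        def managed_clusters(session):
            """Discover managed clusters from the values of the cluster label."""
            resp = session.get(f"{THANOS_API}/label/cluster/values")
            resp.raise_for_status()
            return resp.json()["data"]

        def cluster_csvs(session, cluster):
            """Yield one (filename, bytes) CSV per metric for a single cluster."""
            for metric in METRICS:
                resp = session.get(f"{THANOS_API}/query",
                                   params={"query": f'{metric}{{cluster="{cluster}"}}'})
                resp.raise_for_status()
                out = io.StringIO()
                writer = csv.writer(out)
                writer.writerow(["metric", "labels", "timestamp", "value"])
                for series in resp.json()["data"]["result"]:
                    ts, value = series["value"]
                    writer.writerow([metric, series["metric"], ts, value])
                yield f"{cluster}-{metric}.csv", out.getvalue().encode()

        def build_payload(session, cluster):
            """Pack the cluster's CSVs into an in-memory .tar.gz payload."""
            buf = io.BytesIO()
            with tarfile.open(fileobj=buf, mode="w:gz") as tar:
                for name, data in cluster_csvs(session, cluster):
                    info = tarfile.TarInfo(name=name)
                    info.size = len(data)
                    tar.addfile(info, io.BytesIO(data))
            return buf.getvalue()

        def upload(session, cluster, payload):
            """Send one payload per cluster to the Cost Management ingress."""
            files = {"file": (f"{cluster}.tar.gz", payload, CONTENT_TYPE)}
            session.post(INGRESS_URL, files=files).raise_for_status()

        if __name__ == "__main__":
            with requests.Session() as s:
                s.headers["Authorization"] = "Bearer REPLACE_ME"  # assumed service account auth
                for cluster in managed_clusters(s):
                    upload(s, cluster, build_payload(s, cluster))

      From the Koku point of view each upload would look like a regular CMMO payload, which is why this variant should not require backend changes.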

       

      APPROACH B: NO CMMO. KOKU INGESTS THANOS DIRECTLY

      • No need for a Cost Management Metrics Operator anywhere
      • Cost Management reads directly from Thanos, processes the data and writes reports to Postgres (how? TBD; a rough ingestion sketch follows this list)
      • Pro: does not duplicate data: we'll use whatever is in Thanos and that's it, whether that's 2 weeks of data or 2 years of data
      • Pro: depending on how we implement this (and how fast data processing is), it would allow for real-time use cases, e.g. throttling workloads once they hit some budget, generating real-time spending alerts, etc.
      • Con: looks like a big effort that fundamentally changes how we read and ingest data. Is it even worth it? Considering the payloads will be generated locally, should we take approach A, increase the data processing frequency and delete duplicate data?
      • Con: how do we do the OCP on cloud cases?
        • Combined cloud-data + OCP-from-Thanos read and ingestion? Is that even possible?
        • Write the cloud data to Thanos? (do we even need Postgres then, or can it be fully replaced with Thanos for everything?)
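
      A rough ingestion sketch (Python) for this approach: let Thanos do the aggregation per cluster and namespace, and write the resulting daily summary rows straight to Postgres, with no CSV or Parquet step in between. The Thanos URL, the summarizing query and the reporting_ocpusage_daily table are hypothetical, only meant to show the shape of the flow.

        # Sketch of Approach B: read aggregated usage from Thanos, store it in Postgres.
        import psycopg2
        import requests

        THANOS_URL = "https://rbac-query-proxy.example/api/v1/query"  # assumed endpoint
        DSN = "dbname=koku user=koku host=localhost"                  # assumed connection string

        # CPU core-seconds per cluster/namespace over the last 24h, aggregated by Thanos.
        QUERY = 'sum by (cluster, namespace) (increase(container_cpu_usage_seconds_total[24h]))'

        def fetch_daily_usage():
            """Yield (cluster, namespace, cpu_core_seconds) rows from Thanos."""
            resp = requests.get(THANOS_URL, params={"query": QUERY}, timeout=60)
            resp.raise_for_status()
            for series in resp.json()["data"]["result"]:
                labels = series["metric"]
                _, value = series["value"]
                yield labels.get("cluster"), labels.get("namespace"), float(value)

        def store(rows):
            """Write the summary rows to a hypothetical daily reporting table."""
            with psycopg2.connect(DSN) as conn, conn.cursor() as cur:
                cur.executemany(
                    "INSERT INTO reporting_ocpusage_daily (cluster, namespace, cpu_core_seconds)"
                    " VALUES (%s, %s, %s)",
                    list(rows),
                )

        if __name__ == "__main__":
            store(fetch_daily_usage())

      How much of this could run continuously (and whether Postgres stays in the picture at all) is exactly the open question in the cons above.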

              Assignee: Unassigned
              Reporter: pgarciaq@redhat.com (Pau Garcia Quiles)
              Votes: 0
              Watchers: 18