OpenStack Strategy / RHOSSTRAT-623

Enable Chargeback in RHOSO as Tech Preview


    • Feature
    • Resolution: Unresolved
    • Major
    • rhos-18.0.15
    • rhos-18.0.14 FR 4
    • Observability
    • None
    • Important
    • RHOSSTRAT-963 Chargeback and Showback capabilities in RHOSO
    • 10% To Do, 45% In Progress, 45% Done
    • rhos-conplat-observability
    • Red Hat OpenStack Services on OpenShift (formerly Red Hat OpenStack Platform)
    • Technology Preview

      Feature Overview 

      Ability to request data from the data storage domains for the purposes of generating cloud resource usage reports and billing, allowing for consumption by external FinOps or billing systems.

      We are initiating a project to build chargeback capability for our OpenStack environment. Customers need the ability to understand their cloud usage for the purposes of:

      • Transparent Cost Recovery
      • Customer Trust and Competitive Differentiation

      Chargeback provides visibility into cloud resource consumption across customer domains.

      Why does the customer need this? (List the business requirements)

      Transparent Cost Recovery

      • As a cloud provider, the organization operates at large scale, offering on-demand compute, storage, and network resources to multiple tenants. Showback provides a clear, itemized breakdown of resource usage per tenant, ensuring each customer understands what they are consuming and the associated costs.
      • By attributing real-time or monthly costs to usage (e.g., compute hours, storage GB-month, bandwidth), the cloud provider can recoup operational expenses without surprising customers with opaque billing.
      • Showback also encourages optimized resource usage. When tenants see how their consumption affects costs, they can make informed decisions (e.g., rightsizing virtual machines or archiving stale data).
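
      The cost attribution described above can be sketched as follows. This is a minimal illustration only: the rate values, tenant names, and the `showback` helper are hypothetical and are not part of CloudKitty or any RHOSO component. It multiplies each usage quantity (compute hours, storage GB-month, bandwidth) by a unit price and itemizes the result per tenant.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical unit prices; real rates would come from the operator's rating rules.
RATES = {
    "compute_hours": Decimal("0.05"),     # per instance hour
    "storage_gb_month": Decimal("0.10"),  # per GB-month
    "bandwidth_gb": Decimal("0.02"),      # per GB transferred
}

# Hypothetical usage records: (tenant, metric, quantity).
USAGE = [
    ("tenant-a", "compute_hours", 720),
    ("tenant-a", "storage_gb_month", 500),
    ("tenant-b", "compute_hours", 100),
    ("tenant-b", "bandwidth_gb", 250),
]


def showback(usage, rates):
    """Return {tenant: {"items": {metric: cost}, "total": cost}}."""
    report = defaultdict(lambda: {"items": {}, "total": Decimal("0")})
    for tenant, metric, quantity in usage:
        cost = rates[metric] * quantity
        entry = report[tenant]
        # Accumulate per metric so repeated records for one metric are summed.
        entry["items"][metric] = entry["items"].get(metric, Decimal("0")) + cost
        entry["total"] += cost
    return dict(report)


if __name__ == "__main__":
    for tenant, entry in sorted(showback(USAGE, RATES).items()):
        print(tenant, entry["items"], "total:", entry["total"])
```

      Decimal arithmetic is used here so that monetary totals do not pick up binary floating-point rounding errors.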

      Customer Trust and Competitive Differentiation

      • Offering transparent consumption reports increases customer trust. Tenants value clarity on how they are being charged or how their resource usage impacts their budget.
      • In a competitive market, having robust chargeback capabilities can differentiate the cloud provider by demonstrating cost visibility.
      • Chargeback enables additional use cases, including the ability to operate as a cloud provider.

      Goals

      • This feature should allow for billing of tenants and resource planning based on tenant usage
      • Usage should be available as raw data output from the CLI
      • Usage should be based on the metrics agreed upon in the Requirements section below

      Requirements 

      To achieve a comprehensive view of IaaS consumption, we require the collection of specific metrics from the following services:

      • Nova (Compute)
      • Cinder (Block Storage)
      • Manila (Shared File Systems)
      • Neutron (Networking)
      • Keystone (Identity)
      • Ironic (Bare Metal)
      • Octavia (Load Balancing, including OVN load balancers)
      • Swift (Object Storage)
      • Glance (Image Service)
      • Barbican (Key Management)
      • Designate (DNS)
      • Heat (Orchestration)

       

      Service | Metric | MVP? | Available in CloudKitty | Work to include (S-XL) | Comment
      Nova (Compute) | Instance hours (total hours instances are running) | Yes | yes | | This will provide lifetime but not uptime
      | vCPU usage (allocated CPU count or CPU time) | | It's the same as above | |
      | Memory usage (RAM allocated in MB/GB) | | yes | |
      | Ephemeral storage usage (local disk usage) | Yes | yes | |
      Cinder (Block Storage) | Volume hours (allocated volume lifespan in hours) | | yes | |
      | Volume size (allocated GB) | Yes | It's the same as above | |
      | Snapshot size (allocated GB for snapshots) | Yes | yes | |
      Manila (Shared File Systems) | Share hours (allocated share lifespan in hours) | | | |
      | Share size (allocated GB) | Yes | | |
      | Snapshot size (GB for snapshots) | | | |
      Neutron (Networking) | Floating IP hours (number of hours floating IPs are assigned) | Yes | yes | |
      | Network port usage (count of ports per tenant, possibly in hours) | | | |
      | Ingress/Egress bandwidth (volume of data in/out) | Yes | yes? | | Should be checked that this can be mapped to the tenant
      Keystone (Identity) | | | | |
      Ironic (Bare Metal) | Bare metal node hours (physical node allocation hours) | | | |
      | CPU/Memory resources (total CPU cores, memory in GB) | | | |
      | Node state changes (provisioning states, cleaning, etc.) | | | |
      Octavia (Load Balancing) | Load balancer hours (amphora or OVN LB usage) | Yes | no | | There is an OVN exporter that we should investigate
      | Listener hours (each load balancer can have multiple listeners) | | no | |
      | Pool hours (hours each pool is active); VIP usage (virtual IP addresses, similar to floating IPs) | | no | |
      OVN Load Balancers (for deployments leveraging OVN as a driver) | OVN LB creation/tear-down events | | | |
      | OVN LB VIP or router port usage (as applicable) | | | |
      | Traffic metrics (if exposed via OVN) | | | |
      Swift (Object Storage) | Object storage usage (GB stored × hours) | | yes | |
      | Object count (objects per container) | | yes | |
      | Container count (containers per tenant) | | yes | |
      Glance (Image Service) | Image storage usage (GB stored × hours) | Yes | yes | |
      | Image count (number of active images per tenant) | | It's the same as above | |
      Barbican (Key Management) | Secrets count (number of secrets stored per tenant) | | no | |
      | Container usage (if using Barbican containers for key groups) | | no | |
      Designate (DNS) | DNS zone hours (each zone allocated × hours) | Yes | no | |
      | Record count (number of DNS records per zone, per tenant) | | no | |
      | Zone create/delete events (track frequency of new zones or removed zones) | Yes | no | |
      Heat (Orchestration) | Stack hours (how long each stack is active) | | no | |
      | Stack count (total active stacks per tenant) | | no | |
      | Stack events (creation, updates, deletions) | | no | |
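
      The comment on the Nova instance hours row notes that the metric reflects lifetime rather than uptime. The sketch below shows one possible reading of "lifetime": the time between an instance's creation and deletion, clipped to the billing period, regardless of whether the instance was powered on. The `lifetime_hours` helper and its parameters are hypothetical and only illustrate the arithmetic; they are not taken from CloudKitty.

```python
from datetime import datetime, timezone


def lifetime_hours(created_at, deleted_at, period_start, period_end):
    """Hours an instance existed inside [period_start, period_end).

    deleted_at may be None for an instance that still exists.
    Power state (running, stopped, shelved) is deliberately ignored,
    which is what distinguishes lifetime from uptime.
    """
    start = max(created_at, period_start)
    end = min(deleted_at or period_end, period_end)
    seconds = max((end - start).total_seconds(), 0.0)
    return seconds / 3600.0


if __name__ == "__main__":
    utc = timezone.utc
    january = (datetime(2025, 1, 1, tzinfo=utc), datetime(2025, 2, 1, tzinfo=utc))
    created = datetime(2025, 1, 10, tzinfo=utc)
    deleted = datetime(2025, 1, 11, 12, tzinfo=utc)
    print(lifetime_hours(created, deleted, *january))
```

      An instance that was stopped for part of that window would still be counted for the full interval under this reading.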

       

      Use Cases

      • Collection of usage rates and data export based on flavor type
      • As the administrator and customer of RHOSO, I want to reliably generate and access itemized, raw data output (preferably JSON) of OpenStack resource usage for all my tenants, specifically ensuring that compute resources, <metric_1>, <metric_2>, ..., and <metric_n> are broken down by instance flavor and identified by project name (resolved from project IDs for better understanding), so that I can implement transparent cost recovery for my tenants, integrate this data into my internal FinOps or billing solutions, and support my strategic capacity planning and operational efficiency initiatives.
      • While report generation will use UUIDs by default, it should be possible to perform a lookup to get the human-readable name for each UUID.
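
      The use cases above can be sketched as a small aggregation step. Everything in this example is an assumption made for illustration: the record layout, the project IDs and names, and the `usage_by_flavor` helper are hypothetical, and in a real deployment the name mapping would come from Keystone rather than a hard-coded dictionary. The sketch groups instance hours by project and flavor, keys the output on UUIDs by default, and optionally adds the resolved project name.

```python
import json
from collections import defaultdict

# Hypothetical project ID to name mapping (would be resolved from Keystone).
PROJECT_NAMES = {
    "a1b2c3d4e5f64789a0b1c2d3e4f56789": "finance",
    "0f9e8d7c6b5a43210fedcba987654321": "research",
}

# Hypothetical raw usage records.
RECORDS = [
    {"project_id": "a1b2c3d4e5f64789a0b1c2d3e4f56789", "flavor": "m1.small", "instance_hours": 24.0},
    {"project_id": "a1b2c3d4e5f64789a0b1c2d3e4f56789", "flavor": "m1.small", "instance_hours": 12.0},
    {"project_id": "a1b2c3d4e5f64789a0b1c2d3e4f56789", "flavor": "m1.large", "instance_hours": 10.0},
    {"project_id": "0f9e8d7c6b5a43210fedcba987654321", "flavor": "m1.small", "instance_hours": 5.0},
    {"project_id": "ffff0000ffff0000ffff0000ffff0000", "flavor": "m1.small", "instance_hours": 1.0},
]


def usage_by_flavor(records, project_names=None):
    """Sum instance hours per (project, flavor).

    Rows are keyed by project UUID. If project_names is given, each row
    also carries a project_name, falling back to the UUID when unknown.
    """
    totals = defaultdict(float)
    for rec in records:
        totals[(rec["project_id"], rec["flavor"])] += rec["instance_hours"]

    rows = []
    for (project_id, flavor), hours in sorted(totals.items()):
        row = {"project_id": project_id, "flavor": flavor, "instance_hours": hours}
        if project_names is not None:
            row["project_name"] = project_names.get(project_id, project_id)
        rows.append(row)
    return rows


if __name__ == "__main__":
    print(json.dumps(usage_by_flavor(RECORDS, PROJECT_NAMES), indent=2))
```

      Keeping the UUID in every row, even when names are resolved, lets a downstream billing system join on a stable identifier if a project is later renamed.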

      Out of Scope

      • Providing a billing interface and report generation that could be used for billing directly from the implementation.
      • Interfacing with Horizon or other GUI systems.

      Documentation Considerations

      • Product documentation
      • A minimal amount of information covering how to enable data collection and how to create the queries (ratings) required for the data export.
      • Product Management to create a blog post that provides a walkthrough of a general use case, with examples and output from the CLI.

      Questions to Answer

      • Can we get functionality merged and backported that allows the retrieval of flavor attributes from the compute systems?
      • Is it possible to get Manila data?
      • Can we get Loki team sign-off and an internal support agreement for usage of Loki within OpenStack customer environments?
      • Do we need Swift/S3 sign-off?
      • Do we need Nova sign-off for flavor?

      Risks

      • Upstream patch to provide a storage adapter for Loki
      • Downstream Loki being deployable on CRC so that it can be enabled in the testing framework
      • Deployment of a production-style compact OpenShift cluster with a production-style Loki environment for review and sign-off by the Loki team
      • Getting CloudKitty image builds early enough in the development timeline to allow for testing and validation prior to shipping FR4

      Background and Strategic Fit

      This feature is highly important to many new customers. For example, a customer might build cloud services on their own machines, in which case billing would be crucial.

      Customer Considerations

      This feature is built from customer needs and should be continuously evaluated with the customer.

              lmadsen@redhat.com Leif Madsen
              jamparke@redhat.com Jamie Parker
              Simon Herlofsson
              Edu Alcaniz
              rhos-dfg-cloudops