Uploaded image for project: 'OpenStack Strategy'
  1. OpenStack Strategy
  2. RHOSSTRAT-1006

Enable Chargeback in RHOSO Generally Available

XMLWordPrintable

    • Icon: Feature Feature
    • Resolution: Unresolved
    • Icon: Major Major
    • None
    • rhos-18.0.17 FR 5
    • Observability
    • None
    • Important
    • RHOSSTRAT-963Chargeback and Showback capabilities in RHOSO
    • Not Selected
    • False
    • False
    • Hide

      None

      Show
      None
    • L
    • 0
    • 0
    • 25% To Do, 75% In Progress, 0% Done
    • rhos-conplat-observability
    • Red Hat OpenStack Services on OpenShift (formerly Red Hat OpenStack Platform)
    • Feature

      Feature Overview

      Transition the CloudKitty chargeback feature from Technical Preview (TP) to General Availability (GA). This productization effort focuses on solidifying the feature by completing all required data collectors, establishing robust, automated testing frameworks, and publishing comprehensive, customer-ready documentation to meet GA release criteria.

      Background and Strategic Fit

      Providing a robust and fully supported chargeback feature is a core requirement for sovereign cloud deployments. Customers need to understand, and often bill for, the resources consumed by their end-users.

      Transitioning CloudKitty from Technical Preview to General Availability moves it from a "try-at-your-own-risk" feature to a core, supported component of the platform. This unblocks high-value customers who have been waiting for this functionality to be production-ready and enhances the platform's value proposition by enabling clear cost visibility and accountability.

      Goals

      • GA Productization: The chargeback feature is officially released and supported as a Generally Available component.
      • Documentation: Publish comprehensive documentation for end-users and administrators covering installation, configuration, usage, and known limitations.
      • Test Automation (Shift-Left): Implement and deploy automated testing suites that provide full API and integration coverage for CloudKitty and its dependencies (e.g., storage, compute).
      • Data Collector Completion: All OpenStack services identified within the GA scope provide the necessary telemetry data via stable, supported interfaces for accurate chargeback.
        • NOTE Not all requested metrics required to provide the entire scope of the chargeback query request may be possible in RHOSO 18 due to extensive refactoring or implementation lift. In those cases, the missing telemetry data will be noted and targeted for a future RHOSO release beyond 18.
      • Performance & Scalability: Validate that the chargeback feature performs and scales to meet defined GA-level production workload standards.
      • Support Readiness: Enable the technical support teams with the necessary training, troubleshooting guides, and diagnostic tools to handle customer-reported issues.
      • Security & Compliance: Complete a full security review and ensure the feature meets all standard compliance requirements for a GA product.

      Requirements

      A list of specific needs, capabilities, or objectives that must be delivered. Requirements flagged as MVP (Minimum Viable Product) are mandatory for the GA release.

      Requirement Notes isMVP?
      Upgrade Path Provide a documented, non-data-loss upgrade path for customers moving from the Tech Preview (TP) version of CloudKitty to the General Availability (GA) version. Yes
      GA-Level Documentation Publish comprehensive Administrator and End-User guides covering installation, configuration, usage, and known limitations. Yes
      API Test Automation Implement and integrate automated tests covering the entire public CloudKitty API surface. Yes
      Core Integration Tests Implement and integrate automated end-to-end tests for core data collectors (Compute, Volume, Network) to ensure data is collected and rated correctly. Yes
      Core Data Collectors Ensure data collectors for Compute (Nova), Volume (Cinder), and Network (Neutron) services are stable, supported, and collect all metrics defined in the GA scope. Yes
      Security Review Complete and pass a full security review, remediating all critical/high vulnerabilities prior to release. Yes
      Support Enablement Deliver a technical training package and a detailed Troubleshooting Guide to the global support team. Yes
      Performance Baseline Establish and execute a performance test suite to validate the feature against the defined GA-level production workload standards. Yes
      Data Export API Provide a stable, documented API endpoint for tenants and administrators to export their rated chargeback data (e.g., in CSV or JSON format). No
      Advanced Data Collectors Implement data collectors for non-core services (e.g., LBaas, Glance Images, Object Storage). No

      Done - Acceptance Criteria

      This articulates the value proposition and defines what is required to meet the goals of the feature from a user's point of view.

      • As a Cloud Administrator, I can follow the official documentation to upgrade my Tech Preview chargeback instance to the GA version without losing any existing data.
      • As a Cloud Administrator, I can install and configure the GA chargeback feature from scratch using only the provided documentation.
      • As a Cloud Administrator, I can define pricing rules for core services (Compute, Volume, Network) and see that resource usage is being collected and rated according to those rules.
      • As a Cloud Administrator, I can read the "Known Limitations" section of the documentation to understand precisely which metrics are not included in this GA release and are targeted for the future.
      • As Chargeback Administrator, I can use the API to query my project's resource consumption and associated costs for the current billing period.
      • As a Support Engineer, I can use the provided troubleshooting guide and diagnostic tools to identify and resolve common issues (e.g., data collection errors, rating misconfigurations).
      • As a Quality Engineer, I can confirm that all GA-scoped automated tests (API, integration) are integrated into the main CI/CD pipeline and are passing.
      • As a Security Engineer, I can confirm that the feature has passed the final security review and all critical/high findings have been addressed.

      Use Cases - User Experience & Workflow

      Personas

      • Cloud Administrator: Manages the OpenStack platform, responsible for installing, configuring, and maintaining the CloudKitty service. Defines pricing for all tenants.
      • Chargeback Administrator: Consumes cloud resources (VMs, volumes) and needs to view usage and cost data for their specific project/tenant utilizing the command line interface.

      Main Success Scenario (Admin Configures Chargeback)

      1. The Cloud Administrator enables the CloudKitty service as part of the RHOSO 18 deployment.
      2. The Cloud Administrator follows the documentation to configure the data collectors for Nova, Cinder, and Neutron.
      3. The Chargeback Administrator defines pricing rules (e.g., `$0.05/hr` per vCPU, `$0.02/GB/hr` for volumes) using the CloudKitty command line interface.
      4. The system begins collecting telemetry data from the specified services.
      5. The Chargeback Administrator queries the CloudKitty API via command line interface and verifies that data is being collected and rated correctly against the new pricing rules.

      Main Success Scenario (User Views Costs)

      1. For general available delivery of chargeback, there is no tenant aggregation interface. Reports will be generated by the chargeback administrator per-project and provided to the tenant after post-processing.

      Alternative Flow (Metric Not Available)

      1. A Chargeback Administrator wants to charge tenants for "load balancers used per hour."
      2. They consult the GA documentation to find the correct metric name and configuration.
      3. The documentation section does not state availability of the desired metric.
      4. Only available metrics are provided in documentation. Tracking of desired but unavailable metrics will be tracked in the issue tracker for future implementation consideration in conjunction with other service teams.

      Out of Scope

      • Billing & Invoicing: This feature provides chargeback (cost visibility), not a billing system. It will not generate invoices, manage payments, or integrate with payment gateways.
      • Post-Processing Aggregation: Aggregation of tenant chargeback data is the responsibility of the tenant or chargeback administrator utilizing out-of-band systems. CloudKitty will not provide the aggregated data of multiple projects.
      • Deferred Data Collectors: Any data collector that requires extensive refactoring of other OpenStack services (as noted in the Goals) is explicitly out of scope for RHOSO 18.
      • Complex Pricing Models: Support for highly complex, real-time, or "spot market" pricing models. The GA release will focus on stable, predictable rating models (e.g., flat rate, tiered rate).
      • Major UI Redesign: This feature productizes the existing TP functionality. A net-new user interface is not in scope.

      Documentation Considerations

      A complete review of documentation requirements will be provided by working with the documentation representative to answer the mini-content journey. Below are areas to consider when reviewing the mini-content journey to determine that all high level areas have been covered.

      • TP-to-GA Upgrade Guide: This is the most critical documentation requirement. A standalone guide must be created to walk existing TP customers through the upgrade. It must detail any configuration file changes, API changes, or data migration steps.
      • Cloud Administrator Guide: Must include:
        • Installation and configuration.
        • A list of all supported data collectors and the specific metrics they provide.
        • Configuration of the data storage backend (e.g., Loki, Prometheus).
      • Chargeback Administrator Guide: Must include:
        • How to query usage and cost data via the API.
        • An explanation of the data fields returned in a report.
        • A complete guide to "Rating" (defining pricing rules).
        • Generally this should provide enough information to allow the administrator to configure the chargeback interface and provide examples of how to export data after configuration.
      • Known Limitations Section: This section must be clear, prominent, and explicitly list all requested metrics/services (e.g., LBaas, Object Storage) that are not supported in RHOSO 18. This is critical for managing customer expectations.
      • Troubleshooting Guide: A new guide must be created for the support team, detailing common failure modes (e.g., "data is not appearing," "costs are calculated at $0") and their resolutions.
        • It is possible this troubleshooting guide remains internal for a period of time until the contents can be edited and provided as part of the core documentation.

      Questions to Answer (Refinement)

      • Data Collector Scope: Which specific services and metrics are definitively in scope for RHOSO 18? Which are being deferred? (We need a firm list before development can be finalized).
      • TP-GA API Changes: Are there any breaking API or configuration changes between the last TP release and the planned GA release?
      • Data Migration: Is the TP-to-GA upgrade data migration (if any) fully automated? What is the expected downtime or performance impact during this migration?
      • Performance Benchmarks: What are the specific, measurable GA-level performance and scalability targets? (e.g., "Support 1,000 tenants with data processing latency under 1 hour").
      • Support Triage: What are the top 3-5 support issues reported against the TP version? (This will help focus testing and troubleshooting documentation).
      • Dependency Contracts: Are the APIs for all in-scope data collectors (Nova, Cinder, etc.) considered stable and supported by their respective teams?
      • PyScript: Use of pyscript allows customers to use arbitrary scripts to calculate rating data. Determine if this could be misused and if there is a mitigation plan available. See https://docs.openstack.org/cloudkitty/latest/user/rating/pyscripts.html 

       

      Customer Considerations

      • Existing TP Customers: This is the primary consideration. The upgrade path must be as simple and non-disruptive as possible. All changes must be communicated clearly and well in advance.
      • Data Granularity: Customers have varying needs. Some want high-level "showback" for departmental budgeting, while others need granular data for per-user billing. The GA documentation must be precise about the level of granularity supported.
      • Data Export: Many customers will want to export this data into their own external financial or billing systems. The stability and documentation of the data export API are critical for them.
      • Configuration Complexity: The pricing/rating engine can be complex. The documentation and any default configurations should aim for simplicity, providing an easy "out-of-the-box" experience while still allowing for more complex rules.

              lmadsen@redhat.com Leif Madsen
              lmadsen@redhat.com Leif Madsen
              Simon Herlofsson Simon Herlofsson
              openshift-logging
              rhos-conplat-observability
              Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

                Created:
                Updated: