RHOS Request for Features / RHOSRFE-213

Chargeback - Metering the APIs

    • Type: Feature Request
    • Resolution: Unresolved
    • Priority: Major
    • Observability
    • None

      Feature Request Overview

      Introduce a tenant-attributed control-plane metering capability for Red Hat OpenStack Services on OpenShift (RHOSO) that measures raw API usage per tenant/project across Keystone, Barbican, Glance, Nova, Neutron, Cinder, and other control-plane services. The capability should:

      • Broker tenant-facing APIs through a supported API gateway (e.g., Red Hat 3scale API Management) and/or OpenShift-native ingress/Service Mesh to capture authenticated requests, response codes, latency, payload classes, and requester identity (tenant/project, domain, app-credential, user).
      • Emit standardized events/metrics (Prometheus + OpenTelemetry/CloudEvents) for each API call, with strong multi-tenant attribution.
      • Feed CloudKitty (or a new "control-plane usage" rating stream) to enable showback/chargeback of control-plane consumption (optionally priced) and capacity-planning signals.
      • Provide dashboards, alerts, and exports (Grafana, S3/CSV/Parquet) to identify tenant hot-spots ("who's hammering Keystone?"), detect abuse, and right-size/scale services.

      Business justification

      • Cost recovery & fairness: Public cloud providers factor control-plane intensity into cost models. Without per-tenant API metering, high-churn tenants can impose outsized load on Keystone/Barbican/etc. with no financial signal. This feature enables fair-share chargeback/showback.
      • Capacity planning & SLOs: Visibility into call rates, p95/p99 latency, and error distributions per tenant enables proactive scaling, SLO management, and targeted optimization of specific services and endpoints.
      • Abuse & misconfiguration detection: Spikes in auth/token issuance, secrets operations, or image/list churn become actionable events (alerting, temporary throttling, or policy) instead of silent control-plane exhaustion.
      • Regulated & sovereign environments: Many customers run RHOSO in cost-recovery, multi-tenant public-sector or internal-cloud contexts. Transparent, auditable attribution of control-plane usage is required for governance.
      • Monetization options: Providers can selectively rate high-cost API classes (e.g., token issuance, heavy list/poll endpoints) while keeping baseline operations free—aligning tenant behavior with platform costs.

      Functional requirements

      A. Traffic capture & identity attribution

      • Brokerable path (provider's choice): Tenant-facing OpenStack APIs can be fronted by an API gateway or L7 proxy (e.g., OpenShift Routes/HAProxy filters, Gateway API/Envoy, Service Mesh, 3scale). Must support Strict (enforced) and Monitor-only (observe) modes.
      • Identity normalization: Extract and attach {project_id, user_id, domain_id, app_credential_id, roles} plus {service, endpoint, method, path_template, status, bytes_in/out, duration_ms} per request; no secret/token material persisted.
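The identity-normalization requirement can be illustrated with a minimal Python sketch. It assumes the gateway has already validated the token and hands over two plain dictionaries (`identity` and `request`); those input shapes, the `ApiCallRecord` class, and the `to_path_template` helper are hypothetical names for illustration, not part of any RHOSO or gateway API. The record keeps only the fields listed above, so token material never reaches the stored event.

```python
import re
from dataclasses import dataclass, asdict
from typing import Tuple

# Matches a 32-hex OpenStack-style ID or a dashed UUID path segment.
_ID_SEGMENT = re.compile(
    r"^(?:[0-9a-f]{32}|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$",
    re.IGNORECASE,
)


def to_path_template(path: str) -> str:
    """Drop the query string and replace ID-like segments with {id}."""
    segments = path.split("?", 1)[0].split("/")
    return "/".join("{id}" if _ID_SEGMENT.match(s) else s for s in segments)


@dataclass(frozen=True)
class ApiCallRecord:
    project_id: str
    user_id: str
    domain_id: str
    app_credential_id: str
    roles: Tuple[str, ...]
    service: str
    endpoint: str
    method: str
    path_template: str
    status: int
    bytes_in: int
    bytes_out: int
    duration_ms: float


def normalize(identity: dict, request: dict) -> ApiCallRecord:
    """Build one attribution record; headers and tokens are never copied."""
    return ApiCallRecord(
        project_id=identity.get("project_id", ""),
        user_id=identity.get("user_id", ""),
        domain_id=identity.get("domain_id", ""),
        app_credential_id=identity.get("app_credential_id", ""),
        roles=tuple(sorted(identity.get("roles", ()))),
        service=request["service"],
        endpoint=request["endpoint"],
        method=request["method"].upper(),
        path_template=to_path_template(request["path"]),
        status=int(request["status"]),
        bytes_in=int(request.get("bytes_in", 0)),
        bytes_out=int(request.get("bytes_out", 0)),
        duration_ms=float(request["duration_ms"]),
    )


record = normalize(
    {"project_id": "p1", "user_id": "u1", "domain_id": "default", "roles": ["member"]},
    {
        "service": "nova",
        "endpoint": "public",
        "method": "post",
        "path": "/v2.1/servers/6f1c2a1e-3b4d-4c5e-8f90-123456789abc/action?x=1",
        "status": 202,
        "duration_ms": 41.5,
        "headers": {"X-Auth-Token": "secret-token"},  # present on input, never stored
    },
)
```

Templating the path before it becomes a label keeps per-resource IDs out of the metric dimensions, which also helps with the cardinality limits in section B.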

      B. Telemetry & eventing

      • Signals: Prometheus metrics (counters/histograms) + optional OpenTelemetry traces/CloudEvents; controllable label cardinality and sampling.
      • Sinks & retention: Pluggable sinks (Prometheus, Kafka/AMQ Streams, S3/Parquet/ClickHouse); configurable retention and export (CSV/Parquet, CUR-like).
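"Controllable label cardinality and sampling" could take many forms; the following Python sketch shows one possibility using only the standard library. The label allow-list, the `CallCounter` class, the per-project series cap, and the hash-based `keep_trace` sampler are all assumptions made for illustration; a real deployment would presumably rely on its Prometheus and OpenTelemetry client libraries for the actual metric and trace types.

```python
import hashlib
from collections import Counter

# Hypothetical allow-list: only these keys ever become metric labels.
ALLOWED_LABELS = ("service", "endpoint_class", "method", "status_class", "project_id")


class CallCounter:
    """Counts API calls per label set with a cap on distinct project series."""

    def __init__(self, allowed=ALLOWED_LABELS, max_projects=1000):
        self.allowed = tuple(allowed)
        self.max_projects = max_projects
        self._projects = set()
        self.counts = Counter()

    def _label_values(self, event: dict) -> tuple:
        labels = {k: str(event.get(k, "unknown")) for k in self.allowed}
        pid = labels.get("project_id")
        if pid is not None and pid not in self._projects:
            if len(self._projects) >= self.max_projects:
                labels["project_id"] = "other"  # overflow bucket bounds cardinality
            else:
                self._projects.add(pid)
        return tuple(labels[k] for k in self.allowed)

    def observe(self, event: dict) -> None:
        self.counts[self._label_values(event)] += 1


def keep_trace(request_id: str, sample_rate: float) -> bool:
    """Deterministic sampling: the same request ID always gets the same decision."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform in [0, 1)
    return bucket < sample_rate


counter = CallCounter(max_projects=2)
for project in ("p1", "p2", "p3", "p1"):
    counter.observe({
        "service": "keystone",
        "endpoint_class": "token_issue",
        "method": "POST",
        "status_class": "2xx",
        "project_id": project,
        "user_agent": "dropped-because-not-allow-listed",
    })
```

Folding overflow projects into `other` loses attribution for those tenants, so a rating pipeline would presumably read from the event sink rather than from these bounded metrics.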

      C. Rating, quotas, and policy

      • CloudKitty integration: New metric family control_plane.api_call with dimensions {service, endpoint_class, method, status_class, project}; support tiered pricing and free quotas.

      • Optional controls: Per-tenant rate limits/quotas (token-bucket, bursts), exception lists for system tenants; HTTP 429 on hard limits.
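Two of the mechanisms named in this section, tiered pricing with a free quota and token-bucket rate limiting, can be sketched in Python as follows. The tier boundaries, the prices, and the names `rate_calls`, `TokenBucket`, and `limit_status` are hypothetical; this is not CloudKitty's rating API, only an illustration of the arithmetic such a rating map and limiter might perform.

```python
# Hypothetical tiers: (cumulative call ceiling, price per call). The first
# tier acts as the free quota.
TIERS = [(10_000, 0.0), (100_000, 0.00002), (float("inf"), 0.00001)]


def rate_calls(calls: int, tiers=TIERS) -> float:
    """Price a period's call count, charging each tier only for calls inside it."""
    cost, lower = 0.0, 0
    for upper, price in tiers:
        if calls <= lower:
            break
        cost += (min(calls, upper) - lower) * price
        lower = upper
    return cost


class TokenBucket:
    """Per-tenant limiter: `rate` tokens refill per second, up to `burst`."""

    def __init__(self, rate: float, burst: float, now: float = 0.0):
        self.rate = rate
        self.burst = burst
        self.tokens = burst
        self.updated = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def limit_status(bucket: TokenBucket, now: float, exempt: bool = False):
    """Return 429 when a hard limit is hit, else None; exempt tenants always pass."""
    if exempt or bucket.allow(now):
        return None
    return 429


bucket = TokenBucket(rate=1.0, burst=2.0)
decisions = [limit_status(bucket, now=0.0) for _ in range(3)]
```

Passing `now` explicitly keeps the limiter deterministic for testing; a gateway filter would supply a monotonic clock reading instead.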

      D. Ops, UX, and migration

      • Dashboards & alerts: OOTB Grafana (or equivalent visualization) for tenant hot-spots, SLOs (p95/p99), and rated cost views; provider RBAC with tenant self-view (own usage only).
      • Config surface: RHOSO operator/CRDs to enable/disable, choose brokering mode, set sinks/retention, sampling, endpoint classes, and CloudKitty rating maps.
      • Compatibility: Start in Monitor-only; guided switch to Strict with pre-checks (detect/close direct paths). Auto-generate Routes/filters where needed.
      • Performance guardrails: Target a p95 latency overhead of at most 3-5% versus the direct path; sane defaults for sampling and label allow/deny lists.
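The performance guardrail can be made concrete with a small check that compares p95 latency on the brokered path against the direct path. The Python sketch below assumes two lists of latency samples in milliseconds collected under comparable load; the function names and the 5% default budget (the upper end of the stated target) are illustrative choices.

```python
import statistics


def p95(samples):
    """95th percentile; quantiles(n=100) returns 99 cut points, index 94 is p95."""
    return statistics.quantiles(samples, n=100, method="inclusive")[94]


def overhead_pct(direct_ms, brokered_ms) -> float:
    """Relative p95 overhead of the brokered path, in percent."""
    baseline = p95(direct_ms)
    return (p95(brokered_ms) - baseline) / baseline * 100.0


def within_budget(direct_ms, brokered_ms, budget_pct: float = 5.0) -> bool:
    return overhead_pct(direct_ms, brokered_ms) <= budget_pct


# Synthetic samples: the brokered path adds a uniform 4% to every request.
direct = [float(ms) for ms in range(1, 101)]
brokered = [ms * 1.04 for ms in direct]
ok = within_budget(direct, brokered)
```

A check like this could serve as one of the pre-checks before switching from Monitor-only to Strict mode.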

      E. Security & compliance

      • PII controls: Hash potentially sensitive identifiers; redact headers/payload fragments by policy; enforce multi-tenant isolation for queries and exports.
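The PII controls might look like the following Python sketch. A keyed hash (HMAC-SHA256) is used here as one plausible way to "hash potentially sensitive identifiers", since an unkeyed hash of a low-entropy ID is easy to reverse by enumeration; that choice, the header deny-list, and the names `pseudonymize`, `redact_headers`, and `tenant_view` are assumptions, not requirements stated in this request.

```python
import hashlib
import hmac

# Hypothetical deny-list, compared case-insensitively.
SENSITIVE_HEADERS = frozenset(
    {"x-auth-token", "x-subject-token", "authorization", "cookie"}
)


def pseudonymize(identifier: str, key: bytes) -> str:
    """Keyed hash: stable per identifier, not reversible without the key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()[:32]


def redact_headers(headers: dict, sensitive=SENSITIVE_HEADERS) -> dict:
    """Replace the values of sensitive headers before anything is logged."""
    return {
        name: ("[REDACTED]" if name.lower() in sensitive else value)
        for name, value in headers.items()
    }


def tenant_view(rows, project_id: str):
    """Tenant self-view: return only rows that belong to the caller's project."""
    return [row for row in rows if row["project_id"] == project_id]


key = b"demo-key-rotate-me"  # placeholder; a real key would come from a secret store
user_hash = pseudonymize("user-1234", key)
safe = redact_headers({"X-Auth-Token": "gAAAA-example", "Accept": "application/json"})
rows = [{"project_id": "p1", "calls": 10}, {"project_id": "p2", "calls": 99}]
own = tenant_view(rows, "p1")
```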

      Describe the customer impact

       

      • Immediate visibility & fairness: Providers can finally see—and show—who is driving control-plane load, enabling fair chargeback/showback and reducing cross-tenant subsidy.
      • Higher reliability at lower cost: With hot-spot detection and targeted rate-limits, operators protect Keystone/Barbican and other services from noisy-neighbor effects, improving SLO attainment without blanket over-provisioning.
      • Actionable capacity planning: Per-tenant, per-endpoint metrics transform capacity planning from guesswork to data-driven scaling, reducing waste and pre-empting incidents.
      • Governance & audit readiness: Attribution, retention, and export satisfy internal audit and public-sector cost-recovery requirements, simplifying compliance.
      • Predictable tenant experience: Transparent policies, quotas, and optional pricing shape behavior (e.g., caching, batching) and stabilize the platform for all tenants.
      • Low-friction adoption: Runs on OpenShift-native components with supported Red Hat options (3scale, Service Mesh, AMQ Streams, CloudKitty), providing a pragmatic path from observability → showback → chargeback without re-architecting tenant apps.

       


              jamparke@redhat.com Jamie Parker
              gprocuni@redhat.com Greg Procunier