Feature Request
Resolution: Unresolved
Major
None
Feature Request Overview
Introduce a tenant-attributed control-plane metering capability for Red Hat OpenStack Services on OpenShift (RHOSO) that measures raw API usage per tenant/project across Keystone, Barbican, Glance, Nova, Neutron, Cinder, and other control-plane services. The capability should:
- Broker tenant-facing APIs through a supported API gateway (e.g., Red Hat 3scale API Management) and/or OpenShift-native ingress/Service Mesh to capture authenticated requests, response codes, latency, payload classes, and requester identity (tenant/project, domain, app-credential, user).
- Emit standardized events/metrics (Prometheus + OpenTelemetry/CloudEvents) for each API call, with strong multi-tenant attribution.
- Feed CloudKitty (or a new "control-plane usage" rating stream) to enable showback/chargeback of control-plane consumption (optionally priced) and capacity-planning signals.
- Provide dashboards, alerts, and exports (Grafana, S3/CSV/Parquet) to identify tenant hot-spots ("who's hammering Keystone?"), detect abuse, and right-size/scale services.
Business justification
- Cost recovery & fairness: Public cloud providers factor control-plane intensity into cost models. Without per-tenant API metering, high-churn tenants can impose outsized load on Keystone/Barbican/etc. with no financial signal. This feature enables fair-share chargeback/showback.
- Capacity planning & SLOs: Visibility into call rates, p95/p99 latency, and error distributions per tenant enables proactive scaling, SLO management, and targeted optimization of specific services and endpoints.
- Abuse & misconfiguration detection: Spikes in auth/token issuance, secrets operations, or image/list churn become actionable events (alerting, temporary throttling, or policy) instead of silent control-plane exhaustion.
- Regulated & sovereign environments: Many customers run RHOSO in cost-recovery, multi-tenant public-sector or internal-cloud contexts. Transparent, auditable attribution of control-plane usage is required for governance.
- Monetization options: Providers can selectively rate high-cost API classes (e.g., token issuance, heavy list/poll endpoints) while keeping baseline operations free—aligning tenant behavior with platform costs.
Functional requirements
A. Traffic capture & identity attribution
- Brokerable path (provider's choice): Tenant-facing OpenStack APIs can be fronted by an API gateway or L7 proxy (e.g., OpenShift Routes/HAProxy filters, Gateway API/Envoy, Service Mesh, 3scale). Must support Strict (enforced) and Monitor-only (observe) modes.
- Identity normalization: Extract and attach {project_id, user_id, domain_id, app_credential_id, roles} plus {service, endpoint, method, path_template, status, bytes_in/out, duration_ms} per request; no secret/token material persisted.
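The identity-normalization requirement could be sketched roughly as below. This is a minimal illustration, not a proposed implementation: the header names in IDENTITY_HEADERS, the ApiCallRecord type, and the helper names are assumptions made for the example, and the headers a gateway actually sees depend on the deployment. The point is that the record carries only the listed attributes plus a templated path, and token headers are never read.

```python
import re
from dataclasses import dataclass, asdict

# Assumed header names for illustration only; a real gateway might derive
# these values from token validation rather than from request headers.
IDENTITY_HEADERS = {
    "project_id": "X-Project-Id",
    "user_id": "X-User-Id",
    "domain_id": "X-Project-Domain-Id",
    "app_credential_id": "X-App-Credential-Id",  # hypothetical header
}

# Matches UUIDs with or without dashes (32 hex digits in total).
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-?[0-9a-fA-F]{4}-?[0-9a-fA-F]{4}-?[0-9a-fA-F]{4}-?[0-9a-fA-F]{12}"
)


def to_path_template(path: str) -> str:
    """Drop the query string and replace UUID path segments with {id}."""
    path = path.split("?", 1)[0]
    return "/".join("{id}" if _UUID.fullmatch(seg) else seg for seg in path.split("/"))


@dataclass(frozen=True)
class ApiCallRecord:
    project_id: str
    user_id: str
    domain_id: str
    app_credential_id: str
    roles: tuple
    service: str
    endpoint: str
    method: str
    path_template: str
    status: int
    bytes_in: int
    bytes_out: int
    duration_ms: float


def build_record(service, endpoint, method, path, status, headers,
                 bytes_in, bytes_out, duration_ms):
    """Build a metering record; only allow-listed identity headers are read."""
    h = {k.lower(): v for k, v in headers.items()}
    identity = {field: h.get(name.lower(), "unknown")
                for field, name in IDENTITY_HEADERS.items()}
    roles = tuple(r.strip() for r in h.get("x-roles", "").split(",") if r.strip())
    return ApiCallRecord(
        roles=roles, service=service, endpoint=endpoint, method=method.upper(),
        path_template=to_path_template(path), status=status,
        bytes_in=bytes_in, bytes_out=bytes_out, duration_ms=duration_ms,
        **identity,
    )
```

Templating the path before emission keeps per-resource IDs out of metric labels, which also helps with the label-cardinality requirement in section B.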
B. Telemetry & eventing
- Signals: Prometheus metrics (counters/histograms) + optional OpenTelemetry traces/CloudEvents; controllable label cardinality and sampling.
- Sinks & retention: Pluggable sinks (Prometheus, Kafka/AMQ Streams, S3/Parquet/ClickHouse); configurable retention and export (CSV/Parquet, CUR-like).
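To illustrate what "controllable label cardinality" might mean in practice, here is a small standard-library sketch of a counter and latency histogram with a label allow-list and a cap on distinct project values. The class name ApiCallMetrics, the bucket boundaries, and the "other" overflow label are assumptions for the example; a real deployment would more likely use a Prometheus client library and its cumulative histogram type.

```python
from bisect import bisect_left
from collections import Counter

# Only these labels are ever emitted; anything else (e.g. user_id) is dropped.
ALLOWED_LABELS = ("service", "endpoint_class", "method", "status_class", "project")
# Illustrative upper bounds in milliseconds; counts below are per bucket,
# not cumulative as in a Prometheus histogram.
BUCKETS_MS = (5, 10, 25, 50, 100, 250, 500, 1000, float("inf"))


def status_class(status: int) -> str:
    """Collapse an HTTP status code into 2xx/4xx/5xx style classes."""
    return f"{status // 100}xx"


class ApiCallMetrics:
    def __init__(self, max_projects: int = 1000):
        self.calls = Counter()      # label tuple -> call count
        self.latency = Counter()    # (label tuple, bucket bound) -> count
        self.max_projects = max_projects
        self._projects = set()

    def _labels(self, raw: dict) -> tuple:
        labels = {k: str(raw.get(k, "unknown")) for k in ALLOWED_LABELS}
        project = labels["project"]
        if project not in self._projects:
            if len(self._projects) >= self.max_projects:
                # Cap cardinality: overflow projects share one series.
                labels["project"] = "other"
            else:
                self._projects.add(project)
        return tuple(labels[k] for k in ALLOWED_LABELS)

    def observe(self, raw: dict, duration_ms: float) -> None:
        key = self._labels(raw)
        self.calls[key] += 1
        bound = BUCKETS_MS[bisect_left(BUCKETS_MS, duration_ms)]
        self.latency[(key, bound)] += 1
```

Folding overflow projects into a shared series trades attribution for bounded series counts, so the cap would need to sit well above the expected tenant count if the data feeds chargeback.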
C. Rating, quotas, and policy
- CloudKitty integration: New metric family control_plane.api_call with dimensions {service, endpoint_class, method, status_class, project}; support tiered pricing and free quotas.
- Optional controls: Per-tenant rate limits/quotas (token-bucket, bursts), exception lists for system tenants; HTTP 429 on hard limits.
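The rating and rate-limit requirements above can be made concrete with a short sketch. It does not use or represent CloudKitty's actual rating API; rate_calls, TokenBucket, and admit are hypothetical names, prices are expressed in integer micro-units to avoid floating-point rounding, and the tier layout is an assumption chosen for the example.

```python
def rate_calls(calls: int, free_quota: int, tiers) -> int:
    """Rate API calls with a free quota and ascending price tiers.

    tiers is a list of (upper_bound, price_per_call) pairs, where upper_bound
    counts billable calls and None marks the open-ended last tier.
    """
    billable = max(0, calls - free_quota)
    cost, lower = 0, 0
    for upper, price in tiers:
        if billable <= lower:
            break
        top = billable if upper is None else min(billable, upper)
        cost += (top - lower) * price
        if upper is None:
            break
        lower = upper
    return cost


class TokenBucket:
    """Token bucket: refills at rate_per_s, holds at most burst tokens."""

    def __init__(self, rate_per_s: float, burst: int, now: float = 0.0):
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = float(burst)
        self.updated = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.updated)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


def admit(buckets: dict, project_id: str, now: float,
          rate_per_s: float, burst: int, exempt=frozenset()):
    """Return None to admit the request, or 429 when the hard limit is hit."""
    if project_id in exempt:  # exception list for system tenants
        return None
    bucket = buckets.setdefault(project_id, TokenBucket(rate_per_s, burst, now))
    return None if bucket.allow(now) else 429
```

For example, with a free quota of 10,000 calls and tiers [(1000, 10), (None, 5)], 15,000 calls leave 5,000 billable: the first 1,000 at 10 units and the remaining 4,000 at 5 units, for 30,000 units in total.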
D. Ops, UX, and migration
- Dashboards & alerts: OOTB Grafana (or equivalent visualization) for tenant hot-spots, SLOs (p95/p99), and rated cost views; provider RBAC with tenant self-view (own usage only).
- Config surface: RHOSO operator/CRDs to enable/disable, choose brokering mode, set sinks/retention, sampling, endpoint classes, and CloudKitty rating maps.
- Compatibility: Start in Monitor-only; guided switch to Strict with pre-checks (detect/close direct paths). Auto-generate Routes/filters where needed.
- Performance guardrails: Target ≤3-5% p95 latency overhead vs direct path; sane defaults for sampling and label allow/deny lists.
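One way the performance guardrail could be checked is by comparing p95 latency for the same request mix on the direct and brokered paths. The sketch below uses the nearest-rank percentile method and a default threshold of 5% (the upper end of the stated 3-5% range); the function names and the choice of method are assumptions for illustration.

```python
import math


def percentile(samples, q: int) -> float:
    """Nearest-rank percentile: smallest value with at least q% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def p95_overhead(direct_ms, brokered_ms) -> float:
    """Relative p95 latency overhead of the brokered path vs the direct path."""
    return percentile(brokered_ms, 95) / percentile(direct_ms, 95) - 1.0


def within_guardrail(direct_ms, brokered_ms, max_overhead: float = 0.05) -> bool:
    """True when the brokered path stays within the allowed p95 overhead."""
    return p95_overhead(direct_ms, brokered_ms) <= max_overhead
```

A check like this would only be meaningful if both sample sets cover comparable endpoints and load levels, since p95 is sensitive to the request mix.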
E. Security & compliance
- PII controls: Hash potentially sensitive identifiers; redact headers/payload fragments by policy; enforce multi-tenant isolation for queries and exports.
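A minimal sketch of the PII controls, assuming keyed hashing (HMAC-SHA256) for identifiers, a deny-list for sensitive headers, and a per-project filter for tenant self-view. The function names and the deny-list contents are illustrative; key management and rotation are out of scope here.

```python
import hashlib
import hmac

# Headers whose values should never reach a sink; extend by policy.
REDACTED_HEADERS = frozenset({"x-auth-token", "x-subject-token", "authorization", "cookie"})


def pseudonymize(identifier: str, key: bytes) -> str:
    """Keyed hash of an identifier: stable for joins, not reversible without the key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def redact_headers(headers: dict, deny=REDACTED_HEADERS) -> dict:
    """Replace the values of deny-listed headers (case-insensitive match)."""
    return {k: ("[REDACTED]" if k.lower() in deny else v) for k, v in headers.items()}


def tenant_view(records, project_id: str):
    """Return only the records that belong to the requesting project."""
    return [r for r in records if r.get("project_id") == project_id]
```

A keyed hash is used rather than a plain digest because identifiers such as user IDs come from a small, guessable space; without the key, a plain hash could be reversed by enumeration.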
Describe the customer impact
- Immediate visibility & fairness: Providers can finally see—and show—who is driving control-plane load, enabling fair chargeback/showback and reducing cross-tenant subsidy.
- Higher reliability at lower cost: With hot-spot detection and targeted rate-limits, operators protect Keystone/Barbican and other services from noisy-neighbor effects, improving SLO attainment without blanket over-provisioning.
- Actionable capacity planning: Per-tenant, per-endpoint metrics transform capacity planning from guesswork to data-driven scaling, reducing waste and pre-empting incidents.
- Governance & audit readiness: Attribution, retention, and export satisfy internal audit and public-sector cost-recovery requirements, simplifying compliance.
- Predictable tenant experience: Transparent policies, quotas, and optional pricing shape behavior (e.g., caching, batching) and stabilize the platform for all tenants.
- Low-friction adoption: Runs on OpenShift-native components with supported Red Hat options (3scale, Service Mesh, AMQ Streams, CloudKitty), providing a pragmatic path from observability → showback → chargeback without re-architecting tenant apps.
(Optional) Point of contact
(Optional) Additional links