Type: Feature Request
Resolution: Unresolved
Priority: Major
Component/s: Observability
Labels:
None

Feature Request Overview

Introduce a tenant-attributed control-plane metering capability for Red Hat OpenStack Services on OpenShift (RHOSO) that measures raw API usage per tenant/project across Keystone, Barbican, Glance, Nova, Neutron, Cinder, and other control-plane services. The capability should:

Broker tenant-facing APIs through a supported API gateway (e.g., Red Hat 3scale API Management) and/or OpenShift-native ingress/Service Mesh to capture authenticated requests, response codes, latency, payload classes, and requester identity (tenant/project, domain, app-credential, user).

Emit standardized events/metrics (Prometheus + OpenTelemetry/CloudEvents) for each API call, with strong multi-tenant attribution.

Feed CloudKitty (or a new "control-plane usage" rating stream) to enable showback/chargeback of control-plane consumption (optionally priced) and capacity-planning signals.

Provide dashboards, alerts, and exports (Grafana, S3/CSV/Parquet) to identify tenant hot-spots ("who's hammering Keystone?"), detect abuse, and right-size/scale services.

Business justification

Cost recovery & fairness: Public cloud providers factor control-plane intensity into cost models. Without per-tenant API metering, high-churn tenants can impose outsized load on Keystone/Barbican/etc. with no financial signal. This feature enables fair-share chargeback/showback.

Capacity planning & SLOs: Visibility into call rates, p95/p99 latency, and error distributions per tenant enables proactive scaling, SLO management, and targeted optimization of specific services and endpoints.

Abuse & misconfiguration detection: Spikes in auth/token issuance, secrets operations, or image/list churn become actionable events (alerting, temporary throttling, or policy) instead of silent control-plane exhaustion.

Regulated & sovereign environments: Many customers run RHOSO in cost-recovery, multi-tenant public-sector or internal-cloud contexts. Transparent, auditable attribution of control-plane usage is required for governance.

Monetization options: Providers can selectively rate high-cost API classes (e.g., token issuance, heavy list/poll endpoints) while keeping baseline operations free—aligning tenant behavior with platform costs.

Functional requirements

A. Traffic capture & identity attribution

Brokerable path (provider's choice): Tenant-facing OpenStack APIs can be fronted by an API gateway or L7 proxy (e.g., OpenShift Routes/HAProxy filters, Gateway API/Envoy, Service Mesh, 3scale). Must support Strict (enforced) and Monitor-only (observe) modes.

Identity normalization: Extract and attach {project_id, user_id, domain_id, app_credential_id, roles} plus {service, endpoint, method, path_template, status, bytes_in/out, duration_ms} per request; no secret/token material persisted.
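As a rough illustration of the identity-normalization requirement, the sketch below builds one attribution record per request. The header names are assumptions modeled on the identity headers that Keystone auth middleware typically forwards (X-Project-Id, X-User-Id, X-Project-Domain-Id, X-Roles); X-App-Credential-Id is purely hypothetical, as are the names ApiCallRecord, build_record, and path_template. The token header is never read, so no token material can reach the record.

```python
import re
from dataclasses import dataclass, asdict

# Matches UUIDs in dashed or 32-hex form, as commonly seen in OpenStack URLs.
_ID_SEGMENT = re.compile(
    r"^(?:[0-9a-f]{32}"
    r"|[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12})$",
    re.IGNORECASE,
)


def path_template(path):
    """Drop the query string and replace ID-like segments with '{id}'."""
    parts = path.split("?")[0].split("/")
    return "/".join("{id}" if _ID_SEGMENT.match(p) else p for p in parts)


@dataclass(frozen=True)
class ApiCallRecord:
    project_id: str
    user_id: str
    domain_id: str
    app_credential_id: str
    roles: tuple
    service: str
    endpoint: str
    method: str
    path_template: str
    status: int
    bytes_in: int
    bytes_out: int
    duration_ms: float


def build_record(headers, *, service, endpoint, method, path,
                 status, bytes_in, bytes_out, duration_ms):
    """Build a record from request headers; header names are assumptions."""
    h = {k.lower(): v for k, v in headers.items()}
    roles = tuple(sorted(
        r.strip() for r in h.get("x-roles", "").split(",") if r.strip()
    ))
    return ApiCallRecord(
        project_id=h.get("x-project-id", "unknown"),
        user_id=h.get("x-user-id", "unknown"),
        domain_id=h.get("x-project-domain-id", "unknown"),
        # Hypothetical header; the request does not name a source for this field.
        app_credential_id=h.get("x-app-credential-id", "unknown"),
        roles=roles,
        service=service,
        endpoint=endpoint,
        method=method.upper(),
        path_template=path_template(path),
        status=int(status),
        bytes_in=int(bytes_in),
        bytes_out=int(bytes_out),
        duration_ms=float(duration_ms),
    )
```

Templating the path before it is stored keeps per-resource IDs out of metric labels, which also supports the cardinality controls requested under Telemetry & eventing.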

B. Telemetry & eventing

Signals: Prometheus metrics (counters/histograms) + optional OpenTelemetry traces/CloudEvents; controllable label cardinality and sampling.

Sinks & retention: Pluggable sinks (Prometheus, Kafka/AMQ Streams, S3/Parquet/ClickHouse); configurable retention and export (CSV/Parquet, CUR-like).
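One way to picture "controllable label cardinality" is an allow-list applied before any counter or histogram is touched. The following is a minimal standard-library sketch, not a Prometheus client: ALLOWED_LABELS, BUCKETS_MS, and the ApiCallMetrics class are illustrative assumptions, and a real deployment would presumably use a supported metrics library.

```python
from bisect import bisect_left
from collections import defaultdict
from itertools import accumulate

# Assumed allow-list: only these dimensions become metric labels.
ALLOWED_LABELS = ("service", "method", "status_class", "project_id")
# Assumed histogram upper bounds in milliseconds.
BUCKETS_MS = (5, 10, 25, 50, 100, 250, 500, 1000, float("inf"))


def reduce_labels(record):
    """Project a per-request record onto the allow-listed labels."""
    enriched = dict(record)
    enriched["status_class"] = f"{int(record['status']) // 100}xx"
    return tuple(str(enriched.get(name, "unknown")) for name in ALLOWED_LABELS)


class ApiCallMetrics:
    """In-memory call counter and latency histogram keyed by reduced labels."""

    def __init__(self):
        self.calls = defaultdict(int)
        self._latency = defaultdict(lambda: [0] * len(BUCKETS_MS))

    def observe(self, record):
        key = reduce_labels(record)
        self.calls[key] += 1
        # Index of the first bucket whose upper bound is >= the observation.
        self._latency[key][bisect_left(BUCKETS_MS, record["duration_ms"])] += 1

    def cumulative(self, key):
        """Cumulative bucket counts, matching how histogram buckets are exposed."""
        return list(accumulate(self._latency[key]))
```

Fields outside the allow-list (for example user_id) are still available to event sinks but never multiply the number of time series.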

C. Rating, quotas, and policy

CloudKitty integration: New metric family control_plane.api_call with dimensions {service, endpoint_class, method, status_class, project}; support tiered pricing and free quotas.

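To make "tiered pricing and free quotas" concrete, here is a small sketch of the arithmetic only. It does not use or imitate CloudKitty's rating modules; the TIERS table, its prices, and the rate_calls function are hypothetical values chosen for illustration.

```python
from decimal import Decimal

# Hypothetical tiers for one (service, endpoint_class) pair over a rating period:
# (upper bound in calls, price per call). None means "no upper bound".
# The first tier priced at zero acts as the free quota.
TIERS = [
    (1_000, Decimal("0")),
    (10_000, Decimal("0.0001")),
    (None, Decimal("0.00005")),
]


def rate_calls(call_count, tiers=TIERS):
    """Return the rated cost of call_count calls under graduated tiers."""
    cost = Decimal("0")
    lower = 0
    for upper, price in tiers:
        if call_count <= lower:
            break
        top = call_count if upper is None else min(call_count, upper)
        cost += (top - lower) * price
        if upper is None:
            break
        lower = upper
    return cost
```

Under these assumed tiers, a project making 5,000 calls pays only for the 4,000 calls above the free quota, while one making 20,000 calls is charged across both paid tiers.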
Optional controls: Per-tenant rate limits/quotas (token-bucket, bursts), exception lists for system tenants; HTTP 429 on hard limits.
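The optional per-tenant controls could be sketched roughly as follows: a token bucket per project with a burst allowance and an exemption list for system tenants. Class names, parameters, and the injected clock are assumptions for illustration; a gateway or proxy would normally provide its own rate-limiting primitive.

```python
import time


class TokenBucket:
    """Classic token bucket: refills at rate_per_s, holds at most burst tokens."""

    def __init__(self, rate_per_s, burst, clock=time.monotonic):
        self.rate = float(rate_per_s)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.clock = clock
        self.updated = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class TenantLimiter:
    """One bucket per project; exempt projects (system tenants) always pass."""

    def __init__(self, rate_per_s, burst, exempt=(), clock=time.monotonic):
        self.rate_per_s = rate_per_s
        self.burst = burst
        self.exempt = frozenset(exempt)
        self.clock = clock
        self.buckets = {}

    def check(self, project_id):
        """Return True to forward the request, False to answer HTTP 429."""
        if project_id in self.exempt:
            return True
        bucket = self.buckets.get(project_id)
        if bucket is None:
            bucket = TokenBucket(self.rate_per_s, self.burst, self.clock)
            self.buckets[project_id] = bucket
        return bucket.allow()
```

In Monitor-only mode the same check could be evaluated and recorded without rejecting anything, which gives operators a preview of what Strict mode would throttle.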

D. Ops, UX, and migration

Dashboards & alerts: OOTB Grafana (or equivalent visualization) for tenant hot-spots, SLOs (p95/p99), and rated cost views; provider RBAC with tenant self-view (own usage only).

Config surface: RHOSO operator/CRDs to enable/disable, choose brokering mode, set sinks/retention, sampling, endpoint classes, and CloudKitty rating maps.

Compatibility: Start in Monitor-only; guided switch to Strict with pre-checks (detect/close direct paths). Auto-generate Routes/filters where needed.

Performance guardrails: Target ≤3-5% p95 latency overhead vs direct path; sane defaults for sampling and label allow/deny lists.
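The latency guardrail could be checked with something like the sketch below, which compares p95 latency on the brokered path against the direct path. It uses the nearest-rank percentile definition, and the 5% default budget is an assumption taken from the upper end of the stated 3-5% target; the function names are illustrative.

```python
def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer ceiling of pct * n / 100 avoids floating-point rank errors.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


def p95_overhead(direct_ms, brokered_ms):
    """Relative p95 overhead of the brokered path versus the direct path."""
    base = percentile(direct_ms, 95)
    return (percentile(brokered_ms, 95) - base) / base


def within_budget(direct_ms, brokered_ms, budget=0.05):
    """True when the brokered path stays inside the assumed overhead budget."""
    return p95_overhead(direct_ms, brokered_ms) <= budget
```

A check like this is only meaningful when both sample sets come from comparable traffic, for example the same endpoints replayed through each path.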

E. Security & compliance

PII controls: Hash potentially sensitive identifiers; redact headers/payload fragments by policy; enforce multi-tenant isolation for queries and exports.
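A minimal sketch of the PII controls, assuming a keyed hash (HMAC-SHA-256) for identifiers and a deny-list for headers. The key handling, the REDACTED_HEADERS set, and the function names are illustrative assumptions; the request does not prescribe a specific hashing scheme.

```python
import hashlib
import hmac

# Assumed deny-list of headers whose values must never be stored or exported.
REDACTED_HEADERS = frozenset(
    {"x-auth-token", "x-subject-token", "authorization", "cookie"}
)


def pseudonymize(identifier, key):
    """Keyed hash of an identifier: stable for joins, not reversible without key."""
    return hmac.new(key, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


def redact_headers(headers):
    """Return a copy of headers with deny-listed values replaced."""
    return {
        name: "[REDACTED]" if name.lower() in REDACTED_HEADERS else value
        for name, value in headers.items()
    }
```

Using a keyed hash rather than a plain digest means identifiers cannot be recovered by hashing a list of known user or project IDs, provided the key stays with the provider.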

Describe the customer impact

Immediate visibility & fairness: Providers can finally see—and show—who is driving control-plane load, enabling fair chargeback/showback and reducing cross-tenant subsidy.

Higher reliability at lower cost: With hot-spot detection and targeted rate-limits, operators protect Keystone/Barbican and other services from noisy-neighbor effects, improving SLO attainment without blanket over-provisioning.

Actionable capacity planning: Per-tenant, per-endpoint metrics transform capacity planning from guesswork to data-driven scaling, reducing waste and pre-empting incidents.

Governance & audit readiness: Attribution, retention, and export satisfy internal audit and public-sector cost-recovery requirements, simplifying compliance.

Predictable tenant experience: Transparent policies, quotas, and optional pricing shape behavior (e.g., caching, batching) and stabilize the platform for all tenants.

Low-friction adoption: Runs on OpenShift-native components with supported Red Hat options (3scale, Service Mesh, AMQ Streams, CloudKitty), providing a pragmatic path from observability → showback → chargeback without re-architecting tenant apps.

(Optional) Point of contact

gprocuni@redhat.com


Assignee: Jamie Parker

Reporter: Greg Procunier

Votes: 3

Watchers: 1

Created: 2025/10/08 8:30 PM

Updated: 2025/10/08 8:30 PM
