Story
Resolution: Unresolved
As a platform engineer, I want token usage and limit metrics to be exposed in Prometheus/OpenTelemetry (OTel) format, so that I can monitor how many tokens are being used per user or route, compare usage to the defined limits, and use this for alerting and capacity planning.
This story covers exposing both usage and limit data as metrics suitable for Prometheus or OpenTelemetry collection. These metrics are critical for visibility into LLM cost and usage patterns and will allow platform teams to build dashboards or alerts.
Technical points:
- Export output token usage metrics, ideally labeled by:
  - route or gateway
  - user_id or tenant (see https://github.com/Kuadrant/limitador/issues/434#issuecomment-3108779124)
- Export configured limit values as separate metrics (e.g., daily quota)
- Where possible, align labels and structure with existing Kuadrant rate limiting metrics
- Enable downstream queries like usage / limit or usage > threshold for alerting
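To make the usage/limit pairing concrete, here is a minimal sketch of the two metric families rendered in the Prometheus text exposition format, with identical label sets on the usage counter and the limit gauge. The metric names are taken from the example query in the acceptance criteria; the `route` and `user` label names, the `MetricFamily` helper, and all values are hypothetical and only illustrate the shape of the output.

```python
"""Minimal sketch: render a token usage counter and a token limit gauge in the
Prometheus text exposition format, with identical label sets on both.

Assumptions: the label names (route, user), the MetricFamily helper and all
values are hypothetical; the metric names come from the story's example query.
"""
from dataclasses import dataclass, field


def escape_label_value(value: str) -> str:
    # The text format requires escaping backslash, double quote and newline.
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


@dataclass
class MetricFamily:
    name: str
    kind: str  # "counter" or "gauge"
    help_text: str
    # Maps a frozenset of (label, value) pairs to the current sample value.
    samples: dict = field(default_factory=dict)

    def inc(self, labels: dict, amount: float) -> None:
        # Counters only ever go up; reject negative increments.
        if self.kind == "counter" and amount < 0:
            raise ValueError("counter increments must be non-negative")
        key = frozenset(labels.items())
        self.samples[key] = self.samples.get(key, 0.0) + float(amount)

    def set(self, labels: dict, value: float) -> None:
        # Setting an absolute value only makes sense for gauges.
        if self.kind != "gauge":
            raise ValueError("only gauges can be set")
        self.samples[frozenset(labels.items())] = float(value)

    def render(self) -> str:
        lines = [f"# HELP {self.name} {self.help_text}", f"# TYPE {self.name} {self.kind}"]
        # Sort series and labels so the output is deterministic.
        for key in sorted(self.samples, key=sorted):
            pairs = ",".join(f'{k}="{escape_label_value(v)}"' for k, v in sorted(key))
            lines.append(f"{self.name}{{{pairs}}} {self.samples[key]!r}")
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    usage = MetricFamily("kuadrant_token_usage_output_total", "counter",
                         "Output tokens consumed by completed LLM responses.")
    limit = MetricFamily("kuadrant_token_limit_total", "gauge",
                         "Configured token quota for the current window.")
    labels = {"route": "chat-route", "user": "alice"}  # hypothetical label values
    usage.inc(labels, 1200)
    limit.set(labels, 100000)
    print(usage.render() + limit.render(), end="")
```

Because both series carry exactly the same labels, a plain `usage / limit` division can match them without `on(...)` or `ignoring(...)`. Note that Prometheus naming conventions reserve the `_total` suffix for counters, so the limit gauge may deserve a different name once naming is settled.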
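The downstream queries mentioned above, e.g. `kuadrant_token_usage_output_total / kuadrant_token_limit_total > 0.8`, rely on PromQL's default one-to-one vector matching: series are paired when their label sets are identical after dropping the metric name, and unmatched series are discarded. The sketch below emulates that behaviour in plain Python to show why aligned labels matter; the label names, values and the `0.8` threshold are hypothetical. If the two metrics ended up with extra differing labels (for example from scrape targets), the real query would need `on(...)` or `ignoring(...)`.

```python
"""Minimal sketch of what `usage / limit` and `> threshold` do in PromQL.

Default binary-operator matching pairs series whose label sets are identical
once the metric name is dropped; series without a partner are discarded.
Assumptions: label names, values and the threshold are hypothetical, and each
label set appears at most once per side (PromQL would reject duplicates).
"""
import math


def labelset(**labels: str) -> frozenset:
    return frozenset(labels.items())


def divide(lhs: dict, rhs: dict) -> dict:
    """One-to-one `lhs / rhs`; both args map label sets to sample values."""
    result = {}
    for labels, numerator in lhs.items():
        if labels not in rhs:
            continue  # no matching limit series: dropped, as in PromQL
        denominator = rhs[labels]
        if denominator == 0:
            # Follow IEEE 754 float semantics instead of raising.
            result[labels] = math.nan if numerator == 0 else math.copysign(math.inf, numerator)
        else:
            result[labels] = numerator / denominator
    return result


def above(vector: dict, threshold: float) -> dict:
    """Filtering comparison `vector > threshold`: keep only matching series."""
    return {labels: value for labels, value in vector.items() if value > threshold}


if __name__ == "__main__":
    usage = {
        labelset(route="chat-route", user="alice"): 85_000.0,
        labelset(route="chat-route", user="bob"): 1_000.0,
        labelset(route="chat-route", user="carol"): 500.0,  # no limit series
    }
    limit = {
        labelset(route="chat-route", user="alice"): 100_000.0,
        labelset(route="chat-route", user="bob"): 100_000.0,
    }
    ratio = divide(usage, limit)
    for labels, value in above(ratio, 0.8).items():
        print(dict(sorted(labels)), round(value, 2))
```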
Metrics should reflect:
- The current known usage (based on completed requests)
- The configured quotas defined in each TokenRateLimitPolicy
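Counting only completed requests is mostly a question of when the counter is incremented. The sketch below assumes OpenAI-style chat completion responses (not confirmed by this story): a non-streamed body carries `usage.completion_tokens`, and a streamed SSE response requested with `stream_options: {"include_usage": true}` sends usage in a final chunk before `data: [DONE]`. The helper names, the `OutputTokenCounter` class and the `user` label are hypothetical.

```python
"""Minimal sketch: count output tokens only for completed, successful responses.

Assumptions (not confirmed by this story): responses follow the OpenAI-style
chat completions format, where a non-streamed body carries
`usage.completion_tokens`, and a streamed (SSE) response requested with
`stream_options: {"include_usage": true}` sends usage in a final chunk before
`data: [DONE]`. All helper, class and label names are hypothetical.
"""
import json
from typing import Iterable, Optional


def tokens_from_body(status: int, body: str) -> Optional[int]:
    """Non-streamed response: completion tokens, or None if it must not count."""
    if status != 200:
        return None  # failed requests never increment usage
    usage = json.loads(body).get("usage") or {}
    tokens = usage.get("completion_tokens")
    return tokens if isinstance(tokens, int) else None


def tokens_from_stream(status: int, lines: Iterable[str]) -> Optional[int]:
    """Streamed response: count once, and only if the stream reached [DONE]."""
    if status != 200:
        return None
    tokens = None
    for line in lines:
        if not line.startswith("data:"):
            continue  # blank separators, comments, other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            return tokens  # complete; None if usage never arrived
        usage = json.loads(payload).get("usage")  # null on intermediate chunks
        if usage and isinstance(usage.get("completion_tokens"), int):
            tokens = usage["completion_tokens"]
    return None  # stream cut off before [DONE]: treat as incomplete


class OutputTokenCounter:
    """Cumulative per-label-set totals, incremented only with a known count."""

    def __init__(self) -> None:
        self.totals: dict = {}

    def record(self, labels: dict, tokens: Optional[int]) -> None:
        if tokens is None:
            return  # failed or incomplete request: leave the counter alone
        key = frozenset(labels.items())
        self.totals[key] = self.totals.get(key, 0) + tokens


if __name__ == "__main__":
    stream = [
        'data: {"choices":[{"delta":{"content":"Hi"}}],"usage":null}',
        "",
        'data: {"choices":[],"usage":{"prompt_tokens":5,"completion_tokens":7}}',
        "",
        "data: [DONE]",
    ]
    counter = OutputTokenCounter()
    counter.record({"user": "alice"}, tokens_from_stream(200, stream))
    counter.record({"user": "alice"}, tokens_from_stream(200, stream[:3]))  # truncated
    counter.record({"user": "alice"}, tokens_from_body(429, "{}"))  # rejected request
    print(counter.totals)
```

Only the first call contributes tokens: the truncated stream never reaches `[DONE]`, and the rejected request has a non-200 status.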
This feature is observability-only and does not affect policy enforcement, but is critical for diagnosing policy effectiveness and model usage behavior.
Acceptance Criteria
- Output token usage metrics are exposed in Prometheus format
  - Metrics are available for scraping from the appropriate endpoint
  - Metric name is meaningful and consistent
  - Metric is cumulative and reflects tokens consumed from completed LLM responses
- Usage metrics are labeled with identifying context
  - user_id or tenant (as defined via TokenRateLimitPolicy.counters)
  - Labels follow Prometheus best practices (e.g., stable, bounded cardinality)
  (Note: the current wasm-shim implementation doesn't allow for labels, so metrics will need to be relabeled.)
- Usage and limit metrics can be correlated
  - Given a known namespace + user/group, the usage and limit metrics for that pair are both present and aligned in labels
  - Allows for queries such as: kuadrant_token_usage_output_total / kuadrant_token_limit_total
- Usage metrics are exposed
  - Given a known user/group, the usage metrics are present and can be found through the corresponding labels
  - Allows for queries such as:
    - user_usage
    - group_usage
    - namespace_usage
- Metrics are scoped to completed requests only
  - Token usage is not incremented on failed or incomplete requests
  - Streamed responses are handled appropriately (i.e., tokens are counted once usage metrics arrive)
- ServiceMonitor/PodMonitor configuration is documented
  - Steps are documented for enabling metrics collection
  - Steps for relabelling metrics are documented
- New metrics are documented
  - A short doc exists describing each new metric, label usage, and sample queries
  - Existing Limitador metrics relevant to this story are also documented
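Since wasm-shim cannot attach labels today, the acceptance criteria call for relabeling. The sketch below emulates a subset of Prometheus' `replace` relabel action (source label values joined with `;`, regex anchored at both ends, `$1`/`${1}` references in the replacement) to show how an identity could be moved into a label. The scenario is purely hypothetical: it assumes the identity ends up embedded in the metric name as `kuadrant_token_usage_output_user_<id>_total`, which is an invented naming scheme. Prometheus uses RE2 regex syntax, which differs from Python's `re` in some features.

```python
"""Minimal sketch of Prometheus' `replace` relabel action (a subset of it).

Hypothetical scenario: assume the identifying value is embedded in the metric
name, e.g. `kuadrant_token_usage_output_user_alice_total`. Two metric relabel
rules move it into a `user` label and restore a stable metric name. The naming
scheme and the `job` value are invented for illustration only.
"""
import re
from typing import Dict, List


def relabel_replace(labels: Dict[str, str], source_labels: List[str], regex: str,
                    target_label: str, replacement: str = "$1",
                    separator: str = ";") -> Dict[str, str]:
    # Prometheus joins the source label values with the separator ...
    value = separator.join(labels.get(name, "") for name in source_labels)
    # ... and anchors the regex at both ends.
    match = re.fullmatch(regex, value)
    if match is None:
        return dict(labels)  # no match: labels are left untouched
    # Translate $1 / ${1} references into Python's \g<1> template syntax.
    template = re.sub(r"\$\{?(\w+)\}?", r"\\g<\1>", replacement)
    result = dict(labels)
    result[target_label] = match.expand(template)
    return result


if __name__ == "__main__":
    series = {"__name__": "kuadrant_token_usage_output_user_alice_total", "job": "example"}
    # Rule 1: copy the embedded identity into a `user` label.
    series = relabel_replace(series, ["__name__"],
                             r"kuadrant_token_usage_output_user_(.+)_total", "user")
    # Rule 2: rewrite the metric name so every user shares one metric.
    series = relabel_replace(series, ["__name__"],
                             r"kuadrant_token_usage_output_user_.+_total", "__name__",
                             replacement="kuadrant_token_usage_output_total")
    print(series)
```

Rules run in order, so the second rule can safely overwrite `__name__` after the first has extracted the identity.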
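The per-user, per-group and per-namespace usage queries in the acceptance criteria would most likely be PromQL aggregations such as `sum by (user) (kuadrant_token_usage_output_total)`. The sketch below emulates `sum by (...)` in plain Python; the `user`, `group` and `namespace` label names and all values are hypothetical.

```python
"""Minimal sketch of PromQL `sum by (<labels>)`, the shape behind per-user,
per-group and per-namespace usage queries such as
`sum by (user) (kuadrant_token_usage_output_total)`.

Assumptions: the `user`, `group` and `namespace` label names and all values
are hypothetical.
"""
from collections import defaultdict
from typing import Dict, List, Tuple

Sample = Tuple[Dict[str, str], float]
GroupKey = Tuple[Tuple[str, str], ...]


def sum_by(samples: List[Sample], *by: str) -> Dict[GroupKey, float]:
    totals: Dict[GroupKey, float] = defaultdict(float)
    for labels, value in samples:
        # Series missing a grouping label fall into the "" group, as in PromQL.
        key = tuple((name, labels.get(name, "")) for name in by)
        totals[key] += value
    return dict(totals)


if __name__ == "__main__":
    samples = [
        ({"namespace": "team-a", "group": "dev", "user": "alice"}, 1200.0),
        ({"namespace": "team-a", "group": "dev", "user": "bob"}, 300.0),
        ({"namespace": "team-b", "group": "ops", "user": "carol"}, 50.0),
    ]
    print(sum_by(samples, "namespace"))
    print(sum_by(samples, "group"))
    print(sum_by(samples, "user"))
```

Because the usage metric is a cumulative counter, dashboards would typically aggregate over a window instead, e.g. `sum by (user) (increase(kuadrant_token_usage_output_total[1d]))`.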
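As a starting point for the ServiceMonitor documentation, the sketch below generates a prometheus-operator `ServiceMonitor` manifest as JSON. Only the resource kind and field names (`selector.matchLabels`, `endpoints[].port`, `path`, `interval`, `metricRelabelings`) follow the prometheus-operator API; the namespace, label selector, port name, function name and the relabel regex (a hypothetical name-embedded user identity) are placeholders to adapt to the actual deployment.

```python
"""Minimal sketch: generate a prometheus-operator ServiceMonitor as JSON.

Assumptions: the namespace, the Service label selector, the port name, the
function name and the relabel regex are hypothetical placeholders; only the
ServiceMonitor field names come from the prometheus-operator API.
"""
import json


def build_service_monitor(name: str, namespace: str, match_labels: dict,
                          port: str, path: str = "/metrics",
                          interval: str = "30s") -> dict:
    return {
        "apiVersion": "monitoring.coreos.com/v1",
        "kind": "ServiceMonitor",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            # Selects the Service(s) whose endpoints expose the metrics.
            "selector": {"matchLabels": match_labels},
            "endpoints": [{
                "port": port,  # name of the Service port, not its number
                "path": path,
                "interval": interval,
                # Applied to scraped samples before ingestion (placeholder rule).
                "metricRelabelings": [{
                    "action": "replace",
                    "sourceLabels": ["__name__"],
                    "regex": "kuadrant_token_usage_output_user_(.+)_total",
                    "targetLabel": "user",
                    "replacement": "$1",
                }],
            }],
        },
    }


if __name__ == "__main__":
    manifest = build_service_monitor(
        name="limitador", namespace="kuadrant-system",  # hypothetical values
        match_labels={"app": "limitador"}, port="http")
    print(json.dumps(manifest, indent=2))
```

Since `kubectl apply -f -` accepts JSON on stdin, the output can be piped straight into it. A PodMonitor has the same overall shape but lists its scrape targets under `podMetricsEndpoints` instead of `endpoints`.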