Type: Story
Resolution: Done
Priority: Undefined
Fix Version/s: 1.2.0
Affects Version/s: None
Component/s: None
Labels:
- token-rate-limiting

Blocked: False
Blocked Reason: None
Ready: False
Intelligence Requested:
Market:

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

As a platform engineer, I want token usage and limit metrics to be exposed in Prometheus/OpenTelemetry format so that I can monitor how many tokens are being used per user or route and compare them against the defined limits for alerting and capacity planning.

This story covers exposing both usage and limit data as metrics suitable for Prometheus or OpenTelemetry collection. These metrics are critical for visibility into LLM cost and usage patterns and will allow platform teams to build dashboards or alerts.

Technical points:

Export output token usage metrics, ideally labeled by:
- route or gateway
- user_id or tenant (see https://github.com/Kuadrant/limitador/issues/434#issuecomment-3108779124)
Export configured limit values as separate metrics (e.g., daily quota)
Where possible, align labels and structure with existing Kuadrant rate limiting metrics
Enable downstream queries like usage / limit or usage > threshold for alerting
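A minimal sketch of the technical points above, assuming hypothetical metric names (token_usage_output_total, token_limit) and a hypothetical (route, user_id) label set; the names and labels Kuadrant actually exports may differ. It renders a cumulative usage counter and a separate limit gauge in the Prometheus text exposition format. Because both series carry identical labels, PromQL's one-to-one vector matching would let usage / limit be computed directly; usage_ratio reproduces that division for a single label set.

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple

LabelKey = Tuple[Tuple[str, str], ...]

# Hypothetical metric names, chosen for illustration only.
USAGE_METRIC = "token_usage_output_total"
LIMIT_METRIC = "token_limit"


def escape_label_value(value: str) -> str:
    """Escape a label value per the Prometheus text exposition format."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def format_labels(key: LabelKey) -> str:
    return "{" + ",".join(f'{k}="{escape_label_value(v)}"' for k, v in key) + "}"


class TokenMetrics:
    """In-memory sketch: a cumulative usage counter plus a separate limit gauge,
    both keyed by the same (route, user_id) label set."""

    def __init__(self) -> None:
        self.usage: Dict[LabelKey, int] = defaultdict(int)
        self.limits: Dict[LabelKey, int] = {}

    @staticmethod
    def key(route: str, user_id: str) -> LabelKey:
        # Label names in a fixed, alphabetical order so each series renders identically.
        return (("route", route), ("user_id", user_id))

    def record_usage(self, route: str, user_id: str, output_tokens: int) -> None:
        if output_tokens < 0:
            raise ValueError("a counter can only increase")
        self.usage[self.key(route, user_id)] += output_tokens

    def set_limit(self, route: str, user_id: str, limit: int) -> None:
        self.limits[self.key(route, user_id)] = limit

    def usage_ratio(self, route: str, user_id: str) -> Optional[float]:
        """Equivalent of the PromQL expression usage / limit for one label set."""
        key = self.key(route, user_id)
        limit = self.limits.get(key)
        if not limit:
            return None  # no limit configured (or zero): ratio undefined
        return self.usage.get(key, 0) / limit

    def render(self) -> str:
        lines = [
            f"# HELP {USAGE_METRIC} Output tokens consumed by completed LLM responses.",
            f"# TYPE {USAGE_METRIC} counter",
        ]
        lines += [f"{USAGE_METRIC}{format_labels(k)} {v}" for k, v in sorted(self.usage.items())]
        lines += [
            f"# HELP {LIMIT_METRIC} Configured token quota.",
            f"# TYPE {LIMIT_METRIC} gauge",
        ]
        lines += [f"{LIMIT_METRIC}{format_labels(k)} {v}" for k, v in sorted(self.limits.items())]
        return "\n".join(lines) + "\n"


if __name__ == "__main__":
    m = TokenMetrics()
    m.set_limit("chat", "alice", 1000)
    m.record_usage("chat", "alice", 250)
    m.record_usage("chat", "alice", 150)
    print(m.render(), end="")
    print(m.usage_ratio("chat", "alice"))  # 0.4
```

Keeping the limit as its own gauge, rather than as a label on the usage counter, means a quota change does not start a new usage series.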

Metrics should reflect:

The current known usage (based on completed requests)
The configured quotas defined in each TokenRateLimitPolicy

This feature is observability-only and does not affect policy enforcement, but is critical for diagnosing policy effectiveness and model usage behavior.

Acceptance Criteria

Output token usage metrics are exposed in Prometheus format
- Metrics are available for scraping from the appropriate endpoint
- Metric name is meaningful and consistent
- Metric is cumulative and reflects tokens consumed from completed LLM responses
Usage metrics are labeled with identifying context
- user_id or tenant (as defined via TokenRateLimitPolicy.counters)
- Labels follow Prometheus best practices (e.g., stable, bounded cardinality)
  ~~(Note: current implementation in wasm-shim doesn't allow for labels; metrics will need to be relabeled)~~
~~Usage and limit metrics can be correlated~~
- ~~Given a known namespace + user/group, the usage and limit metrics for that pair are both present and aligned in labels~~
- ~~Allows for queries such as:~~
  - ~~kuadrant_token_usage_output_total / kuadrant_token_limit_total~~
Usage metrics are exposed

- Given a known user/group, the usage metrics are present and can be found through the corresponding labels.
- Allows for queries such as:
  - user_usage
  - group_usage
  - namespace_usage

Metrics are scoped to completed requests only
- Token usage is not incremented on failed or incomplete requests
- Streamed responses are handled appropriately (i.e., tokens are counted only once the usage data arrives)
ServiceMonitor/PodMonitor configuration is documented
- Steps are documented for enabling metrics collection
- Steps for relabeling metrics are documented
New metrics are documented
- A short doc exists describing each new metric, label usage, and sample queries
- Existing Limitador metrics relevant to this story are also documented
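The "completed requests only" and streaming criteria, together with the per-user, per-group and per-namespace queries, can be illustrated with a small accounting sketch. It assumes a hypothetical event model (start, on_usage and finish hooks keyed by a request ID) and hypothetical user, group and namespace labels; it is not how wasm-shim or Limitador actually process responses. sum_by approximates what a PromQL sum by (label) aggregation would return over the resulting counter series.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class InFlight:
    user: str
    group: str
    namespace: str
    output_tokens: Optional[int] = None  # stays None until usage data arrives


class CompletedUsageTracker:
    """Counts output tokens only for requests that complete successfully
    and report usage; everything else is dropped."""

    LABELS = ("user", "group", "namespace")  # hypothetical label names

    def __init__(self) -> None:
        self.in_flight: Dict[str, InFlight] = {}
        self.totals: Counter = Counter()  # (user, group, namespace) -> tokens

    def start(self, request_id: str, user: str, group: str, namespace: str) -> None:
        self.in_flight[request_id] = InFlight(user, group, namespace)

    def on_usage(self, request_id: str, output_tokens: int) -> None:
        # For a streamed response the usage data may arrive late in the stream;
        # remember it but do not count anything until the request finishes.
        state = self.in_flight.get(request_id)
        if state is not None:
            state.output_tokens = output_tokens

    def finish(self, request_id: str, success: bool) -> None:
        # pop() makes a second finish() for the same request a no-op,
        # so tokens are counted at most once.
        state = self.in_flight.pop(request_id, None)
        if state is None or not success or state.output_tokens is None:
            return  # failed, unknown, or ended without usage data: not counted
        self.totals[(state.user, state.group, state.namespace)] += state.output_tokens

    def sum_by(self, label: str) -> Dict[str, int]:
        """Roughly what sum by (<label>) does over the counter series."""
        idx = self.LABELS.index(label)
        out: Counter = Counter()
        for key, tokens in self.totals.items():
            out[key[idx]] += tokens
        return dict(out)


if __name__ == "__main__":
    t = CompletedUsageTracker()
    t.start("r1", "alice", "eng", "team-a"); t.on_usage("r1", 120); t.finish("r1", True)
    t.start("r2", "bob", "eng", "team-a"); t.on_usage("r2", 80); t.finish("r2", True)
    t.start("r3", "alice", "eng", "team-a"); t.on_usage("r3", 500); t.finish("r3", False)  # failed
    t.start("r4", "carol", "ops", "team-b"); t.finish("r4", True)  # no usage data
    print(t.sum_by("user"))  # {'alice': 120, 'bob': 80}
```

Deferring the increment to finish() is what keeps failed or truncated streams out of the totals, even when a partial usage value was seen earlier.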

Links to:

https://github.com/Kuadrant/kuadrant-operator/issues/1460

Assignee: David Martin
Reporter: David Martin
Votes: 0
Watchers: 4
Created: 2025/07/24 8:48 AM
Updated: 2025/11/03 11:17 AM
Resolved: 2025/11/03 11:17 AM
