Feature
Resolution: Unresolved
Based on discussions, we are looking at ways to improve how Red Hat Connectivity Link (RHCL) handles RateLimits and TokenRateLimit.
The goal is to help customers introduce control mechanisms that govern how limits are applied.
Proposed Solutions
Some thoughts we came up with to give customers more control (pending technical feasibility):
1. "Wiggle Room" Budget (Soft Limits)
- Concept: Allow users or admins to define a "wiggle room" budget.
- Logic: If a user has a remaining budget of X and a "wiggle room" of Y, requests of up to X + Y tokens are allowed:
  - Scenario: User has 30 tokens left. Wiggle room is 20.
  - Action: If the request is for 45 max tokens (30 available + 15 from wiggle room), allow it.
  - Rejection: If the request requires 51 tokens (exceeding available + wiggle room), reject it immediately.
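The admission check described above could look roughly like the following. This is only a sketch to make the arithmetic concrete: `TokenBudget`, `wiggle_room`, and `admits` are hypothetical names for illustration, not existing RHCL fields or APIs.

```python
from dataclasses import dataclass


@dataclass
class TokenBudget:
    """Hypothetical per-user budget with a soft-limit allowance."""
    remaining: int    # tokens left in the regular budget
    wiggle_room: int  # extra tokens the user may borrow beyond the budget

    def admits(self, requested_max_tokens: int) -> bool:
        # Allow the request only if it fits within remaining + wiggle room.
        return requested_max_tokens <= self.remaining + self.wiggle_room


# Scenario from the description: 30 tokens left, wiggle room of 20.
budget = TokenBudget(remaining=30, wiggle_room=20)
print(budget.admits(45))  # True: 30 available + 15 from wiggle room
print(budget.admits(51))  # False: exceeds 30 + 20
```

Whether borrowed wiggle-room tokens should be deducted from the next window's budget is an open design question that this sketch does not address.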
2. Pre-Request Token Estimation (Dry Run)
- Concept: Implement a mechanism to calculate expected token consumption before processing the full request.
- Goal: Allow the system (or the user via an API check) to verify if the request would be over budget before significant resources are spent processing it. This prevents "partial" processing that ends in a failure due to limits.
- Concerns: This would almost certainly add some latency to each request.
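To illustrate what a dry-run check might return, here is a minimal sketch. It assumes a crude estimate of roughly 4 characters per token plus the requested completion limit; a real implementation would need the model's actual tokenizer. The names `estimate_tokens` and `dry_run` are hypothetical and do not refer to an existing API.

```python
import math

# Assumption: a rough characters-per-token heuristic, not a real tokenizer.
CHARS_PER_TOKEN = 4


def estimate_tokens(prompt: str, max_completion_tokens: int) -> int:
    """Upper-bound estimate: approximate prompt tokens plus the completion cap."""
    prompt_tokens = math.ceil(len(prompt) / CHARS_PER_TOKEN)
    return prompt_tokens + max_completion_tokens


def dry_run(prompt: str, max_completion_tokens: int, remaining_budget: int) -> dict:
    """Report whether the request would fit the budget, without processing it."""
    estimate = estimate_tokens(prompt, max_completion_tokens)
    return {
        "estimated_tokens": estimate,
        "within_budget": estimate <= remaining_budget,
    }


result = dry_run("Summarize the attached report.", max_completion_tokens=20,
                 remaining_budget=30)
print(result)
```

Because the estimate uses the completion cap rather than the actual completion length, it errs on the side of rejecting requests that might have fit.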
3. Concurrency Controls (Serialization)
- Concept: Add an option to prevent or queue concurrent requests for a specific user/token ID.
- Goal: This prevents race conditions where two simultaneous requests both check the balance, see available tokens, and then both execute, putting the user significantly over their allowed budget.
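One way to avoid the race is to make "check the balance" and "deduct the tokens" a single atomic step per user. The sketch below does this in-process with one lock per user ID; `SerializedBudget` and `try_reserve` are hypothetical names, and a real gateway deployment would need an equivalent atomic operation in its shared counter store rather than a local lock.

```python
import threading
from collections import defaultdict


class SerializedBudget:
    """Hypothetical in-memory budget store with per-user serialization."""

    def __init__(self, budgets: dict):
        self._remaining = dict(budgets)
        self._locks = defaultdict(threading.Lock)
        self._locks_guard = threading.Lock()  # protects the lock table itself

    def _lock_for(self, user_id: str) -> threading.Lock:
        with self._locks_guard:
            return self._locks[user_id]

    def try_reserve(self, user_id: str, tokens: int) -> bool:
        # Check and deduct under the same lock, so two concurrent requests
        # can never both see the same available balance.
        with self._lock_for(user_id):
            if self._remaining.get(user_id, 0) < tokens:
                return False
            self._remaining[user_id] -= tokens
            return True

    def remaining(self, user_id: str) -> int:
        with self._lock_for(user_id):
            return self._remaining.get(user_id, 0)


store = SerializedBudget({"user-a": 100})
results = []
threads = [
    threading.Thread(target=lambda: results.append(store.try_reserve("user-a", 30)))
    for _ in range(10)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results.count(True), store.remaining("user-a"))
```

With a budget of 100 and ten concurrent requests for 30 tokens each, only three reservations can succeed, so the user never goes over budget. Reserving up front also means unused tokens would need to be refunded once the actual consumption is known.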
Other suggestions that make sense here are more than welcome.