Type: Story
Resolution: Unresolved
Needs validation as a user story. Model servers already enforce their own input token limits, so enforcing limits in two places is questionable. Also, as of v1alpha1, TokenRateLimitPolicy already counts input tokens as part of the token usage reported in the response.

As a platform engineer, I want to enforce input token rate limits at the Gateway and HTTPRoute level so that I can prevent excessive usage of expensive LLM APIs before requests reach the model server.
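
To make the story concrete, here is a minimal sketch of what such a policy could look like when attached to a Gateway. It assumes TokenRateLimitPolicy follows the usual Kuadrant policy-attachment shape at the v1alpha1 version mentioned above; the resource names, limit values, and counter expression are illustrative assumptions, not confirmed against the actual CRD.

```yaml
# Sketch only: assumed shape, not a verified v1alpha1 schema.
# A TokenRateLimitPolicy attached to a Gateway to cap input tokens
# before requests reach the model server.
apiVersion: kuadrant.io/v1alpha1
kind: TokenRateLimitPolicy
metadata:
  name: llm-input-token-limit        # hypothetical name
spec:
  targetRef:
    group: gateway.networking.k8s.io
    kind: Gateway                    # per the story, HTTPRoute is the other supported target
    name: external-llm-gateway       # hypothetical Gateway
  limits:
    per-user-input-tokens:
      rates:
        - limit: 100000              # illustrative: input tokens allowed per window
          window: 1h
      counters:
        - expression: auth.identity.userid   # illustrative per-user counter key
```

Note the overlap the comment above raises: if the model server enforces its own input limit and the policy already counts input tokens from the response's usage data, a separate gateway-level input-token limit may duplicate existing controls.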