Type: Sub-task
Resolution: Unresolved
Create a standalone Python-based service (e.g., Flask or FastAPI) that exposes endpoints serving responses from different LLMs, starting with the fine-tuned Kiali model (see the sketch after this list). The service must support:
- Multi-model backend (via dynamic loading or routing)
- Token usage and latency metrics
- Configurable inference parameters
- Optional disk-based metric logging
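A minimal sketch of how such a service could look, assuming FastAPI with Pydantic v2. The model name `kiali-finetuned`, the endpoint path, the registry helpers, and the echo backend are illustrative placeholders, not the actual Kiali model integration:

```python
# Sketch only: routing, metrics, and logging shape of the proposed service.
import json
import time
from pathlib import Path
from typing import Callable, Dict, Optional

from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

app = FastAPI(title="LLM inference service")

# Hypothetical registry mapping model names to callables that take a prompt
# plus inference parameters and return (text, prompt_tokens, completion_tokens).
MODEL_REGISTRY: Dict[str, Callable[..., tuple]] = {}

def register_model(name: str):
    """Decorator that adds a backend to the routing table."""
    def wrap(fn):
        MODEL_REGISTRY[name] = fn
        return fn
    return wrap

@register_model("kiali-finetuned")
def kiali_backend(prompt: str, temperature: float, max_tokens: int) -> tuple:
    # Placeholder: a real backend would invoke the fine-tuned Kiali model here.
    text = f"[kiali-finetuned] echo: {prompt}"
    return text, len(prompt.split()), len(text.split())

class InferenceRequest(BaseModel):
    prompt: str
    temperature: float = 0.7   # configurable inference parameters
    max_tokens: int = 512

class InferenceResponse(BaseModel):
    model: str
    text: str
    prompt_tokens: int
    completion_tokens: int
    latency_ms: float

# Optional disk-based metric logging: set to None to disable.
METRICS_LOG: Optional[Path] = Path("metrics.jsonl")

def log_metrics(record: dict) -> None:
    """Append one JSON line per request when logging is enabled."""
    if METRICS_LOG is None:
        return
    with METRICS_LOG.open("a") as f:
        f.write(json.dumps(record) + "\n")

@app.post("/v1/models/{model_name}/generate", response_model=InferenceResponse)
def generate(model_name: str, req: InferenceRequest) -> InferenceResponse:
    backend = MODEL_REGISTRY.get(model_name)
    if backend is None:
        raise HTTPException(status_code=404, detail=f"unknown model {model_name!r}")
    start = time.perf_counter()
    text, p_tok, c_tok = backend(req.prompt, req.temperature, req.max_tokens)
    latency_ms = (time.perf_counter() - start) * 1000.0
    resp = InferenceResponse(
        model=model_name, text=text,
        prompt_tokens=p_tok, completion_tokens=c_tok,
        latency_ms=latency_ms,
    )
    log_metrics(resp.model_dump())
    return resp
```

Saved as `service.py`, this could be run with `uvicorn service:app` and exercised via `POST /v1/models/kiali-finetuned/generate`. The registry pattern keeps per-model code behind a uniform callable interface, so adding a new backend is one decorated function rather than a change to the routing logic.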