-
Task
-
Resolution: Done
-
Major
-
None
-
None
-
None
-
3
-
False
-
-
False
-
-
-
DEVAI Sprint 3280
Task Description (Required)
The IBM Granite model on our dev cluster is seeing little to no use, and I think we'd benefit from swapping it out to a model that will better support tool calling, especially as many of us start working on MCP/MCP-adjacent tools.
Llama-3.1-8B-Instruct-quantized.w4a16 is a good candidate to consider. It's a similar size to the granite model we currently use and thus would fit within our GPUs just fine.
DeepSeek-R1-Distill-Llama-8B-FP8-dynamic is another option, but I'm unsure of its support for toolcalling, so we may want to investigate that if we choose it.
In addition to replacing the deployed model, we should also clean up any old and failed models from the model registry