Task Description (Required)

The IBM Granite model on our dev cluster is seeing little to no use, and I think we'd benefit from swapping it out to a model that will better support tool calling, especially as many of us start working on MCP/MCP-adjacent tools.

Llama-3.1-8B-Instruct-quantized.w4a16 is a good candidate to consider. It's a similar size to the granite model we currently use and thus would fit within our GPUs just fine.

DeepSeek-R1-Distill-Llama-8B-FP8-dynamic is another option, but I'm unsure of its support for toolcalling, so we may want to investigate that if we choose it.

In addition to replacing the deployed model, we should also clean up any old and failed models from the model registry

Assignee:: John Collier

Reporter:: John Collier

Team:: RHDH AI

Votes:: 0 Vote for this issue

Watchers:: 1 Start watching this issue

Created:: 2025/08/26 9:54 PM

Updated:: 2025/11/25 9:10 PM

Resolved:: 2025/09/11 9:15 PM

Details

Description

Task Description (Required)

Attachments

Easy Agile Planning Poker

Activity

People

Dates

Hide