Task
Resolution: Done
Major
None
None
None
5
True
False
HAS Sprint 2264, HAS Sprint 2265
Task Description (Required)
Implement the software template in ai-lab-templates to support both vLLM and llama.cpp.
The template should provide an option allowing the user to select between the vLLM and llama.cpp engines.
The description should also make clear that vLLM requires an NVIDIA GPU and sufficient worker node capacity.
https://issues.redhat.com/browse/DEVHAS-686 needs to be finished before this issue; a condition needs to be set in the software template to generate the appropriate GitOps repo for both vLLM and llama.cpp.
vLLM image: quay.io/rh-aiservices-bu/vllm-openai-ubi9:0.4.2
For llama.cpp, the model files to use will be GGUF ones.
For vLLM, the model files to use are:
chatbot model: instructlab/granite-7b-lab
codegen model: Nondzu/Mistral-7B-code-16k-qlora
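For illustration, the engine selection described above could be exposed as a Backstage scaffolder parameter driving conditional steps that fetch the matching GitOps content. This is a minimal sketch under assumptions, not the actual ai-lab-templates implementation; the template name, property names, and `./gitops/*` paths are hypothetical.

```yaml
# Hypothetical excerpt from a Backstage scaffolder template (template.yaml).
apiVersion: scaffolder.backstage.io/v1beta3
kind: Template
metadata:
  name: ai-chatbot-template   # assumed name
spec:
  parameters:
    - title: Model server
      properties:
        engine:
          title: Inference engine
          type: string
          enum: [llama.cpp, vllm]
          default: llama.cpp
          description: >-
            vLLM requires an NVIDIA GPU and sufficient worker node
            capacity; llama.cpp uses GGUF model files.
  steps:
    # Only one of these steps runs, based on the selected engine.
    - id: fetch-vllm
      if: ${{ parameters.engine === "vllm" }}
      action: fetch:template
      input:
        url: ./gitops/vllm        # assumed path
    - id: fetch-llamacpp
      if: ${{ parameters.engine === "llama.cpp" }}
      action: fetch:template
      input:
        url: ./gitops/llama-cpp   # assumed path
```

The `if` condition on each step is the kind of mechanism the ticket refers to for generating the appropriate GitOps repo per engine.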
If this requires Change Management, complete sections below:
Change Request
<Select which item is being changed>
[ ] Add New Tokens
[ ] Rotate Tokens
[ ] Remove Tokens
[ ] Others: (specify)
Environment
<Select which environment the change is being made on. If both, open a separate issue so changes are tracked in each environment>
[ ] Stage OR
[ ] Prod
Backout Plan
<State what steps are needed to roll back in case something goes wrong>
Downtime
<Is there any downtime for these changes? If so, for how long>
Risk Level
<How risky is this change?>
Testing
<How are changes verified?>
Communication
<How are service owners or consumers notified of these changes?>
clones:
RHIDP-10740 implement GitOps template using vLLM for the AI templates (Closed)