Type: Task
Resolution: Done
Priority: Major
Task Description (Required)
https://www.reddit.com/r/LocalLLaMA/comments/18g21af/vllm_vs_llamacpp/
discusses the differences between vLLM and llama.cpp:
llama.cpp does better when a GPU or VRAM is lacking, but vLLM has better throughput since it takes full advantage of the GPU.
The current software template uses llama.cpp, but the chatbot and codegen samples generate responses too slowly.
This issue is to investigate whether vLLM would give better performance with the chatbot and codegen samples, and how feasible it is to adopt it in the AI software template.
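One simple way to run the comparison is to point the same client at each backend and measure tokens per second, since both vLLM and llama.cpp's server expose an OpenAI-compatible `/v1/completions` endpoint. This is only a sketch: the base URLs, ports, and model name below are placeholders, not values from this ticket.

```python
import json
import time
import urllib.request


def tokens_per_second(completion_tokens: int, elapsed_s: float) -> float:
    """Throughput metric used to compare the two backends."""
    if elapsed_s <= 0:
        raise ValueError("elapsed time must be positive")
    return completion_tokens / elapsed_s


def benchmark(base_url: str, model: str, prompt: str, max_tokens: int = 128) -> float:
    """Send one completion request and return tokens/second.

    base_url is a placeholder, e.g. http://localhost:8000 for vLLM or
    http://localhost:8080 for llama.cpp's server; both accept the
    OpenAI-style /v1/completions request shown here.
    """
    body = json.dumps(
        {"model": model, "prompt": prompt, "max_tokens": max_tokens}
    ).encode()
    req = urllib.request.Request(
        base_url + "/v1/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    start = time.monotonic()
    with urllib.request.urlopen(req) as resp:
        data = json.load(resp)
    elapsed = time.monotonic() - start
    return tokens_per_second(data["usage"]["completion_tokens"], elapsed)


# Example (endpoints and model name are hypothetical):
# llama_tps = benchmark("http://localhost:8080", "some-model", "Hello")
# vllm_tps = benchmark("http://localhost:8000", "some-model", "Hello")
```

Running the same prompt set against both servers and comparing the two throughput numbers would give a concrete basis for the feasibility decision.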
If this requires Change Management, complete sections below:
Change Request
<Select which item is being changed>
[ ] Add New Tokens
[ ] Rotate Tokens
[ ] Remove Tokens
[ ] Others: (specify)
Environment
<Select which environment the change is being made on. If both, open a separate issue so changes are tracked in each environment>
[ ] Stage OR
[ ] Prod
Backout Plan
<State what steps are needed to roll back in case something goes wrong>
Downtime
<Is there any downtime for these changes? If so, for how long>
Risk Level
<How risky is this change?>
Testing
<How are changes verified?>
Communication
<How are service owners or consumers notified of these changes?>
Clones:
- RHIDP-10563 first pass of software template and gitops app definitions uses pre-built image (Closed)

Is cloned by:
- RHIDP-10740 implement gitops template using vLLM for the AI templates (Closed)