Feature
Resolution: Won't Do
Critical
rhelai-1.5
Feature Overview:
This Feature card is part of validating 3rd-party models in the vllm inference flow for RHELAI 1.5. This is separate from the ilab model serve inference validation.
3rd-party model for this card: Mistral Small 3 24B-Instruct
Goals:
- Serve Mistral Small 3 24B with vllm in RHELAI 1.5 - functional test
- Chat with it to confirm it functions
- No errors/warnings arise
- Start documentation for MVP vllm inferencing on RHELAI 1.5
- Run for the base model and all quantized variants of the model (INT4, INT8, FP8)
Out of Scope [To be updated post-refinement]:
- ilab model serve functional testing; this is a separate effort
Requirements:
- Documentation to be updated to reflect the workaround for directly deploying vllm for inferencing on RHELAI
- Specifying the entrypoint command to run vllm when the container starts (e.g., with Podman)
- Specifying how models are downloaded and referenced in the command to be served
ENTRYPOINT=/opt/app-root/bin/vllm
IMAGE=registry.redhat.io/rhelai1/instructlab-nvidia-rhel9:1.4-1738905416
podman run --rm -ti \
  --device "nvidia.com/gpu=all" \
  --security-opt "label=disable" \
  --net host \
  --shm-size 10G \
  --pids-limit -1 \
  -v $HOME:$HOME \
  --entrypoint $ENTRYPOINT \
  $IMAGE \
  serve ~/models/a57d425d-80c2-4361-bbf7-23f1262ceea1 \
  --served-model-name wcabanba0308sves-mlang-skill \
  --host 127.0.0.1 \
  --port 8000
- The base model and all quantized variants can be served via this workaround; a sample chat request to verify the served endpoint is sketched below
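To confirm the served model responds (the "Chat with it" goal above), a minimal sketch of a request against vLLM's OpenAI-compatible chat endpoint could look like the following. The host, port, and served model name are taken from the workaround command above; the prompt and token limit are illustrative only.
# Smoke test of the chat endpoint exposed by the serve command above.
# Host, port, and model name match the workaround command; prompt and max_tokens are illustrative.
curl http://127.0.0.1:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "wcabanba0308sves-mlang-skill",
        "messages": [{"role": "user", "content": "Reply with a one-sentence greeting."}],
        "max_tokens": 64
      }'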
Done - Acceptance Criteria:
- QE ensures all functional requirements are met
Model                              | Quantization Level | Confirmed
Mistral Small 3 24B-Instruct       | Baseline           |
Mistral Small 3 24B-Instruct INT4  | INT4               |
Mistral Small 3 24B-Instruct INT8  | INT8               |
Mistral Small 3 24B-Instruct FP8   | FP8                |
- Documentation is updated
- The base model and all quantized versions are confirmed to meet the requirements, and all have an 'X' in the Confirmed column
Use Cases - i.e. User Experience & Workflow:
- User downloads the model via Quay or Hugging Face (see the download sketch after this list)
- User serves the model directly with vllm, following the documentation to bypass the other components of the RHELAI container
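A minimal sketch of the download step, assuming the model is pulled from Hugging Face with huggingface-cli; the repo ID and destination directory shown here are assumptions for illustration, and the quantized variants would use their own repo IDs.
# Download the model weights into $HOME, which the podman workaround above mounts into the container.
# Repo ID and local directory are illustrative assumptions, not confirmed by this card.
huggingface-cli download mistralai/Mistral-Small-24B-Instruct-2501 \
  --local-dir ~/models/mistral-small-3-24b-instruct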
Documentation Considerations:
- See requirements
Questions to answer:
- Which vllm version ships in RHELAI 1.4, and which is planned for 1.5? (A sketch for checking the bundled version is below.)
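One way to answer the 1.4 side of this question, sketched under the assumptions that the image provides a python3 interpreter and that vllm is importable inside it (neither is confirmed by this card), is to print vllm.__version__ from the image:
# Print the vLLM version bundled in a given RHEL AI image; the tag is the 1.4 image used above.
IMAGE=registry.redhat.io/rhelai1/instructlab-nvidia-rhel9:1.4-1738905416
podman run --rm --entrypoint python3 "$IMAGE" \
  -c "import vllm; print(vllm.__version__)"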
Background & Strategic Fit:
Customers have been asking to leverage the latest and greatest third-party models from Meta, Mistral, Microsoft, Qwen, etc. within Red Hat AI products. As they continue to adopt and deploy open-source models, the third-party model validation pipeline provides inference performance benchmarking and accuracy evaluations for third-party models, giving customers confidence and predictability when bringing third-party models to InstructLab and vLLM within RHEL AI and RHOAI.
See Red Hat AI Model Validation Strategy Doc
See Red Hat Q1 2025 Third Party Model Validation Presentation
Clones:
- RHELAI-3622 Qwen-2.5 7B-Instruct RHELAI vllm inference flow (Closed)