Feature
Resolution: Unresolved
Priority: Critical
Fix version: rhelai-1.5
Feature Overview:
This feature card is part of validating third-party inference models with the ilab model serve command in the InstructLab component for RHEL AI 1.5.
Third-party model for this card: Llama 3.1 8B
Goals:
- Run Llama 3.1 8B with the ilab model serve command in InstructLab (functional test)
- Confirm that no errors or warnings arise
- Repeat for all quantized variants of the model (see the command sketch after this list)
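A minimal sketch of this goal, assuming the variants are already downloaded; the local model paths below are illustrative assumptions, not the exact artifact names:

    # Serve each variant in turn and watch the vLLM logs for errors/warnings
    ilab model serve --model-path ~/.cache/instructlab/models/llama-3.1-8b-instruct
    ilab model serve --model-path ~/.cache/instructlab/models/llama-3.1-8b-instruct-int4
    ilab model serve --model-path ~/.cache/instructlab/models/llama-3.1-8b-instruct-int8
    ilab model serve --model-path ~/.cache/instructlab/models/llama-3.1-8b-instruct-fp8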
Out of Scope [To be updated post-refinement]:
- Evaluating the performance of the chat
- Evaluating accuracy
- Confirming that vLLM works with this model (this will be verified before this testing happens)
Requirements:
- Ensure the tests are run with rh-vllm (previously nm-vllm-ent) as the serving engine
- Functional Requirements:
- Ensure the below components of the flow are functional with the third-party model for the inference use case (a command sketch follows this list):
- ilab model download can download the model from Quay or Hugging Face
- ilab model list shows the downloaded model
- ilab model serve can serve the model
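A minimal sketch of the download/list steps in these requirements; the registry repository path, release tag, and Hugging Face repo below are illustrative assumptions, not confirmed artifact locations:

    # Download the model from the Red Hat registry (repository path is hypothetical)
    ilab model download --repository docker://registry.redhat.io/rhelai1/llama-3-1-8b-instruct --release latest
    # Or pull from Hugging Face (gated repos may also require --hf-token)
    ilab model download --repository meta-llama/Llama-3.1-8B-Instruct
    # Confirm the download shows up in the local model list
    ilab model list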
Done - Acceptance Criteria:
- All functional requirements are met
- All quantized versions of the model are confirmed to meet the requirements, with an 'X' in each Confirmed box of the table below
Model             | Quantization Level | Confirmed
Llama 3.1 8B      | Baseline           |
Llama 3.1 8B INT4 | INT4               |
Llama 3.1 8B INT8 | INT8               |
Llama 3.1 8B FP8  | FP8                |
Use Cases - i.e. User Experience & Workflow:
- User downloads the third-party model from the Red Hat registry (Quay) or Hugging Face via the ilab model download command
- User can then view the model with ilab model list
- User can then serve the model for inference on vLLM-ent with ilab model serve (a smoke-test sketch follows this list)
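As a quick smoke test of the served model, the OpenAI-compatible endpoint can be queried from a second terminal; the host and port below assume ilab's default serve address and are an assumption:

    # List the models exposed by the server (default address is an assumption)
    curl http://127.0.0.1:8000/v1/models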
Documentation Considerations:
- Update relevant documentation to expose the new third-party model to users (i.e. Chapter 3, Downloading Large Language models)
Questions to answer:
- Are there any code changes required to serve a model? Note: the model will already be validated in vllm-ent BEFORE this testing occurs, so this is just ensuring the ilab commands work, NOT that vLLM works with this model.
Background & Strategic Fit:
Customers have been asking to leverage the latest and greatest third-party models from Meta, Mistral, Microsoft, Qwen, etc. within Red Hat AI products. As they continue to adopt and deploy open-source models, the third-party model validation pipeline provides inference performance benchmarking and accuracy evaluations for third-party models, giving customers confidence and predictability when bringing third-party models to InstructLab and vLLM within RHEL AI and RHOAI.
See Red Hat AI Model Validation Strategy Doc
See Red Hat Q1 2025 Third Party Model Validation Presentation
Clones:
- RHELAI-3595 [ilab] Qwen-2.5 7B-Instruct ilab model serve (inference functional testing)
Labels: Testing