Type: Feature
Resolution: Unresolved
Priority: Major
Fix Version: rhelai-3.0
Feature Overview:
This feature card is part of validating 3rd-party student models with the InstructLab component for RHEL AI 1.5.
3rd-party model for this card: Llama 3.1 8B
Goals:
- Run Llama 3.1 8B successfully as a student model in the InstructLab tuning flow, tuned by the current teacher, Mixtral 8x7B Instruct (mixtral-8x7b-instruct-v0-1)
- Create a fine-tuned Llama 3.1 8B student
- Hand off the model to the PSAP team for model validation (email/Slack rh-ee-rogreenb when completed) so they can run OpenLLM Leaderboard v1/v2 evals between the base model and the fine-tuned model
- Run for all quantized variants of the model (Base, INT4, INT8, FP8) for the inference use case
Out of Scope [To be updated post-refinement]:
- Matching performance results with the current student, Granite 3.1 8B Instruct
- Code changes that accommodate arbitrary models
- Model management functions (i.e. ilab model upload)
- Running dk-bench/Ragas evals to evaluate the fine-tuned student model on the newly learned data
Requirements:
- Functional Requirements:
- Ensure the below components of the flow are functional with the 3rd-party student model:
- ilab model download can download the model from Quay or Hugging Face
- ilab model list shows the downloaded model
- ilab model train can tune Llama 3.1 8B
- ilab model serve can serve the fine-tuned Llama 3.1 8B model
- Ensure the below components are functional with all quantized variants of the 3rd-party model in the inference use case:
- ilab model download can download the model from Quay or Hugging Face
- ilab model list shows the downloaded model
- ilab model serve can serve the model
- Accuracy evaluation requirements:
- Hand off the base and fine-tuned models to the PSAP team (email/Slack rh-ee-rogreenb when completed) to perform the OpenLLM Leaderboard v1/v2 evaluation without the math-hard subtask
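The training-flow requirements above map onto the ilab CLI roughly as follows. This is a sketch only: the repository ID, flag values, and cache/checkpoint paths are illustrative assumptions and may differ by RHEL AI version.

```shell
# Illustrative end-to-end InstructLab flow for the 3rd-party student model.
# Repo ID and paths below are assumptions, not confirmed artifact names.

# Download the student model (Hugging Face source shown; a Quay/OCI source
# would use a registry reference instead)
ilab model download --repository meta-llama/Llama-3.1-8B-Instruct

# Confirm the model appears in the local model cache
ilab model list

# Tune the student model (the default teacher, Mixtral 8x7B Instruct, is used
# during synthetic data generation earlier in the pipeline)
ilab model train --model-path ~/.cache/instructlab/models/meta-llama/Llama-3.1-8B-Instruct

# Serve a fine-tuned checkpoint for inference (checkpoint path is illustrative)
ilab model serve --model-path ~/.local/share/instructlab/checkpoints/hf_format/samples_0
```

Each step corresponds to one functional requirement, so a failure at any command localizes which part of the flow does not yet support the 3rd-party student model.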
Done - Acceptance Criteria:
- QE ensures all functional requirements are met
- Base and fine-tuned models are handed over to the PSAP team
- Student model performance on OpenLLM Leaderboard v1/v2 before and after tuning is comparable, with no significant accuracy degradation (within ±5 points) - BD
- QE ensures inferencing functional requirements are met for each compression level [HF LINKS UPDATE]
| Model | Quantization Level | Confirmed |
|---|---|---|
| Llama 3.1 8B | Baseline | |
| Llama 3.1 8B INT4 | INT4 | |
| Llama 3.1 8B INT8 | INT8 | |
| Llama 3.1 8B FP8 | FP8 | |
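As a sketch, the inference validation matrix above could be driven by a small loop. The repository IDs are deliberate placeholders, since the validated artifact links are still pending per [HF LINKS UPDATE]:

```shell
# Illustrative loop over the quantization variants (Baseline, INT4, INT8, FP8).
# <...-repo> values are placeholders; substitute the validated quay.io or
# huggingface.co artifacts once they are confirmed.
for repo in \
    "<baseline-repo>" \
    "<int4-repo>" \
    "<int8-repo>" \
    "<fp8-repo>"
do
    ilab model download --repository "$repo"
done

# Verify each variant is listed, then serve them one at a time:
ilab model list
# ilab model serve --model-path <path-to-variant>
```

Serving each variant individually answers the open question below empirically, rather than assuming baseline success implies quantized success.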
Use Cases - i.e. User Experience & Workflow:
- User downloads the 3rd-party model from the Red Hat registry (Quay) or Hugging Face via the ilab model download command
Documentation Considerations:
- Update relevant documentation to expose the new third-party model to users (e.g. Chapter 3, Downloading Large Language Models)
Questions to answer:
- https://issues.redhat.com/browse/RHELAI-3559 - Refer to open questions here.
- Do we need to run all of the quantized versions of the model through the ilab model serve validation step, or can we assume the quantized variants work if the baseline works?
Background & Strategic Fit:
Customers have been asking to leverage the latest and greatest third-party models from Meta, Mistral, Microsoft, Qwen, etc. within Red Hat AI products. As customers continue to adopt and deploy open-source models, the third-party model validation pipeline provides inference performance benchmarking and accuracy evaluations for third-party models, giving customers confidence and predictability when bringing third-party models to InstructLab and vLLM within RHEL AI and RHOAI.
See Red Hat AI Model Validation Strategy Doc
See Red Hat Q1 2025 Third-Party Model Validation Presentation
Is blocked by: RHELAI-3616 Third-party model(s) support - for the end-to-end workflow and inference (In Progress)