Type: Epic
Resolution: Unresolved
Priority: Critical
Fix Version: rhelai-1.5
Summary: [ilab] Running a third-party Llama 3.3 70B Instruct model as teacher model (+ inference functional testing) in ilab tuning flow
Feature Overview:
This Feature card is part of validating 3rd-party teacher models with the InstructLab component for RHEL AI 1.5.
3rd-party model for this card: Llama 3.3 70B Instruct
Goals:
- Run Llama 3.3 70B Instruct as a teacher model successfully through the ilab tuning flow to tune the student model Granite 3.1 8B Instruct (granite-3.1-8b-starter-v1) - see the sketch below
- Create a fine-tuned student model with Llama 3.3 70B Instruct as the teacher model
- Hand off the model to the PSAP team for model validation (email/Slack rh-ee-rogreenb when completed) so they can run OpenLLM Leaderboard v1/v2 evals between the base model and the fine-tuned model
- Run all quantized variants of the model (Base, INT4, INT8, FP8) for the inference use case
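A minimal sketch of the intended tuning flow, assuming the teacher ships as an OCI artifact; the registry paths and model paths are hypothetical placeholders, and exact flags should be checked against ilab --help in the build under test:

# Download teacher (Llama 3.3 70B Instruct) and student (Granite 3.1 8B) models.
# Registry paths are hypothetical placeholders.
ilab model download --repository docker://<registry>/llama-3-3-70b-instruct --release latest
ilab model download --repository docker://<registry>/granite-3.1-8b-starter-v1 --release latest
ilab model list                                   # confirm both models are present

# Generate synthetic data with the third-party teacher, then tune the student.
ilab data generate --model <path-to-llama-3.3-70b-instruct>
ilab model train --data-path <path-to-generated-data>

# Serve the fine-tuned student for a functional smoke test.
ilab model serve --model-path <path-to-fine-tuned-student>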
Out of Scope [To be updated post-refinement]:
- Matching the performance results of the Mixtral 8x7B teacher setup
- Code changes that accommodate arbitrary models
- Running dk-bench/Ragas evals to evaluate the fine-tuned student model on the newly learned data
- Model management (TBD)
Requirements:
- Functional Requirements:
- Ensure the components below are functional with the 3rd-party model as teacher for the fine-tuning use case:
- ilab model download is able to download the model from Quay or Hugging Face
- ilab model list shows the downloaded model
- ilab data generate is able to generate usable synthetic data
- ilab model train is able to tune Granite 3.1 8B Instruct (granite-3.1-8b-starter-v1)
- ilab model serve can serve the fine-tuned student model
- Ensure the components below are functional with the 3rd-party model for all quantized variants in the inference use case (see the sketch below):
- ilab model download is able to download the model from Quay or Hugging Face
- ilab model list shows the downloaded model
- ilab model serve can serve the model
- Accuracy evaluation requirements:
- Hand off the base and fine-tuned model to the PSAP team (email/Slack rh-ee-rogreenb when completed) to perform OpenLLM Leaderboard v1/v2 evaluation without the math-hard subtask
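A minimal sketch of the per-variant inference check, assuming each compression level is published as its own artifact; the variant tags, registry paths, and serve endpoint/port are assumptions, not confirmed values:

# Functional inference pass over each quantized variant (Base, INT4, INT8, FP8).
# Variant tags and registry paths are hypothetical placeholders.
for variant in base int4 int8 fp8; do
    ilab model download --repository "docker://<registry>/llama-3-3-70b-instruct-${variant}" --release latest
    ilab model list                               # verify the variant is listed
    ilab model serve --model-path "<models-dir>/llama-3-3-70b-instruct-${variant}" &
    serve_pid=$!
    sleep 120                                     # crude wait for the server to come up
    curl -s http://127.0.0.1:8000/v1/models       # assumes the default vLLM OpenAI-style endpoint
    kill "$serve_pid"
done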
Done - Acceptance Criteria:
- QE ensures fine-tuning functional requirements are met
- Base and fine-tuned model are handed over to the PSAP team for model validation
- Student model performance before and after tuning on OpenLLM Leaderboard v1/v2 is comparable, with no significant accuracy degradation (±5 points) - TBD (see the sketch below)
- QE ensures inference functional requirements are met for each compression level [HF LINKS UPDATE]
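A minimal sketch of the ±5-point check, assuming the PSAP team reports a single averaged leaderboard score per model; the scores below are placeholders, not real results:

# Compare base vs. fine-tuned leaderboard averages; non-zero exit means a >5-point drop.
base_score=0.0    # placeholder: OpenLLM Leaderboard average for the base student
tuned_score=0.0   # placeholder: average for the fine-tuned student
awk -v base="$base_score" -v tuned="$tuned_score" 'BEGIN {
    delta = tuned - base
    printf "delta = %+.2f points\n", delta
    if (delta < -5) exit 2      # flag significant accuracy degradation
}'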
Use Cases - i.e. User Experience & Workflow:
- User downloads the 3rd-party model from the Red Hat registry/Quay or HF via the ilab model download command (examples below)
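Minimal examples of the two download paths, assuming the model is published both to a Red Hat registry and to Hugging Face; both repository identifiers and the gated-repo token handling are assumptions:

# From the Red Hat registry/Quay (OCI artifact); path is a hypothetical placeholder.
ilab model download --repository docker://registry.redhat.io/rhelai1/<llama-3-3-70b-instruct> --release latest

# From Hugging Face; gated repos additionally need an access token (e.g. via HF_TOKEN).
HF_TOKEN=<token> ilab model download --repository meta-llama/Llama-3.3-70B-Instruct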
Documentation Considerations:
- Update relevant documentation to expose the new 3rd-party model to users (i.e. Chapter 3. Downloading Large Language Models)
Questions to answer:
- What is the scope of code changes necessary to complete this for 1.5? Would this change if only 1 teacher and 1 student model were implemented, rather than arbitrary models?
- Does RHEL AI allow models to be downloaded directly from HF?
- Should this feature rely on the target model already being stored in Quay, or should we perform model validation through the Hugging Face stub flow instead?
- Should ilab model chat and ilab model upload (model management into S3) be in scope?
Background & Strategic Fit:
Customers have been asking to leverage the latest and greatest third-party models from Meta, Mistral, Microsoft, Qwen, etc. within Red Hat AI products. As they continue to adopt and deploy open-source models, the third-party model validation pipeline provides inference performance benchmarking and accuracy evaluations for third-party models, giving customers confidence and predictability when bringing third-party models to InstructLab and vLLM within RHEL AI and RHOAI.
See the Red Hat AI Model Validation Strategy doc.
See the Red Hat Q1 2025 Third-Party Model Validation presentation.
Issue links:
- is blocked by: RHELAI-3616 Third-party model(s) support - for the end-to-end workflow and inference (In Progress)
- is cloned by: RHELAI-3595 [ilab] Qwen-2.5 7B-Instruct ilab model serve (inference functional testing) (Testing)
- is depended on by: RHELAI-3557 RHEL AI Third-Party Model Validation Deliverables for Summit '25 (In Progress)
- mentioned on