Type: Feature
Resolution: Unresolved
Priority: Major
Fix Version: rhelai-3.0
Feature Overview:
This feature card is part of validating 3rd-party student models with the InstructLab component for RHEL AI 1.5.
3rd-party model for this card: Llama 3.1 8B
Goals:
- Run Llama 3.1 8B successfully as a student model in the InstructLab tuning flow, tuned by the current teacher, Mixtral 8x7B Instruct (mixtral-8x7b-instruct-v0-1)
- Create a fine-tuned Llama 3.1 8B student
- Hand off the model to the PSAP team for model validation (email/Slack rh-ee-rogreenb when completed) so they can run OpenLLM Leaderboard v1/v2 evals between the base model and the fine-tuned model
- Run for all quantized variants of the model (Base, INT4, INT8, FP8) for the inference use case
Out of Scope [To be updated post-refinement]:
- Matching performance results with the current student, Granite 3.1 8B Instruct
- Code changes that accommodate arbitrary models
- Model management functions (i.e. ilab model upload)
- Running dk-bench/Ragas evals to evaluate the fine-tuned student model on the newly learned data
Requirements:
- Functional Requirements:
- Ensure the below components of the flow are functional with the 3rd-party student model:
- ilab model download can download the model from Quay or Hugging Face
- ilab model list shows the downloaded model
- ilab model train can tune Llama 3.1 8B
- ilab model serve can serve the fine-tuned Llama 3.1 8B model
- Ensure the below components are functional with all quantized variants of the 3rd-party model in the inference use case:
- ilab model download can download the model from Quay or Hugging Face
- ilab model list shows the downloaded model
- ilab model serve can serve the model
- Accuracy evaluation requirements:
- Hand off the base and fine-tuned models to the PSAP team (email/Slack rh-ee-rogreenb when completed) to perform the OpenLLM Leaderboard v1/v2 evaluation without the math-hard subtask
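The training-flow requirements above map onto the ilab CLI roughly as follows. This is a sketch only: the repository ID, flag values, and cache/checkpoint paths are illustrative assumptions and may differ by RHEL AI version.

```shell
# Illustrative end-to-end InstructLab flow for the 3rd-party student model.
# Repo ID and paths below are assumptions, not confirmed artifact names.

# Download the student model (Hugging Face source shown; a Quay/OCI source
# would use a registry reference instead)
ilab model download --repository meta-llama/Llama-3.1-8B-Instruct

# Confirm the model appears in the local model cache
ilab model list

# Tune the student model (the default teacher, Mixtral 8x7B Instruct, is used
# during synthetic data generation earlier in the pipeline)
ilab model train --model-path ~/.cache/instructlab/models/meta-llama/Llama-3.1-8B-Instruct

# Serve a fine-tuned checkpoint for inference (checkpoint path is illustrative)
ilab model serve --model-path ~/.local/share/instructlab/checkpoints/hf_format/samples_0
```

Each step corresponds to one functional requirement, so a failure at any command localizes which part of the flow does not yet support the 3rd-party student model.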
Done - Acceptance Criteria:
- QE ensures all functional requirements are met
- Base and fine-tuned models are handed over to the PSAP team
- Student model performance on OpenLLM Leaderboard v1/v2 before and after tuning is comparable, with no significant accuracy degradation (within ±5 points) - BD
- QE ensures inferencing functional requirements are met for each compression level [HF LINKS UPDATE]
| Model | Quantization Level | Confirmed |
|---|---|---|
| Llama 3.1 8B | Baseline | |
| Llama 3.1 8B INT4 | INT4 | |
| Llama 3.1 8B INT8 | INT8 | |
| Llama 3.1 8B FP8 | FP8 | |
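As a sketch, the inference validation matrix above could be driven by a small loop. The repository IDs are deliberate placeholders, since the validated artifact links are still pending per [HF LINKS UPDATE]:

```shell
# Illustrative loop over the quantization variants (Baseline, INT4, INT8, FP8).
# <...-repo> values are placeholders; substitute the validated quay.io or
# huggingface.co artifacts once they are confirmed.
for repo in \
    "<baseline-repo>" \
    "<int4-repo>" \
    "<int8-repo>" \
    "<fp8-repo>"
do
    ilab model download --repository "$repo"
done

# Verify each variant is listed, then serve them one at a time:
ilab model list
# ilab model serve --model-path <path-to-variant>
```

Serving each variant individually answers the open question below empirically, rather than assuming baseline success implies quantized success.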
Use Cases - i.e. User Experience & Workflow:
- User downloads the 3rd-party model from the Red Hat registry (Quay) or Hugging Face via the ilab model download command
Documentation Considerations:
- Update relevant documentation to expose the new third-party model to users (e.g. Chapter 3, Downloading Large Language Models)
Questions to answer:
- https://issues.redhat.com/browse/RHELAI-3559 - Refer to open questions here.
- Do we need to run all of the quantized versions of the model through the ilab model serve validation step, or can we assume the quantized variants work if the baseline works?
Background & Strategic Fit:
Customers have been asking to leverage the latest and greatest third-party models from Meta, Mistral, Microsoft, Qwen, etc. within Red Hat AI products. As customers continue to adopt and deploy open-source models, the third-party model validation pipeline provides inference performance benchmarking and accuracy evaluations for third-party models, giving customers confidence and predictability when bringing third-party models to InstructLab and vLLM within RHEL AI and RHOAI.
See Red Hat AI Model Validation Strategy Doc
See Red Hat Q1 2025 Third-Party Model Validation Presentation
Is blocked by: RHELAI-3616 Third-party model(s) support - for the end-to-end workflow and inference (In Progress)