Type: Feature
Resolution: Unresolved
Priority: Major
Progress: 100% To Do, 0% In Progress, 0% Done
-
Feature Overview:
Support SDG training dataset generation for the long context window Granite 3.0 128k model.
Goal:
- [SDG team] Extend SDG to support the generation of datasets for training models with a long context window (a hypothetical sketch of such a sample follows this list).
  - Note: The teacher model must support the desired context window
- [Research or models team]
  - Identify hyperparameter optimizations required to handle the new SDG dataset
  - Identify the effective context window to be used as the new supported context window
    - The expectation is at least a 64k context window
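To make the first goal concrete, here is a minimal sketch of how a long-context training sample might be assembled: pack source-document text together with a generated Q&A pair up to a token budget. All names here (`pack_long_context_sample`, the token budget, the tokenizer in the usage comment) are illustrative assumptions, not the actual instructlab-sdg API.

```python
# Hypothetical sketch only: assemble one long-context training sample by
# packing source-document text plus a generated Q&A pair up to a token
# budget. None of these names come from instructlab-sdg.
from transformers import AutoTokenizer

TARGET_CONTEXT_TOKENS = 64 * 1024  # the card's expectation: at least 64k

def pack_long_context_sample(documents, question, answer, tokenizer):
    """Concatenate document chunks until the token budget is nearly full,
    then attach the generated question/answer pair."""
    budget = TARGET_CONTEXT_TOKENS
    context_parts = []
    for doc in documents:
        n_tokens = len(tokenizer.encode(doc))
        if n_tokens > budget:
            break
        context_parts.append(doc)
        budget -= n_tokens
    return {
        "context": "\n\n".join(context_parts),
        "question": question,
        "answer": answer,
    }

# Usage (the tokenizer name is an example, not a requirement):
# tok = AutoTokenizer.from_pretrained("ibm-granite/granite-3.0-8b-base")
# sample = pack_long_context_sample(docs, "What changed in v3?", "...", tok)
```

The point of packing real documents up to the budget, rather than padding, is to give the student model training signal across the full window during fine-tuning.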
Requirements:
- SDG supports the generation of datasets to train models with a long context window
- Ability to fine-tune with the new SDG dataset
- Validate and document the effective context window
- Identify any deviation in the performance of the final model
  - To move to GA, it should be within the margin of error
- Identify the optimal batch size and other hyperparameters for training with the new SDG dataset (a hedged sketch of the batch-size arithmetic follows this list)
See additional notes in the comments on the RHELAI-946 outcome card.
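As a hedged illustration of the batch-size question above, the arithmetic below shows how gradient accumulation can hold the effective batch size constant when 64k-token sequences force a small per-GPU micro-batch. All values and names are planning assumptions, not InstructLab defaults.

```python
# Hypothetical hyperparameter sketch for long-context fine-tuning.
# Values and names are planning assumptions, not InstructLab defaults.
MAX_SEQ_LEN = 64 * 1024          # target training sequence length
EFFECTIVE_BATCH_SIZE = 128       # samples per optimizer step, to be validated
MICRO_BATCH_PER_GPU = 1          # 64k-token samples usually force this to 1
NUM_GPUS = 8

# Gradient accumulation keeps the effective batch size constant even though
# long sequences leave room for only one sample per GPU at a time.
grad_accum_steps = EFFECTIVE_BATCH_SIZE // (MICRO_BATCH_PER_GPU * NUM_GPUS)
print(f"accumulate gradients over {grad_accum_steps} micro-steps")  # -> 16
```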
Done - Acceptance Criteria:
- [ ] InstructLab SDG can generate training datasets for models with large context windows
- [ ] InstructLab can fine-tune the 128k model using the new SDG dataset
- [ ] Document the effective context window of the resulting model (see the evaluation sketch after this list)
- [ ] Evaluate and compare the performance of the final model to a 4k fine-tuned model
- [ ] Document and use optimal batch size and hyperparameters during training
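One common way to satisfy the third criterion is a needle-in-a-haystack probe: hide a known fact at varying depths of increasingly long contexts and check whether the model retrieves it. The sketch below is a minimal, model-agnostic version; `generate_fn`, the tokenizer, and the filler/needle strings are placeholders, not part of any existing evaluation harness.

```python
# Minimal needle-in-a-haystack probe for estimating the effective context
# window. generate_fn(prompt) -> str stands in for the actual inference
# call; the filler and needle strings are illustrative.
def needle_in_haystack(generate_fn, tokenizer, context_tokens, depth_pct):
    needle = "The secret passphrase is BLUE-HARBOR-42."
    filler = "The quick brown fox jumps over the lazy dog. "
    tokens_per_filler = max(1, len(tokenizer.encode(filler)))
    haystack = filler * (context_tokens // tokens_per_filler)
    insert_at = int(len(haystack) * depth_pct / 100)
    prompt = (
        haystack[:insert_at] + needle + haystack[insert_at:]
        + "\n\nWhat is the secret passphrase?"
    )
    return "BLUE-HARBOR-42" in generate_fn(prompt)

# Sweep context lengths and needle depths; the largest length with
# consistent retrieval approximates the effective context window.
# for n in (4_096, 16_384, 32_768, 65_536):
#     for depth in (0, 25, 50, 75, 100):
#         print(n, depth, needle_in_haystack(generate, tok, n, depth))
```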
Use Cases:
Enterprise use cases that would benefit from a large context window include the following:
- RAG
- Summarization
- Code generation
- Tool use
- Advanced reasoning
Out of Scope:
The ability to achieve a 128k effective context window in an 8B model is out of scope for this card.
Documentation Considerations:
Document the supported large context window limits based on the effective context window size.
Questions to Answer:
- What are the optimal hyperparameters for fine-tuning a model for the ~64k context window?
- Can we default to the 128k base model, or are there circumstances in which we should default to previous student models?
Background and Strategic Fit:
To support the enterprise use cases required by customers, we need at least a 64k effective context window.
Customer Considerations:
- These changes might require changes to the user-facing CLI flow. We should minimize required modifications to the existing default CLI flow and expose new capabilities through optional configuration or flags.
Clones:
- RHELAI-2670: (phase 2) Productize the 128k context window Granite v3.1 (status: New)