Red Hat Enterprise Linux AI / RHELAI-2869

[instructlab/instructlab] Investigate work required to support 128k context


      Upstream Reporter: Kim
      Upstream issue status: Open
      Upstream description:

      Dependent on Model Prod and SDG investigation

      Feature Overview:

      • Support the Granite 3.1 model with a 128k context window, fine-tuned on a new 8b ~64k dataset.

      Note: A 128k context window on an 8B model can adversely affect performance benchmarks, hence the initial focus on an ~64k effective context window.

      Goal:

      • Fine-tune the 128k model using a new 8b ~64k dataset (see the training sketch after this list)
      • Note: the new dataset requires clearance from legal
      • Identify the effective context window, which becomes the new supported context window
      • The expectation is a context window of at least 64k
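
      For orientation, a minimal sketch of what such a fine-tuning run could look like with the instructlab/training library. The API shape follows the upstream README; the model path, dataset path, and every numeric value below are placeholders, not settings prescribed by this issue:

      from instructlab.training import run_training, TorchrunArgs, TrainingArgs

      # Placeholder paths and hyperparameters; the ~64k max_seq_len is the
      # variable under investigation, everything else is illustrative.
      training_args = TrainingArgs(
          model_path="ibm-granite/granite-3.1-8b-base",  # placeholder model id
          data_path="path/to/64k-dataset.jsonl",
          ckpt_output_dir="checkpoints",
          data_output_dir="data/outputs",
          max_seq_len=65536,         # ~64k effective context under test
          max_batch_len=65536,       # token budget per GPU per step
          num_epochs=2,
          effective_batch_size=128,
          save_samples=250000,       # checkpoint cadence (upstream README default)
          learning_rate=2e-6,
          warmup_steps=25,
          random_seed=42,
      )

      torchrun_args = TorchrunArgs(
          nnodes=1,           # single machine
          nproc_per_node=8,   # GPUs per machine
          node_rank=0,
          rdzv_id=123,
          rdzv_endpoint="127.0.0.1:12345",
      )

      run_training(torch_args=torchrun_args, train_args=training_args)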

      Requirements:

      • Fine-tune the 128k model with a new 8b ~64k dataset
      • Validate and document the effective context window (see the probe sketch after this list)
      • Identify any deviation in the performance of the final model
      • To ship as GA, any deviation should be within the margin of error
      • Identify the optimal batch size for training
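
      One way to validate the effective context window is a needle-in-a-haystack probe against a locally served checkpoint, e.g. the OpenAI-compatible endpoint that ilab model serve exposes. The endpoint, model name, and words-to-tokens ratio below are all assumptions for the sketch:

      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

      NEEDLE = "The magic number is 48613."
      FILLER = "The quick brown fox jumps over the lazy dog. "  # ~10 tokens per repeat

      def needle_found(n_repeats: int, depth: float) -> bool:
          """Bury NEEDLE at relative position `depth` (0..1) and ask for it back."""
          before = int(n_repeats * depth)
          haystack = FILLER * before + NEEDLE + " " + FILLER * (n_repeats - before)
          resp = client.chat.completions.create(
              model="granite-3.1-8b",  # placeholder: whatever name the server registers
              messages=[{"role": "user",
                         "content": haystack + "\nWhat is the magic number?"}],
              max_tokens=16,
          )
          return "48613" in (resp.choices[0].message.content or "")

      # Sweep toward the 64k target; the longest length at which retrieval still
      # succeeds at every depth approximates the effective context window.
      for n_repeats in (800, 1600, 3200, 4800, 6000):
          hits = [needle_found(n_repeats, d) for d in (0.1, 0.5, 0.9)]
          print(f"~{n_repeats * 10} tokens: {hits}")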

      Done - Acceptance Criteria:

      [ ] InstructLab can fine-tune the 128k model using a new 8b ~64k dataset
      [ ] Document the effective context window of the resulting model
      [ ] Evaluate and compare the performance of the final model to a 4k fine-tuned model
      [ ] Document and use optimal batch size during training, if required (see the helper below)
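
      On the batch-size item: at ~64k tokens per sample, a per-GPU micro-batch of 1 is often all that fits in memory, so the effective batch size is realized through gradient accumulation. A small generic helper; the numbers are illustrative, not recommendations from this issue:

      def grad_accum_steps(effective_batch: int, micro_batch: int, world_size: int) -> int:
          """Accumulation steps so micro_batch * world_size * steps == effective_batch."""
          per_optimizer_step = micro_batch * world_size
          if effective_batch % per_optimizer_step:
              raise ValueError("effective_batch must divide evenly across GPUs and micro-batches")
          return effective_batch // per_optimizer_step

      # Example: 8 GPUs, one ~64k-token sample per GPU per forward pass,
      # and a target effective batch of 128 samples -> 16 accumulation steps.
      print(grad_accum_steps(effective_batch=128, micro_batch=1, world_size=8))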

      Use Cases:

      Enterprise use cases that would benefit from a large context window include the following:

      • RAG
      • Summarization
      • Code generation
      • Tool use
      • Advanced reasoning

      Out of Scope:

      • For phase 2, creating an 8b 128k dataset and any SDG optimizations for 128k context windows are out of scope.

      Questions to Answer:

      • What are the optimal hyperparameters for fine-tuning a model for the ~64k context window?
      • Can we default to the 128k model, or are there circumstances in which we should default to previous student models? (See the config check below.)
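
      On the second question: the advertised and effective context windows are different things, which is precisely what this issue sets out to measure. The advertised limit can at least be read straight from the model config; a sketch using Hugging Face transformers, with a placeholder model id:

      from transformers import AutoConfig

      # Placeholder id: substitute the actual Granite 3.1 8B checkpoint under evaluation.
      cfg = AutoConfig.from_pretrained("ibm-granite/granite-3.1-8b-instruct")
      print(cfg.max_position_embeddings)         # advertised maximum context length
      print(getattr(cfg, "rope_scaling", None))  # RoPE scaling, if the config sets one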

      Upstream URL: https://github.com/instructlab/instructlab/issues/2858

              Assignee: Unassigned
              Labels: upstream-sync