Feature
Resolution: Unresolved
Major
Feature Overview:
Support the Granite 3.0 model with a 128k context window, fine-tuned on a new 8b ~64k dataset.
Note: A 128k context window on an 8B model can adversely affect performance benchmarks, hence the initial focus on a ~64k effective context window.
Goal:
- Fine-tune the 128k model using a new 8b ~64k dataset
  - Note: This new dataset requires clearance from legal
- Identify the effective context window to be used as the new supported context window
  - The expectation is at least a 64k context window
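One way to make "effective context window" concrete is sketched below. It assumes a long-context evaluation has already produced one score per tested context length, and it treats the effective window as the largest tested length at which that length and every shorter one stay at or above a chosen minimum score. The function name, the threshold, and the example scores are hypothetical placeholders, not measured results or an agreed definition.

```python
from typing import Mapping, Optional


def effective_context_window(
    scores_by_length: Mapping[int, float],
    min_score: float,
) -> Optional[int]:
    """Return the largest tested context length such that it, and every
    shorter tested length, scores at least min_score.

    Returns None if even the shortest tested length is below min_score.
    """
    effective = None
    for length in sorted(scores_by_length):
        if scores_by_length[length] < min_score:
            break  # quality dropped; longer lengths are not counted
        effective = length
    return effective


# Made-up placeholder scores for illustration only (NOT evaluation results).
example_scores = {4096: 0.92, 16384: 0.90, 32768: 0.88, 65536: 0.85, 131072: 0.61}
print(effective_context_window(example_scores, min_score=0.80))  # 65536
```

The stop-at-first-drop rule is a deliberate choice in this sketch: a model that recovers at a longer length after failing at a shorter one would still be reported at the shorter length.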
Requirements:
- Fine-tune the 128k model with a new 8b ~64k dataset
- Validate and document effective context window
- Identify any deviation in the performance of the final model
  - To move to GA, any deviation should be within the margin of error
- Identify optimal batch size for training
See additional notes in the comments on the RHELAI-946 outcome card.
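The "within the margin of error" requirement could be checked along the following lines. This is a minimal sketch that assumes each benchmark yields a single higher-is-better score and that one shared margin applies to all of them. The benchmark names, scores, and margin are hypothetical placeholders.

```python
from typing import Dict, List


def regressions(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    margin: float,
) -> List[str]:
    """Return the benchmarks where the candidate scores more than `margin`
    below the baseline. Benchmarks missing from `candidate` are skipped."""
    return sorted(
        name
        for name, base_score in baseline.items()
        if name in candidate and candidate[name] < base_score - margin
    )


# Placeholder names and numbers for illustration only (NOT real results).
baseline_scores = {"bench_a": 0.70, "bench_b": 0.60}
candidate_scores = {"bench_a": 0.69, "bench_b": 0.55}
print(regressions(baseline_scores, candidate_scores, margin=0.02))  # ['bench_b']
```

An empty list would indicate that no tracked benchmark fell outside the chosen margin.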
Done - Acceptance Criteria:
- [ ] InstructLab can fine-tune the 128k model using a new 8b ~64k dataset
- [ ] Document the effective context window of the resulting model
- [ ] Evaluate and compare the performance of the final model to a 4k fine-tuned model
- [ ] Document and use optimal batch size during training (if required)
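Batch size interacts directly with sequence length, so the arithmetic is worth writing down when documenting the chosen value. The sketch below assumes every sequence is packed to the full sequence length and that a target number of tokens per optimizer step has been chosen. All parameter names and example values are hypothetical and do not correspond to InstructLab options.

```python
def grad_accum_steps(
    target_tokens_per_step: int,
    seq_len: int,
    per_device_batch: int,
    num_devices: int,
) -> int:
    """Number of gradient accumulation steps needed to reach at least
    target_tokens_per_step tokens per optimizer step."""
    tokens_per_micro_step = seq_len * per_device_batch * num_devices
    # Integer ceiling division, with a floor of one step.
    return max(1, -(-target_tokens_per_step // tokens_per_micro_step))


# Hypothetical setup: 64k-token sequences, 1 sequence per device, 8 devices.
print(grad_accum_steps(4_194_304, seq_len=65_536, per_device_batch=1, num_devices=8))  # 8
```

In this example one micro-step covers 65,536 × 1 × 8 = 524,288 tokens, so eight accumulation steps reach the 4,194,304-token target.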
Use Cases:
Enterprise use cases that would benefit from a large context window include the following:
- RAG
- Summarization
- Code generation
- Tool use
- Advanced reasoning
Out of Scope:
For phase 2, creating an 8b 128k dataset and SDG optimizations for 128k context windows are out of scope.
Documentation Considerations:
Document the supported context window limit, based on the measured effective context window size.
Questions to Answer:
- What are the optimal hyperparameters for fine-tuning a model for the ~64k context window?
- Can we default to the 128k model, or are there circumstances in which we should default to previous student models?
Background and Strategic Fit:
To support the enterprise use cases required by customers, we need at least a 64k effective context window.
Customer Considerations:
- These changes should be transparent to the user-facing CLI flow
Issue Links:
- clones: RHELAI-2669 (phase 1) Productize the 128k context window Granite v3.1 (New)
- is cloned by: RHELAI-2671 (phase 3) Productize the 128k context window Granite v3.0 (SDG & Training) (New)