Type: Feature
Resolution: Unresolved
Priority: Major
Progress: 100% To Do, 0% In Progress, 0% Done
-
Feature Overview:
Support SDG training dataset generation for the long context window Granite 3.0 128k model.
Goal:
- [SDG team] Extend SDG to support the generation of datasets for training models with a long context window (a hypothetical sketch of such a sample follows this list).
  - Note: The teacher model must support the desired context window
- [Research or models team]
  - Identify hyperparameter optimizations required to handle the new SDG dataset
  - Identify the effective context window to be used as the new supported context window
    - The expectation is at least a 64k context window
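To make the first goal concrete, here is a minimal sketch of how a long-context training sample might be assembled: pack source-document text together with a generated Q&A pair up to a token budget. All names here (`pack_long_context_sample`, the token budget, the tokenizer in the usage comment) are illustrative assumptions, not the actual instructlab-sdg API.

```python
# Hypothetical sketch only: assemble one long-context training sample by
# packing source-document text plus a generated Q&A pair up to a token
# budget. None of these names come from instructlab-sdg.
from transformers import AutoTokenizer

TARGET_CONTEXT_TOKENS = 64 * 1024  # the card's expectation: at least 64k

def pack_long_context_sample(documents, question, answer, tokenizer):
    """Concatenate document chunks until the token budget is nearly full,
    then attach the generated question/answer pair."""
    budget = TARGET_CONTEXT_TOKENS
    context_parts = []
    for doc in documents:
        n_tokens = len(tokenizer.encode(doc))
        if n_tokens > budget:
            break
        context_parts.append(doc)
        budget -= n_tokens
    return {
        "context": "\n\n".join(context_parts),
        "question": question,
        "answer": answer,
    }

# Usage (the tokenizer name is an example, not a requirement):
# tok = AutoTokenizer.from_pretrained("ibm-granite/granite-3.0-8b-base")
# sample = pack_long_context_sample(docs, "What changed in v3?", "...", tok)
```

The point of packing real documents up to the budget, rather than padding, is to give the student model training signal across the full window during fine-tuning.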
Requirements:
- SDG supports the generation of datasets to train models with a long context window
- Ability to fine-tune with the new SDG dataset
- Validate and document the effective context window
- Identify any deviation in the performance of the final model
  - To move to GA, it should be within the margin of error
- Identify the optimal batch size and other hyperparameters for training with the new SDG dataset (a hedged sketch of the batch-size arithmetic follows this list)
See additional notes in the comments on the RHELAI-946 outcome card.
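As a hedged illustration of the batch-size question above, the arithmetic below shows how gradient accumulation can hold the effective batch size constant when 64k-token sequences force a small per-GPU micro-batch. All values and names are planning assumptions, not InstructLab defaults.

```python
# Hypothetical hyperparameter sketch for long-context fine-tuning.
# Values and names are planning assumptions, not InstructLab defaults.
MAX_SEQ_LEN = 64 * 1024          # target training sequence length
EFFECTIVE_BATCH_SIZE = 128       # samples per optimizer step, to be validated
MICRO_BATCH_PER_GPU = 1          # 64k-token samples usually force this to 1
NUM_GPUS = 8

# Gradient accumulation keeps the effective batch size constant even though
# long sequences leave room for only one sample per GPU at a time.
grad_accum_steps = EFFECTIVE_BATCH_SIZE // (MICRO_BATCH_PER_GPU * NUM_GPUS)
print(f"accumulate gradients over {grad_accum_steps} micro-steps")  # -> 16
```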
Done - Acceptance Criteria:
- [ ] InstructLab SDG can generate training datasets for models with large context windows
- [ ] InstructLab can fine-tune the 128k model using the new SDG dataset
- [ ] Document the effective context window of the resulting model (see the evaluation sketch after this list)
- [ ] Evaluate and compare the performance of the final model to a 4k fine-tuned model
- [ ] Document and use optimal batch size and hyperparameters during training
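One common way to satisfy the third criterion is a needle-in-a-haystack probe: hide a known fact at varying depths of increasingly long contexts and check whether the model retrieves it. The sketch below is a minimal, model-agnostic version; `generate_fn`, the tokenizer, and the filler/needle strings are placeholders, not part of any existing evaluation harness.

```python
# Minimal needle-in-a-haystack probe for estimating the effective context
# window. generate_fn(prompt) -> str stands in for the actual inference
# call; the filler and needle strings are illustrative.
def needle_in_haystack(generate_fn, tokenizer, context_tokens, depth_pct):
    needle = "The secret passphrase is BLUE-HARBOR-42."
    filler = "The quick brown fox jumps over the lazy dog. "
    tokens_per_filler = max(1, len(tokenizer.encode(filler)))
    haystack = filler * (context_tokens // tokens_per_filler)
    insert_at = int(len(haystack) * depth_pct / 100)
    prompt = (
        haystack[:insert_at] + needle + haystack[insert_at:]
        + "\n\nWhat is the secret passphrase?"
    )
    return "BLUE-HARBOR-42" in generate_fn(prompt)

# Sweep context lengths and needle depths; the largest length with
# consistent retrieval approximates the effective context window.
# for n in (4_096, 16_384, 32_768, 65_536):
#     for depth in (0, 25, 50, 75, 100):
#         print(n, depth, needle_in_haystack(generate, tok, n, depth))
```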
Use Cases:
Enterprise use cases that would benefit from a large context window include the following:
- RAG
- Summarization
- Code generation
- Tool use
- Advanced reasoning
Out of Scope:
The ability to achieve a 128k effective context window in an 8B model is out of scope for this card.
Documentation Considerations:
Document the supported large context window limits based on the effective context window size.
Questions to Answer:
- What are the optimal hyperparameters for fine-tuning a model for the ~64k context window?
- Can we default to the 128k base model, or are there circumstances in which we should default to previous student models?
Background and Strategic Fit:
To support the enterprise use cases required by customers, we need at least a 64k effective context window.
Customer Considerations:
- These changes might require changes to the user-facing CLI flow. We should minimize required modifications to the existing default CLI flow and expose new capabilities through optional configuration or flags.
Clones:
- RHELAI-2670: (phase 2) Productize the 128k context window Granite v3.1 (status: New)