-
Task
-
Resolution: Unresolved
-
Normal
-
None
-
rhelai-1.5
-
None
-
False
-
-
False
-
Not Selected
Feature Overview
Provide the major differences and changes between RHEL AI 1.4 and 1.5.
Documentation Considerations _{}(Initial completion while in Refinement status):{_}
RHEL AI 1.4 → 1.5 Changelog
Cloud Team Engineering Summary for Red Hat Review
—
Model Upgrade
- From: granite-3.1-8b-starter
- To: granite-3.1-8b-starter-v2.1
- Model Size: Increased from 16GB → 32GB
- Safesensors Files: 4 → 7
—
Replay Buffer
- Size: Increased from 1.6GB → 6.9GB
- Content: +13% more samples, 2x more tokens
- Purpose: To include more long-context samples
- Impact:
- Training token volume increased by +88% to +114%
- Training time increased by +65% to +84%
Replay Buffer Filtering
- Alternative: Filter samples >10k tokens to reduce training time and cost
- Tool: [Replay Buffer Filter](https://github.com/relyt0925/replay-buffer-filter/blob/main/README.md)
- RH Position: Will back filtered buffer as default
- Status: Recommended for production use
—
Training Parameters
| Parameter | 1.4 | 1.5 | Notes |
| ---------------------------------- | --------------- | --------------- | -------------------------------------------- |
| phased-phase1-num-epochs | 7 | 7 | Unchanged |
| phased-phase2-num-epochs | 10 | 7 | Lowered |
| max-batch-len | 30k | 45k | Allows more tokens per step |
| max-seq-len | 4,096 | 42,000 | Supports long-context samples |
| is_padding_free | true | false | Changed |
| use_dolomite | true | false | Changed |
| phased-phase1-effective-batch-size | Present | Removed | Deprecated |
| phased-phase2-effective-batch-size | Present | Removed | Deprecated |
| phased_phase1_learning_rate | 2e-5 | 6e-6 | Lowered |
| phased_phase2_learning_rate | 6e-6 | 2e-5 | Increased |
—
SDG Parameters
| Parameter | 1.4 | 1.5 |
| --------------- | ||
| batch-size | 256 | 32 |
| num_cpus | 10 | 4 |
Team Sign Off (Completion while in Planning status)
| Reviewed By | Team Name | Accepted | Notes |
| William Caban | |||