Type: Spike
Resolution: Unresolved
Priority: Undefined
Fix Version/s: None
Affects Version/s: None
Component/s: InstructLab - Training
Labels:
- 2.0-candidate

Blocked:
False
Blocked Reason:

None
Ready:
False
Epic Link:
ilab-training sdk-ification
Intelligence Requested:
Market:

SFDC Cases Links:
SFDC Cases Open:
SFDC Cases Counter:

Currently, there is no way to provide both a training set and a validation set to the training code. Ideally, it should be possible to provide any one of the following:

- Two datasets: one for training, the other for validation
- One dataset dict that contains a predefined train/val split
- One dataset plus a percentage used to randomly split it into train and val sets
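A minimal sketch of how the three input forms could be normalized into a single (train, val) pair. The function name `resolve_train_val`, its parameters, and the `"validation"` key expected in the dict case are hypothetical, and plain Python lists stand in for real dataset objects. The percentage is taken as a fraction in (0, 1). With Hugging Face `datasets`, the random-split case could instead use `Dataset.train_test_split(test_size=...)`, which returns a `DatasetDict` with `"train"` and `"test"` keys.

```python
import random
from typing import Any, List, Mapping, Optional, Sequence, Tuple


# Hypothetical helper: the name and signature are illustrative only and are
# not an existing InstructLab API. Plain lists stand in for dataset objects.
def resolve_train_val(
    data: Any,
    val_data: Optional[Sequence[Any]] = None,
    val_fraction: Optional[float] = None,
    seed: int = 42,
) -> Tuple[List[Any], List[Any]]:
    """Normalize the three proposed input forms into (train, val) lists."""
    # A dict with a predefined split; the "validation" key name is assumed.
    if isinstance(data, Mapping):
        if "train" not in data or "validation" not in data:
            raise ValueError("dataset dict needs 'train' and 'validation' keys")
        return list(data["train"]), list(data["validation"])

    # Two separate datasets: one for training, one for validation.
    if val_data is not None:
        return list(data), list(val_data)

    # One dataset plus a fraction (e.g. 0.1 for 10%) for a random split.
    if val_fraction is not None:
        if not 0.0 < val_fraction < 1.0:
            raise ValueError("val_fraction must be between 0 and 1 (exclusive)")
        shuffled = list(data)
        random.Random(seed).shuffle(shuffled)  # seeded, so the split is reproducible
        n_val = max(1, round(len(shuffled) * val_fraction))
        return shuffled[n_val:], shuffled[:n_val]

    raise ValueError("provide val_data, a train/validation dict, or val_fraction")
```

Seeding the shuffle keeps the split identical across runs, so validation losses from different runs are computed on the same held-out examples.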

In addition, the user should be able to specify how frequently to evaluate the model on the validation dataset.
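One way the evaluation frequency could be expressed is as a step interval, sketched below. The `EvalSchedule` class and its fields are hypothetical; the issue does not say whether the frequency is counted in optimizer steps, samples, or epochs, and this sketch assumes optimizer steps.

```python
from dataclasses import dataclass


@dataclass
class EvalSchedule:
    """Hypothetical setting for how often to run validation (illustrative names)."""

    eval_every_n_steps: int = 100  # run validation every N optimizer steps
    eval_at_end: bool = True       # also evaluate after the final step

    def __post_init__(self) -> None:
        if self.eval_every_n_steps <= 0:
            raise ValueError("eval_every_n_steps must be a positive integer")

    def should_evaluate(self, step: int, total_steps: int) -> bool:
        # Steps are 1-indexed here, so step == total_steps is the last step.
        if self.eval_at_end and step == total_steps:
            return True
        return step % self.eval_every_n_steps == 0
```

For example, `EvalSchedule(eval_every_n_steps=50)` over 120 steps evaluates at steps 50, 100, and 120.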

During the main training loop, the model's validation loss should then be computed at the specified frequency and logged.
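The loop below is a toy, dependency-free sketch of computing and logging validation loss at a fixed interval. It trains a one-parameter linear model with plain SGD purely for illustration; `train`, `mean_loss`, and the log format are hypothetical. In an actual PyTorch training loop, the evaluation pass would typically call `model.eval()` and run under `torch.no_grad()`, then call `model.train()` before resuming.

```python
import logging
from typing import List, Sequence, Tuple

logger = logging.getLogger("train")

Batch = Sequence[Tuple[float, float]]  # (x, y) pairs for the toy model below


def mean_loss(w: float, data: Batch) -> float:
    # Mean squared error of the toy model y_hat = w * x.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def train(
    train_batches: List[Batch],
    val_data: Batch,
    eval_every_n_steps: int,
    lr: float = 0.01,
) -> Tuple[float, List[Tuple[int, float]]]:
    """Toy SGD loop that computes and logs validation loss every N steps."""
    w = 0.0
    history: List[Tuple[int, float]] = []  # (step, val_loss) pairs
    total = len(train_batches)
    for step, batch in enumerate(train_batches, start=1):
        # Gradient of the batch MSE with respect to w, then one SGD update.
        grad = sum(2 * (w * x - y) * x for x, y in batch) / len(batch)
        w -= lr * grad
        if step % eval_every_n_steps == 0 or step == total:
            # Validation only reads the parameter; no update happens here.
            val_loss = mean_loss(w, val_data)
            history.append((step, val_loss))
            # Batch loss after the update serves as a cheap training-loss proxy.
            logger.info("step=%d train_loss=%.4f val_loss=%.4f",
                        step, mean_loss(w, batch), val_loss)
    return w, history
```

Call `logging.basicConfig(level=logging.INFO)` first to see the log lines. Evaluating on the final step as well means the end of training always has a validation loss, even when the step count is not a multiple of the interval.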

This is an essential component: it lets us verify that the model is not overfitting the training data and that it generalizes to unseen data.
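As a rough illustration of how the logged validation losses could be used, the helper below flags when validation loss has stopped improving for a given number of evaluations, the same patience check commonly used for early stopping. The name `val_loss_stalled` and its parameters are hypothetical, and a stalled validation loss is only a heuristic signal of overfitting, most meaningful when training loss is still decreasing.

```python
from typing import Sequence


def val_loss_stalled(val_losses: Sequence[float], patience: int = 3,
                     min_delta: float = 0.0) -> bool:
    """Hypothetical check: True if the last `patience` evaluations did not
    improve on the earlier best validation loss by more than `min_delta`."""
    if len(val_losses) <= patience:
        return False  # not enough history to judge
    best_before = min(val_losses[:-patience])
    recent_best = min(val_losses[-patience:])
    # Stalled when no recent evaluation beats the earlier best by > min_delta.
    return recent_best >= best_before - min_delta
```

For a logged sequence such as `[1.0, 0.8, 0.7, 0.72, 0.75, 0.9]` with `patience=3`, the earlier best is 0.7 and none of the last three evaluations beats it, so the check returns `True`.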

Blocks:
RHELAI-4007 Automated train run benchmarking GitHub action (To Do)

Assignee: Fynn Schmitt-Ulms
Reporter: Fynn Schmitt-Ulms
Votes: 0
Watchers: 1
Created: 2025/04/24 6:16 PM
Updated: 2025/05/13 6:01 PM
