-
Outcome
-
Resolution: Unresolved
(draft)
Description
Users want early feedback on:
- How suitable a document is for fine-tuning a model for knowledge, through evaluations or verifications such as:
  - A threshold for the minimum number of tokens required to fine-tune a model properly
  - A histogram of document element types (e.g. tables versus paragraphs versus images), with guidance on the ideal ratio
- A histogram of synthetic data generation (SDG) sample sizes, and the number of SDG samples per domain or leaf, updated frequently while generation runs.
  - This helps users spot when the samples for a particular behavior or domain are insufficient, so they can stop SDG early.
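
The checks listed above could be sketched roughly as follows. This is a minimal illustration, not a proposed implementation: the function names, the `MIN_TOKENS` and `MIN_SAMPLES_PER_LEAF` values, and the `(leaf_path, sample_text)` sample format are all hypothetical, and the whitespace word count is only a crude stand-in for a real tokenizer.

```python
from collections import Counter

# Hypothetical thresholds; the ticket does not specify any values.
MIN_TOKENS = 1000
MIN_SAMPLES_PER_LEAF = 3


def count_tokens(text: str) -> int:
    """Crude proxy for a token count: whitespace-separated words."""
    return len(text.split())


def meets_token_threshold(text: str, min_tokens: int = MIN_TOKENS) -> bool:
    """Return True if the document has at least min_tokens tokens."""
    return count_tokens(text) >= min_tokens


def element_histogram(elements: list[str]) -> dict[str, float]:
    """Share of each element type, e.g. 'table', 'paragraph', 'image'."""
    counts = Counter(elements)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: n / total for kind, n in counts.items()}


def underfilled_leaves(samples, expected_leaves,
                       min_samples: int = MIN_SAMPLES_PER_LEAF) -> list[str]:
    """List the leaves that have fewer than min_samples samples so far.

    samples is an iterable of (leaf_path, sample_text) pairs; it can be
    re-evaluated periodically while generation is still running.
    """
    per_leaf = Counter(leaf for leaf, _ in samples)
    return sorted(leaf for leaf in expected_leaves
                  if per_leaf[leaf] < min_samples)


if __name__ == "__main__":
    doc_elements = ["table", "paragraph", "paragraph", "image"]
    print(element_histogram(doc_elements))

    samples = [
        ("knowledge/physics", "q1"),
        ("knowledge/physics", "q2"),
        ("knowledge/physics", "q3"),
        ("knowledge/history", "q1"),
    ]
    print(underfilled_leaves(samples, ["knowledge/physics", "knowledge/history"]))
```

Running `underfilled_leaves` on partial output during generation is what would let a user decide to stop early: any leaf it reports has not yet reached the chosen minimum.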