(draft)

Description

Users want early feedback on how suitable a document is for fine-tuning a model for knowledge, through evaluations or verifications such as:
- A threshold for the minimum number of tokens required to fine-tune a model properly
- A histogram of document elements (e.g. tables versus paragraphs versus images), with guidance on the ideal ratio
- A histogram of SDG sample sizes, and the number of SDG samples per domain or leaf, updated frequently during SDG generation.
  - This helps users spot when a particular behavior is off, or when a domain has too few samples, so they can stop the SDG run early.
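
As a rough illustration of the checks listed above, here is a minimal Python sketch. Everything in it is an assumption made for illustration and is not specified by this issue: the function names, the `MIN_TOKENS` value, whitespace splitting as a stand-in for a real tokenizer, and the shape of an SDG sample as a dict with a `leaf` key.

```python
from collections import Counter

# Hypothetical threshold; this issue does not specify an actual value.
MIN_TOKENS = 1000


def count_tokens(text: str) -> int:
    # Assumption: whitespace splitting stands in for the model's real tokenizer.
    return len(text.split())


def meets_token_threshold(text: str, min_tokens: int = MIN_TOKENS) -> bool:
    # True when the document has at least `min_tokens` tokens.
    return count_tokens(text) >= min_tokens


def element_histogram(elements: list[str]) -> dict[str, float]:
    # `elements` is a list of element kinds, e.g. ["table", "paragraph", ...].
    # Returns the share of each kind; an empty document yields an empty dict.
    counts = Counter(elements)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: n / total for kind, n in counts.items()}


def samples_per_leaf(samples: list[dict]) -> Counter:
    # Assumption: each SDG sample is a dict carrying the leaf it belongs to.
    return Counter(sample["leaf"] for sample in samples)


def underfilled_leaves(
    samples: list[dict], expected_leaves: list[str], min_samples: int
) -> list[str]:
    # Leaves with fewer than `min_samples` samples so far, including
    # expected leaves that have produced no samples at all.
    counts = samples_per_leaf(samples)
    return sorted(leaf for leaf in expected_leaves if counts[leaf] < min_samples)
```

During an SDG run, `underfilled_leaves` could be re-evaluated after each batch of samples so that a domain or leaf that is clearly falling short is reported while the run is still in progress.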