  1. Red Hat Enterprise Linux AI
  2. RHELAI-3209

(draft) [eval] Evaluation of the "quality" of the data source and SDG (before/during/after run)


      Description

      Users want early feedback on:

      • How suitable a document is for fine-tuning a model for knowledge, using evaluations or verifications such as:
        • A minimum token-count threshold that a document must meet for proper fine-tuning of a model
        • A histogram of document elements (e.g. tables versus paragraphs versus images), with guidance on the ideal ratio
        • A histogram of SDG sample sizes, and the number of SDG samples per domain or leaf, updated frequently during SDG generation
          • This supports use cases where the user wants to detect that a particular behavior, or the number of samples for a domain, is insufficient and stop the SDG run early.
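The checks listed above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the RHEL AI or SDG implementation: every function name is hypothetical, the `MIN_TOKENS` value is a placeholder (the issue does not specify a threshold), and whitespace splitting stands in for a real model tokenizer.

```python
from collections import Counter

# Hypothetical placeholder; the issue does not specify a threshold value.
MIN_TOKENS = 1000


def count_tokens(text):
    """Approximate the token count by whitespace splitting (assumption:
    a real check would use the target model's tokenizer instead)."""
    return len(text.split())


def meets_token_threshold(text, min_tokens=MIN_TOKENS):
    """Return True if the document has at least min_tokens tokens."""
    return count_tokens(text) >= min_tokens


def element_ratios(element_types):
    """Return the share of each document element type,
    e.g. {'paragraph': 0.5, 'table': 0.25, 'image': 0.25}."""
    counts = Counter(element_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: n / total for kind, n in counts.items()}


def size_histogram(sizes, bin_width=100):
    """Bucket SDG sample sizes into fixed-width bins keyed by lower bound."""
    bins = Counter((size // bin_width) * bin_width for size in sizes)
    return dict(sorted(bins.items()))


def underfilled_leaves(samples_per_leaf, min_samples):
    """List the leaves whose running sample count is below min_samples."""
    return sorted(
        leaf for leaf, n in samples_per_leaf.items() if n < min_samples
    )
```

Calling `underfilled_leaves` on the running per-leaf counts at intervals during generation would give the kind of signal a user needs to stop a run early; the leaf names and the `min_samples` value would come from the user's taxonomy and are not defined in this issue.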

       

              wcabanba@redhat.com William Caban
              Jehlum Vitasta Pandit