Red Hat Enterprise Linux AI / RHELAI-2867

[instructlab/instructlab] Investigate work needed to support subset selection


      [2771326597] Upstream Reporter: Kim
      Upstream issue status: Open
      Upstream description:

      Feature Overview:

      • When users generate a large number of samples, they will have the option to run a subset selection method to obtain a minimal set of samples that is representative of the original dataset.
      • The subset selection algorithm developed by the research team computes embeddings of the samples and then iteratively searches for a minimal subset that maximizes coverage of the dataset.
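      The issue does not specify the research algorithm beyond the description above. As an illustration only, the sketch below uses greedy facility-location maximization, a common coverage objective for embedding-based subset selection: a subset's coverage is the sum, over every sample, of its similarity to the closest selected sample. The function names, the cosine-similarity choice, and the stopping rule (stop when no candidate adds coverage) are assumptions, and the embeddings are taken as given; no embedding model is shown.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def greedy_facility_location(embeddings, budget, min_gain=1e-9):
    """Hypothetical sketch: greedily pick up to `budget` sample indices.

    Coverage of a subset S is sum_i max_{j in S} sim(i, j). Each step adds the
    candidate with the largest coverage gain and stops early once no candidate
    improves coverage by more than `min_gain`, so redundant samples are skipped.
    """
    n = len(embeddings)
    sim = [[cosine(embeddings[i], embeddings[j]) for j in range(n)] for i in range(n)]
    best = [0.0] * n  # each sample's best similarity to the current subset
    selected = []
    while len(selected) < min(budget, n):
        best_j, best_gain = None, min_gain
        for j in range(n):
            if j in selected:
                continue
            # Marginal gain: how much adding j raises each sample's best similarity.
            gain = sum(max(sim[i][j] - best[i], 0.0) for i in range(n))
            if gain > best_gain:
                best_j, best_gain = j, gain
        if best_j is None:
            break  # no remaining candidate meaningfully improves coverage
        selected.append(best_j)
        best = [max(best[i], sim[i][best_j]) for i in range(n)]
    return selected


if __name__ == "__main__":
    # Two clusters of toy 2-D "embeddings"; expect one index from each cluster.
    toy = [[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]]
    print(greedy_facility_location(toy, budget=2))
```

      Greedy selection is a natural fit for "iteratively find a minimal subset" because each step only needs the marginal gain of one more sample; the quadratic similarity matrix here is fine for a toy example but would not scale to a large generated dataset.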

      Goals (mandatory - Complete while in New status)

      • Provide a high-level goal statement giving user context and the expected user outcome(s) for this feature.
      • An end user with a large dataset, which might require more compute, longer training time, or other resources, can reduce the size of the input dataset.

      Requirement:

      • Given an input dataset, subset selection outputs a smaller set representative of the original dataset.

      Done - Acceptance Criteria:

      • The output of subset selection is representative of the original dataset.
      • A smoke test verifies a few use cases with subset selection (i.e., the model still trains efficiently on the selected subset).
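      The criteria do not define "representative". One possible check, sketched here purely as an assumption, measures how well the selected samples cover the full dataset in embedding space: for every original sample, take its cosine similarity to the closest selected sample, then report the fraction of samples above a threshold and the mean best similarity. The function name `coverage_report` and the 0.9 threshold are hypothetical, not part of the issue.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return dot / (na * nb)


def coverage_report(embeddings, selected, threshold=0.9):
    """Hypothetical representativeness metric for a selected subset.

    Returns (fraction_covered, mean_best_similarity), where a sample counts as
    covered if some selected sample has cosine similarity >= `threshold` to it.
    """
    if not embeddings or not selected:
        return 0.0, 0.0
    best = [max(cosine(e, embeddings[j]) for j in selected) for e in embeddings]
    covered = sum(1 for s in best if s >= threshold)
    return covered / len(embeddings), sum(best) / len(best)


if __name__ == "__main__":
    data = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    frac, mean_sim = coverage_report(data, selected=[0, 2])
    # A smoke test could assert a minimum coverage before training proceeds.
    assert frac >= 0.95, f"subset covers only {frac:.0%} of the dataset"
    print(frac, mean_sim)
```

      A check like this only addresses the first criterion; the second (efficient training) still needs an actual training run on the selected subset.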

      Upstream URL: https://github.com/instructlab/instructlab/issues/2857

              jrao@redhat.com Jaideep Rao
              upstream-sync Upstream Sync
              Votes: 0
              Watchers: 2
