Red Hat Enterprise Linux AI / RHELAI-4149

Investigate Meta’s Synthetic Data Kit


    • Type: Story
    • Resolution: Unresolved

      Description:
      Investigate Meta’s Synthetic Data Kit (referred to below as the SDK) to understand its capabilities, configuration, and data generation process. The goal is to rerun one of our earlier PoCs with the kit and compare the two approaches on setup, workflow, inputs/outputs, and extensibility.

      Tasks:

      • Review Meta’s Synthetic Data Kit documentation and example usage (public repos, blog posts, etc.)
      • Identify core components and required inputs for data generation
      • Reproduce or run through at least one example using Meta’s SDK
      • Compare and contrast the SDK’s pipeline with one of our previously run PoCs:
        • Data input structure
        • Annotation format
        • Configuration knobs
        • Output format and quality
        • Integration points and modularity
      • Identify areas where Meta’s SDK overlaps or diverges from our approach
      • Summarize findings and recommendations in a short write-up or slide deck
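
      One way to keep the comparison consistent is to record findings per dimension in a small structure and render it as a table for the write-up. The sketch below is only an illustration: the names `Finding`, `empty_matrix`, and `render` are hypothetical, the dimensions are copied from the task list above, and nothing here calls Meta's kit or our own tooling.

```python
from dataclasses import dataclass

# Comparison dimensions copied from the task list; adjust as the investigation evolves.
DIMENSIONS = [
    "Data input structure",
    "Annotation format",
    "Configuration knobs",
    "Output format and quality",
    "Integration points and modularity",
]


@dataclass
class Finding:
    """One row of the comparison (hypothetical structure, not part of any tool)."""
    dimension: str
    meta_sdk: str = "TBD"
    our_poc: str = "TBD"
    notes: str = ""


def empty_matrix():
    """Return one unfilled Finding per comparison dimension."""
    return [Finding(d) for d in DIMENSIONS]


def render(findings):
    """Render the findings as a pipe-delimited table for a write-up or slide."""
    header = "| Dimension | Meta SDK | Our PoC | Notes |"
    rule = "| --- | --- | --- | --- |"
    rows = [
        f"| {f.dimension} | {f.meta_sdk} | {f.our_poc} | {f.notes} |"
        for f in findings
    ]
    return "\n".join([header, rule, *rows])


if __name__ == "__main__":
    print(render(empty_matrix()))
```

      Filling in each `Finding` as the example run progresses would give the overlap/divergence summary a fixed shape, so the final write-up covers every dimension listed in the tasks.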

      Acceptance Criteria:

      • Documented comparison between Meta’s SDK and our preprocessing workflow and SDG (synthetic data generation)
      • Clear notes on configuration differences, pros/cons, and integration considerations
      • Suggestions for potential alignment, reuse, or divergence paths

              aliryan Alina Ryan