  1. Red Hat Enterprise Linux AI
  2. RHELAI-2424

[research] InstructLab Multilingual Model Support - Spanish


    • Feature
    • Resolution: Done
    • Normal
    • rhelai-1.4
    • Red

      Jan 16 (Aileen): Setting this as Off Track because there are no child issues, there is no link to GH, and the issue is still in the Backlog state.

      Feature Overview

      InstructLab's multilingual model support lets users generate training datasets in languages other than English by passing content through an external translation service. For this feature, users supply documents in Spanish, which the service translates to English. The English documents are then fed to the SDG (synthetic data generation) pipeline to generate the training data. Finally, the resulting SDG dataset is translated back to Spanish by the external service and used as the training dataset for a multilingual model.
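
The flow described above can be sketched as three steps: translate in, generate, translate out. This is only a minimal illustration under stated assumptions. The `translate` and `generate_sdg` callables, their signatures, and the function name `build_spanish_dataset` are all hypothetical and do not correspond to any existing InstructLab or translation-service API. The stand-in implementations exist only so the sketch runs without an external service.

```python
from typing import Callable, List

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
Translate = Callable[[str, str, str], str]
# Hypothetical signature: English documents -> generated English training samples.
GenerateSDG = Callable[[List[str]], List[str]]


def build_spanish_dataset(
    spanish_docs: List[str],
    translate: Translate,
    generate_sdg: GenerateSDG,
) -> List[str]:
    """Spanish docs -> English docs -> SDG samples -> Spanish samples."""
    # Step 1: translate the source documents from Spanish to English.
    english_docs = [translate(doc, "es", "en") for doc in spanish_docs]
    # Step 2: run synthetic data generation on the English documents.
    english_samples = generate_sdg(english_docs)
    # Step 3: translate the generated samples back to Spanish.
    return [translate(sample, "en", "es") for sample in english_samples]


# Stand-ins (assumptions) that only tag the text so each step is visible.
def fake_translate(text: str, source: str, target: str) -> str:
    return f"[{source}->{target}] {text}"


def fake_sdg(docs: List[str]) -> List[str]:
    return [f"Q&A about: {doc}" for doc in docs]


dataset = build_spanish_dataset(["Hola mundo"], fake_translate, fake_sdg)
print(dataset)
```

Passing the translation and SDG steps in as callables keeps the choice of external service or model open, which matches the open question in this issue about which service to use.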

      Goals

      • Enable users to generate training datasets from documents in Spanish
      • Translate generated datasets back to Spanish for use as training data
      • Improve InstructLab's language support and accessibility

      Requirements

      • Integration with an external translation service or model
      • Translation of documents from Spanish to English and vice versa
      • Quality assurance of translated datasets
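
One possible way to approach the quality assurance requirement is a round-trip check: translate each Spanish sample back to English and compare it with the English sample it came from. The sketch below is an assumption, not a decided design. The `translate` callable, the function names, and the `0.8` threshold are hypothetical, and `difflib.SequenceMatcher` is only a crude character-level similarity measure, not a real translation-quality metric.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
Translate = Callable[[str, str, str], str]


def round_trip_score(english: str, spanish: str, translate: Translate) -> float:
    """Similarity in [0, 1] between the English source and the back-translated Spanish."""
    back_translated = translate(spanish, "es", "en")
    return SequenceMatcher(None, english.lower(), back_translated.lower()).ratio()


def filter_samples(
    pairs: List[Tuple[str, str]],
    translate: Translate,
    threshold: float = 0.8,  # Hypothetical cutoff; would need tuning on real data.
) -> List[Tuple[str, str]]:
    """Keep only (english, spanish) pairs whose round-trip score meets the threshold."""
    return [
        (en, es)
        for en, es in pairs
        if round_trip_score(en, es, translate) >= threshold
    ]


# Toy dictionary translator (assumption) so the sketch runs offline.
TOY_ES_EN = {"hola mundo": "hello world", "buenos días": "good morning"}


def toy_translate(text: str, source: str, target: str) -> str:
    return TOY_ES_EN.get(text.lower(), "")


pairs = [("Hello world", "Hola mundo"), ("The cat sleeps", "Buenos días")]
kept = filter_samples(pairs, toy_translate)
print(kept)
```

Here the second pair is dropped because its Spanish text does not correspond to its English source. A production check would likely need a semantic metric or human review rather than string similarity.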

      Background

      Currently, InstructLab supports only English for data generation. Adding Spanish support will let users generate datasets in their preferred language and use them to train a multilingual model with InstructLab.

      Done

      • [ ] Integration with an external translation service or model is complete
      • [ ] Documents can be translated from Spanish to English and vice versa
      • [ ] Quality assurance of translated datasets is implemented

      Questions to Answer

      • Which external translation service or model should be used for this feature?
      • How will the quality of translated datasets be ensured?
      • What are the performance implications of using an external translation service?

      Out of Scope

      • Translation of documents into languages other than English and Spanish
      • Implementation of advanced translation algorithms or models

      Customer Considerations

      • Ensure the chosen external translation service or model provides high-quality translations
      • Evaluate the impact of translated datasets on the overall performance of InstructLab fine-tuned models

              wcabanba@redhat.com William Caban