RHELAI-3459 (Red Hat Enterprise Linux AI)

InstructLab Multilingual Model Support (w/translations)

      Jan 16 (Aileen): Setting this to Off Track because there are no child issues, no link to GH, and the issue is still in the Backlog state.


      Feature Overview

      InstructLab's multilingual model support enables users to generate datasets in multiple languages by translating documents from one language to another using an external translation service.

      Users bring documents in their native language, and the documents are translated to English using an external translation service.

      The resulting English version of the documents is then used in the SDG pipeline to generate the training data.

      The resulting SDG dataset is duplicated: one copy stays in English, and the other is translated back to the original language using an external service or LLMBlock with a multilingual model. The combined dataset is then used as the training dataset for a multilingual student model.
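
The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the InstructLab implementation: `Sample`, `build_multilingual_dataset`, `fake_translate`, and `fake_sdg` are hypothetical names, and the translator and SDG step are passed in as plain callables so that either an external service or an LLMBlock-backed model could stand behind them.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


@dataclass
class Sample:
    """One generated training sample (hypothetical shape)."""
    question: str
    answer: str
    lang: str


def build_multilingual_dataset(
    documents: List[str],
    source_lang: str,
    translate: Translator,
    generate_sdg: Callable[[List[str]], List[Sample]],
) -> List[Sample]:
    # 1. Translate the source-language documents to English.
    english_docs = [translate(doc, source_lang, "en") for doc in documents]
    # 2. Run the (English-only) SDG step on the translated documents.
    english_samples = generate_sdg(english_docs)
    # 3. Duplicate the dataset and translate the copy back to the source language.
    translated_samples = [
        Sample(
            question=translate(s.question, "en", source_lang),
            answer=translate(s.answer, "en", source_lang),
            lang=source_lang,
        )
        for s in english_samples
    ]
    # 4. Combine both versions into one training dataset.
    return english_samples + translated_samples


# Stand-ins for a real translation service and SDG pipeline (assumptions).
def fake_translate(text: str, src: str, tgt: str) -> str:
    return f"[{src}->{tgt}] {text}"


def fake_sdg(docs: List[str]) -> List[Sample]:
    return [Sample(question=f"What does this say? {d}", answer=d, lang="en") for d in docs]


dataset = build_multilingual_dataset(["Hola mundo"], "es", fake_translate, fake_sdg)
```

Keeping translation and generation behind callables leaves the choice of service or model open, which matches the open question in this ticket about which one to use.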

      Goals

      • Enable users to generate training datasets from documents in non-English languages
      • Translate generated datasets back to the original language for use as training data
      • Improve InstructLab's language support and accessibility

      Requirements

      • Integration with an external translation service, or the ability to use LLMBlock with a custom model
      • Translation of documents from Spanish to English and vice versa
      • Quality assurance of translated datasets
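
The ticket does not say how quality assurance would be done. One possible heuristic, shown here only as a sketch, is a round-trip check: translate a text to English and back, then compare the result with the original. The function names and the 0.8 threshold are assumptions for illustration, and character-level similarity from Python's `difflib` is a crude proxy rather than a real translation-quality metric.

```python
from difflib import SequenceMatcher
from typing import Callable, List

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def round_trip_score(
    original: str,
    translate: Translator,
    source_lang: str,
    pivot_lang: str = "en",
) -> float:
    """Translate to the pivot language and back, then score similarity in [0, 1]."""
    forward = translate(original, source_lang, pivot_lang)
    back = translate(forward, pivot_lang, source_lang)
    return SequenceMatcher(None, original.lower(), back.lower()).ratio()


def filter_by_round_trip(
    texts: List[str],
    translate: Translator,
    source_lang: str,
    threshold: float = 0.8,  # assumed cutoff; would need tuning per language
) -> List[str]:
    """Keep only texts whose round-trip translation stays close to the original."""
    return [
        t for t in texts
        if round_trip_score(t, translate, source_lang) >= threshold
    ]


# Stand-in translators for demonstration (assumptions, not real services).
def identity_translate(text: str, src: str, tgt: str) -> str:
    return text


def lossy_translate(text: str, src: str, tgt: str) -> str:
    return ""


kept = filter_by_round_trip(["Hola mundo"], identity_translate, "es")
dropped = filter_by_round_trip(["Hola mundo"], lossy_translate, "es")
```

A check like this only flags gross failures; it would not catch fluent but incorrect translations, so it would complement rather than replace human or model-based review.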

      Background

      Currently, InstructLab only supports the English language for data generation. Expanding support to include Spanish will allow users to generate datasets in their preferred language and use them for training a multilingual model with InstructLab.

      Done

      • [ ] Integration with an external translation service or LLMBlock with a multilingual model is complete
      • [ ] Documents can be translated from languages such as Spanish, French, German, and Italian to English and vice versa
      • [ ] Quality assurance of translated datasets is implemented

      Questions to Answer

      • Which external translation service or model should be used for this feature?
      • How will the quality of translated datasets be ensured?
      • What are the performance implications of using an external translation service?

      Out of Scope

      • Translation of documents into languages other than the ones supported by Granite 3.1/3.2
      • Implementation of advanced translation algorithms or models

      Customer Considerations

      • Ensure the chosen external translation service or model provides high-quality translations
      • Evaluate the impact of translated datasets on the overall performance of InstructLab fine-tuned models

              wcabanba@redhat.com William Caban