Feature
Resolution: Unresolved
Feature Overview
InstructLab's multilingual model support lets users generate training datasets from documents written in languages other than English.
Users bring documents in their native language, and an external translation service translates them to English.
The English version of the documents is then fed into the synthetic data generation (SDG) pipeline to generate the training data.
The resulting SDG dataset is duplicated: one copy stays in English, and the other is translated back to the original language using an external service or an LLMBlock with a multilingual model. The combined dataset is then used to train a multilingual student model.
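The flow described above can be sketched roughly as follows. This is a minimal illustration, not InstructLab's actual API: `build_multilingual_dataset`, the `Translator` and `Generator` callable types, and the `lang` tag on each sample are all hypothetical names introduced here. The translation service and the SDG pipeline are passed in as plain functions so that either an external service or an LLMBlock-backed model could be plugged in.

```python
from typing import Callable, Dict, List

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]
# Hypothetical signature: English documents -> generated samples (dicts of strings).
Generator = Callable[[List[str]], List[Dict[str, str]]]


def build_multilingual_dataset(
    documents: List[str],
    source_lang: str,
    translate: Translator,
    run_sdg: Generator,
) -> List[Dict[str, str]]:
    """Translate to English, run SDG, then add a back-translated copy."""
    # 1. Translate the native-language documents to English.
    english_docs = [translate(doc, source_lang, "en") for doc in documents]

    # 2. Generate the English training samples with the SDG pipeline.
    english_samples = run_sdg(english_docs)

    # 3. Duplicate the samples and translate every field back to the
    #    original language.
    translated_samples = [
        {key: translate(value, "en", source_lang) for key, value in sample.items()}
        for sample in english_samples
    ]

    # 4. Combine both copies, tagging each sample with its language
    #    (the "lang" field is an assumption made for this sketch).
    combined = [{**sample, "lang": "en"} for sample in english_samples]
    combined += [{**sample, "lang": source_lang} for sample in translated_samples]
    return combined


if __name__ == "__main__":
    # Stand-in stubs so the sketch runs without any external service.
    def fake_translate(text: str, src: str, dst: str) -> str:
        return f"[{src}->{dst}] {text}"

    def fake_sdg(docs: List[str]) -> List[Dict[str, str]]:
        return [{"question": f"Q: {d}", "answer": f"A: {d}"} for d in docs]

    for row in build_multilingual_dataset(["hola"], "es", fake_translate, fake_sdg):
        print(row)
```

Passing the translator as a callable keeps the open question of which service or model to use separate from the shape of the pipeline itself.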
Goals
- Enable users to generate training datasets from documents in non-English languages
- Translate generated datasets back to the original language for use as training data
- Improve InstructLab's language support and accessibility
Requirements
- Integration with an external translation service, or the ability to use LLMBlock with a custom model
- Translation of documents from Spanish to English and vice versa
- Quality assurance of translated datasets
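The ticket does not say how quality assurance should work. One possible check, shown here purely as an assumption, is a round-trip comparison: translate a text to the target language, translate it back, and flag it when the result drifts too far from the original. The function names, the `Translator` callable type, and the `0.8` threshold are all hypothetical. The similarity measure is the character-level ratio from Python's standard `difflib`, which is only a crude proxy for translation quality.

```python
from difflib import SequenceMatcher
from typing import Callable, List, Tuple

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
Translator = Callable[[str, str, str], str]


def round_trip_score(
    text: str, source_lang: str, target_lang: str, translate: Translator
) -> float:
    """Return a 0..1 similarity between text and its round-trip translation."""
    forward = translate(text, source_lang, target_lang)
    back = translate(forward, target_lang, source_lang)
    # Case-insensitive character-level similarity (a rough proxy only).
    return SequenceMatcher(None, text.lower(), back.lower()).ratio()


def flag_low_quality(
    texts: List[str],
    source_lang: str,
    target_lang: str,
    translate: Translator,
    threshold: float = 0.8,  # arbitrary value chosen for illustration
) -> List[Tuple[str, float]]:
    """Return (text, score) pairs whose round-trip score is below threshold."""
    flagged = []
    for text in texts:
        score = round_trip_score(text, source_lang, target_lang, translate)
        if score < threshold:
            flagged.append((text, score))
    return flagged
```

Flagged samples could be dropped or sent for manual review. A production check would more likely use a semantic similarity metric or human evaluation than raw string overlap.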
Background
Currently, InstructLab supports only English for data generation. Expanding support to include Spanish will allow users to generate datasets in their preferred language and use them to train a multilingual model with InstructLab.
Done
- [ ] Integration with an external translation service or an LLMBlock with a multilingual model is complete
- [ ] Documents can be translated from languages such as Spanish, French, German, and Italian to English and vice versa
- [ ] Quality assurance of translated datasets is implemented
Questions to Answer
- Which external translation service or model should be used for this feature?
- How will the quality of translated datasets be ensured?
- What are the performance implications of using an external translation service?
Out of Scope
- Translation of documents into languages other than the ones supported by Granite 3.1/3.2
- Implementation of advanced translation algorithms or models
Customer Considerations
- Ensure the chosen external translation service or model provides high-quality translations
- Evaluate the impact of translated datasets on the overall performance of InstructLab fine-tuned models
Issue Links
- clones: RHELAI-2424 [research] InstructLab Multilingual Model Support - Spanish (Closed)
- is related to: AIPCC-1837 multi-language support for vllm and instructlab (Closed)
- relates to: RHELAI-3611 Granite and InstructLab support German, French, Italian, Spanish and other languages (In Progress)