-
Feature
-
Resolution: Done
Feature Overview
InstructLab's multilingual model support lets users generate training datasets from non-English source documents by routing them through an external translation service. For this feature, users supply documents in Spanish, and the external service translates them to English. The English documents are then fed into the SDG pipeline to generate training data. Finally, the resulting SDG dataset is translated back to Spanish by an external service and used as the training dataset for a multilingual model.
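The flow described above can be sketched roughly as follows. This is a minimal illustration, not InstructLab's implementation: the `Translate` callable, the `Sample` type, `run_multilingual_sdg`, and the toy stand-ins are all hypothetical names, and the real translation service and SDG entry point are still open questions in this ticket.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
# The real external translation service has not been chosen yet.
Translate = Callable[[str, str, str], str]


@dataclass
class Sample:
    """One generated question/answer pair (assumed shape, for illustration only)."""
    question: str
    answer: str


def run_multilingual_sdg(
    documents_es: List[str],
    translate: Translate,
    generate_samples: Callable[[List[str]], List[Sample]],
) -> List[Sample]:
    # Step 1: translate the Spanish source documents to English.
    documents_en = [translate(doc, "es", "en") for doc in documents_es]
    # Step 2: run SDG on the English documents (placeholder for the real pipeline).
    samples_en = generate_samples(documents_en)
    # Step 3: translate the generated dataset back to Spanish.
    return [
        Sample(
            question=translate(s.question, "en", "es"),
            answer=translate(s.answer, "en", "es"),
        )
        for s in samples_en
    ]


# Toy stand-ins so the sketch runs without any external service.
def toy_translate(text: str, source: str, target: str) -> str:
    return f"[{source}->{target}] {text}"


def toy_generate(documents_en: List[str]) -> List[Sample]:
    return [Sample(question=f"What does this say? {d}", answer=d) for d in documents_en]


if __name__ == "__main__":
    for sample in run_multilingual_sdg(["Hola mundo"], toy_translate, toy_generate):
        print(sample)
```

Passing the translator and the generator in as callables keeps the sketch independent of whichever service or model is eventually selected.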
Goals
- Enable users to generate training datasets from documents in Spanish
- Translate generated datasets back to Spanish for use as training data
- Improve InstructLab's language support and accessibility
Requirements
- Integration with an external translation service or model
- Translation of documents from Spanish to English and vice versa
- Quality assurance of translated datasets
Background
Currently, InstructLab supports only English for data generation. Adding Spanish support will let users generate datasets in their preferred language and use them to train a multilingual model with InstructLab.
Done
- [ ] Integration with an external translation service or model is complete
- [ ] Documents can be translated from Spanish to English and vice versa
- [ ] Quality assurance of translated datasets is implemented
Questions to Answer
- Which external translation service or model should be used for this feature?
- How will the quality of translated datasets be ensured?
- What are the performance implications of using an external translation service?
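One possible starting point for the quality question is a round-trip (back-translation) check: translate a Spanish text to English and back, then compare the result with the original. The sketch below is an assumption, not a decided approach. The function names and the `0.8` threshold are hypothetical, and character-level similarity from Python's standard `difflib` is only a crude proxy for translation quality.

```python
from difflib import SequenceMatcher
from typing import Callable, List

# Hypothetical signature: (text, source_lang, target_lang) -> translated text.
Translate = Callable[[str, str, str], str]


def round_trip_score(original_es: str, translate: Translate) -> float:
    """Return a 0..1 similarity between a Spanish text and its es->en->es round trip."""
    english = translate(original_es, "es", "en")
    back_to_spanish = translate(english, "en", "es")
    # Surface similarity only; it does not measure meaning or fluency.
    return SequenceMatcher(None, original_es.lower(), back_to_spanish.lower()).ratio()


def flag_low_quality(
    texts_es: List[str], translate: Translate, threshold: float = 0.8
) -> List[str]:
    """Return the texts whose round-trip score falls below the (arbitrary) threshold."""
    return [t for t in texts_es if round_trip_score(t, translate) < threshold]
```

Flagged texts could then be routed to manual review rather than dropped automatically, since a low surface score does not necessarily mean the translation is wrong.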
Out of Scope
- Translation of documents into languages other than English and Spanish
- Implementation of advanced translation algorithms or models
Customer Considerations
- Ensure the chosen external translation service or model provides high-quality translations
- Evaluate the impact of translated datasets on the overall performance of InstructLab fine-tuned models
- is cloned by: RHELAI-3459 InstructLab Multilingual Model Support (w/translations) (New)