  1. Red Hat Enterprise Linux AI
  2. RHELAI-3611

Granite and InstructLab support German, French, Italian, Spanish and other languages


    • Type: Feature
    • Resolution: Unresolved
    • Priority: Major
    • rhelai-1.5

      Feature Overview

      Customers want the ability to customize Granite models in non-English languages. Granite 3.1 base models support German, French, Italian, Spanish and Portuguese, so this feature lets users customize and fine-tune the Granite 3.1 model in those languages.

       

      Multilingual SDG (synthetic data generation) support depends on the teacher model.

      With the current Mixtral 8x7B teacher model, the focus is on:

      • Spanish
      • French
      • Italian
      • German

      With third-party teacher models (e.g. Llama 3.3 70B, Qwen 2.5 72B), the focus is on:

      • Japanese
      • Portuguese
      • Polish
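
      As a rough illustration, the split above can be written as a small lookup table. This is only a sketch: the keys "mixtral" and "third-party" and the helper teacher_for are hypothetical names chosen for this example, not identifiers from InstructLab or RHEL AI.

```python
from typing import Optional

# Hypothetical labels for the two teacher categories described in this ticket.
TEACHER_LANGUAGES = {
    "mixtral": ["Spanish", "French", "Italian", "German"],
    "third-party": ["Japanese", "Portuguese", "Polish"],
}


def teacher_for(language: str) -> Optional[str]:
    """Return the teacher category listed for `language`, or None if it is not listed."""
    wanted = language.strip().lower()
    for teacher, languages in TEACHER_LANGUAGES.items():
        if wanted in (name.lower() for name in languages):
            return teacher
    return None
```

      Under this mapping, Portuguese resolves to the third-party category even though the Granite 3.1 base models already support it, which matches the lists above.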

       

      Goals

      Customers who have datasets in German, French, Italian, Spanish and Portuguese can customize Granite models in these languages with InstructLab (IL).

       

      Requirements

      • For each language, I should be able to use PDF documents (and other formats supported by RHEL AI) in the native language, together with a qna.yaml (data prep) in the native language, to run SDG (ilab generate), training (ilab train) and evaluation (ilab evaluate)
      • The Tesseract OCR langpack for each of these languages must be supported
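
      One way to check the langpack requirement on a given system is to compare the output of `tesseract --list-langs` against the language codes needed. The sketch below assumes the Tesseract codes deu, fra, ita, spa and por for the five languages named in this feature; the REQUIRED_LANGS set and the helper names are assumptions made for this example, not part of RHEL AI.

```python
import subprocess

# Assumed Tesseract language codes for German, French, Italian, Spanish, Portuguese.
REQUIRED_LANGS = {"deu", "fra", "ita", "spa", "por"}


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the header line, which ends with a colon.
        if not line or line.endswith(":"):
            continue
        langs.add(line)
    return langs


def missing_langpacks(output, required=REQUIRED_LANGS):
    """Return the required language codes that do not appear in `output`, sorted."""
    return sorted(set(required) - parse_list_langs(output))


def installed_langs_output():
    """Run `tesseract --list-langs`; requires the tesseract binary on PATH."""
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Some Tesseract versions print the list on stderr, so read both streams.
    return result.stdout + "\n" + result.stderr
```

      Calling missing_langpacks(installed_langs_output()) on a host would then list any langpacks that still need to be installed.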

       

      Done
      Acceptance Criteria articulate and define the value proposition: what is required to meet the goal and intent of this Feature. The Acceptance Criteria provide a detailed definition of scope and the expected outcomes, from a user's point of view.

      • Ingestion, chunking (hybrid) and input data for SDG should be in the native language.
      • LLM-as-judge evaluation scores for the customized student model in the native language should be within 10% of the corresponding English-language scores for documents of similar category and size.
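
      The 10% criterion can be expressed as a simple check. The ticket does not say whether "within 10%" is relative to the English score or an absolute difference in score points; the sketch below assumes a relative difference, and the function name within_tolerance is hypothetical.

```python
def within_tolerance(native_score, english_score, tolerance=0.10):
    """True if native_score differs from english_score by at most `tolerance`.

    Assumption: the difference is measured relative to the English score.
    """
    if english_score == 0:
        # Avoid a meaningless relative comparison against zero.
        return native_score == 0
    return abs(native_score - english_score) <= tolerance * abs(english_score)
```

      For example, with a hypothetical English score of 7.8, a native-language score of 7.2 passes under this reading, while 6.5 does not.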

       

      Use Cases - i.e. User Experience & Workflow: (Initial completion while in Refinement status):
      Include use case diagrams, main success scenarios, alternative flow scenarios.
      <your text here>

      Out of Scope (Initial completion while in Refinement status):
      High-level list of items or personas that are out of scope.
      <your text here>

      Documentation Considerations (Initial completion while in Refinement status):
      Provide information that needs to be considered and planned so that documentation will meet customer needs. If the feature extends existing functionality, provide a link to its current documentation.
      <your text here>

       

      Questions to Answer (Initial completion while in Refinement status):
      Include a list of refinement / architectural questions that may need to be answered before coding can begin.
      <your text here>

      Background and Strategic Fit (Initial completion while in Refinement status):
      Provide any additional context needed to frame the feature.
      <your text here>

      Customer Considerations (Initial completion while in Refinement status):
      Provide any additional customer-specific considerations that must be made when designing and delivering the Feature.
      <your text here>

              wcabanba@redhat.com William Caban
              tkatarki@redhat.com Tushar Katarki