  1. AI Platform Core Components
  2. AIPCC-491

Provide additional language packs for Tesseract

    • Additional tesseract-langpack
    • In Progress
    • AIPCC-1837 - multi-language support for vllm and instructlab
    • 0% To Do, 0% In Progress, 100% Done

      InstructLab uses Docling to process and chunk documents. Docling depends on an OCR engine to convert images to text, e.g. in PDFs with embedded images. RHELAI uses the Tesseract OCR engine. The Tesseract RPM package is in RHEL 9.

      InstructLab Multilingual Model Support adds support for other languages such as French, German, Italian, and Spanish. The Tesseract package in RHEL 9 ships only tesseract-langpack-eng. The additional langpack RPMs are built but then excluded from the Errata product listing. See the tesseract-tessdata erratum: https://errata.devel.redhat.com/advisory/91911/builds

      Investigate how we can provide the required langpacks in our layered product:

      • Can we ship the langpack RPMs of build tesseract-tessdata-4.1.0-3.el9 in our layered product?
      • Do we need a new build of tesseract-tessdata?
      • Should we build the latest version of Tesseract? RHEL 9 has tesseract-4.1.1 with leptonica-1.80. The latest versions in Fedora are tesseract-5.5.0 with leptonica-1.85. tessdata is at 4.1.0 everywhere. (Not required at the moment.)

      Goals:

      • Primary: Deliver language packs for at least French, German, Italian, and Spanish in RHELAI 1.5 application images. Packages must be available for installation before 2025-04-08 (RHELAI 1.5 RPM freeze date). RHELAI 1.5 will be based on RHEL 9.4 EUS.
      • Secondary: Agree on long-term plans for additional language packs for RHEL 9.6 and RHEL 10 (delivery, maintenance, QE work).
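
      To verify the primary goal inside an application image, a check along the following lines could be used. This is a minimal sketch, not part of the ticket: it assumes the Tesseract language codes fra, deu, ita, and spa correspond to the four languages named above, and that `tesseract --list-langs` prints a "List of available languages" header followed by one code per line. The function names are hypothetical.

```python
import shutil
import subprocess

# Assumed Tesseract codes for French, German, Italian, and Spanish.
REQUIRED_LANGS = {"fra", "deu", "ita", "spa"}


def parse_list_langs(output: str) -> set:
    """Extract language codes from `tesseract --list-langs` output."""
    langs = set()
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.startswith("List of"):
            continue
        langs.add(line)
    return langs


def missing_langs(output: str, required=REQUIRED_LANGS) -> set:
    """Return the required language codes that are absent from the output."""
    return set(required) - parse_list_langs(output)


def list_langs_output() -> str:
    """Run the tesseract CLI and return its language listing."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    result = subprocess.run(
        ["tesseract", "--list-langs"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Combine both streams in case the listing is written to stderr.
    return result.stdout + result.stderr


def main() -> int:
    """Print missing language packs; return 1 if any are missing."""
    missing = missing_langs(list_langs_output())
    if missing:
        print("Missing Tesseract languages:", ", ".join(sorted(missing)))
        return 1
    print("All required Tesseract languages are installed.")
    return 0
```

      Calling `main()` in an image that has only tesseract-langpack-eng installed should report all four codes as missing. The parsing is kept separate from the subprocess call so it can be exercised without Tesseract being present.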

              cheimes@redhat.com Christian Heimes
              cheimes@redhat.com Christian Heimes
              Antonio's Team