RHELAI-2670

(phase 2) Productize the 128k context window Granite v3.1


      Targeting at least Technology Preview (TP) for RHEL AI 1.4 is conditional on the availability of the 128k base models.


      Feature Overview:

      Support the Granite 3.1 8B model with a 128k context window, fine-tuned on a new ~64k dataset.

      Note: A 128k context window on an 8B model can adversely affect performance benchmarks; hence the initial focus on a ~64k effective context window.
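
      As a point of reference, the advertised maximum context window can be read directly from the model configuration; the effective window still has to be measured empirically (see Goals). A minimal sketch, where the Hugging Face model id is an assumption for illustration:

      ```python
      # Minimal sketch: read the configured maximum context window.
      # The Hugging Face model id is an assumption for illustration.
      from transformers import AutoConfig

      config = AutoConfig.from_pretrained("ibm-granite/granite-3.1-8b-instruct")

      # max_position_embeddings is the trained maximum; the *effective*
      # window still has to be measured empirically.
      print(f"Configured context window: {config.max_position_embeddings} tokens")
      ```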

       

      Goal:

      • Fine-tune the 128k model using the new 8B ~64k dataset
        • Note: This new dataset requires clearance from Legal
      • Identify the effective context window to be used as the new supported context window (see the probe sketch after this list)
        • The expectation is at least a 64k context window
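
      One way to measure the effective context window is a needle-in-a-haystack probe: bury a known fact at a given depth in progressively longer contexts and check recall. A minimal sketch, assuming an OpenAI-compatible endpoint for the served model; the base URL, served model name, and filler/needle strings are all illustrative assumptions:

      ```python
      # Needle-in-a-haystack probe: bury a known fact at a given depth in a
      # filler context of roughly `context_tokens` tokens and ask for it back.
      # base_url, api_key, and model name are illustrative assumptions.
      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")
      MODEL = "granite-3.1-8b-128k"  # hypothetical served model name

      FILLER = "The quick brown fox jumps over the lazy dog. "  # ~10 tokens
      NEEDLE = "The secret passphrase is MAGENTA-42."

      def passes_at(context_tokens: int, depth: float = 0.5) -> bool:
          n_fillers = context_tokens // 10
          split = int(n_fillers * depth)
          haystack = FILLER * split + NEEDLE + " " + FILLER * (n_fillers - split)
          resp = client.chat.completions.create(
              model=MODEL,
              max_tokens=20,
              messages=[{"role": "user",
                         "content": haystack + "\nWhat is the secret passphrase?"}],
          )
          return "MAGENTA-42" in resp.choices[0].message.content

      # Walk up in powers of two; the largest consistently passing size
      # approximates the effective context window.
      for size in (8_192, 16_384, 32_768, 65_536, 131_072):
          print(size, "OK" if passes_at(size) else "FAIL")
      ```

      A real harness would sweep multiple depths and repeat trials per size before declaring a supported window.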

       

      Requirements:

      • Fine-tune the 128k model with the new 8B ~64k dataset
      • Validate and document the effective context window
      • Identify any deviation in the performance of the final model
        • To move to GA, performance should be within the margin of error
      • Identify the optimal batch size for training (see the memory-probe sketch after this list)
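
      At ~64k sequence lengths the per-device micro-batch is memory-bound, so a practical first step is an out-of-memory probe. A minimal sketch in plain PyTorch/transformers; the model id, bf16 dtype, and single-GPU setup are assumptions for illustration, and a real run would use the distributed training stack:

      ```python
      # Memory probe: find the largest per-device micro-batch that survives a
      # forward/backward pass at the target sequence length.
      import torch
      from transformers import AutoModelForCausalLM

      SEQ_LEN = 65_536  # target ~64k training sequence length

      model = AutoModelForCausalLM.from_pretrained(
          "ibm-granite/granite-3.1-8b-instruct",  # assumed model id
          torch_dtype=torch.bfloat16,
      ).cuda()

      def fits(batch_size: int) -> bool:
          try:
              ids = torch.randint(0, model.config.vocab_size,
                                  (batch_size, SEQ_LEN), device="cuda")
              model(input_ids=ids, labels=ids).loss.backward()
              return True
          except torch.cuda.OutOfMemoryError:
              return False
          finally:
              model.zero_grad(set_to_none=True)
              torch.cuda.empty_cache()

      # Double until failure; the last passing value is the per-device ceiling,
      # and the effective batch size is then reached via gradient accumulation.
      bs = 1
      while fits(bs):
          print(f"micro-batch {bs} fits")
          bs *= 2
      print(f"per-device ceiling: {bs // 2}")
      ```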

       

      See additional notes in the comments on the RHELAI-946 outcome card.
       
      Done - Acceptance Criteria:

       

      • [ ] InstructLab can fine-tune the 128k model using the new 8B ~64k dataset
      • [ ] Document the effective context window of the resulting model
      • [ ] Evaluate and compare the performance of the final model to a 4k fine-tuned model (see the comparison sketch after this list)
      • [ ] Document and use the optimal batch size during training (if required)
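
      For the comparison against the 4k baseline, "within the margin of error" can be made concrete with a paired bootstrap over per-task scores. A self-contained sketch; all score values are fabricated placeholders and the 95% level is an assumption, not a stated requirement:

      ```python
      # Paired bootstrap on per-task scores: if the 95% CI for the mean score
      # delta contains zero, the 128k model is within the margin of error of
      # the 4k baseline. All score values here are fabricated placeholders.
      import random

      scores_4k   = [0.62, 0.71, 0.55, 0.68, 0.74, 0.60, 0.66, 0.59]
      scores_128k = [0.61, 0.70, 0.57, 0.66, 0.73, 0.58, 0.67, 0.60]

      def bootstrap_delta_ci(a, b, n_resamples=10_000, alpha=0.05):
          """Confidence interval for mean(b) - mean(a), resampling task indices."""
          n, deltas = len(a), []
          for _ in range(n_resamples):
              sample = [random.randrange(n) for _ in range(n)]
              deltas.append(sum(b[i] - a[i] for i in sample) / n)
          deltas.sort()
          return (deltas[int(alpha / 2 * n_resamples)],
                  deltas[int((1 - alpha / 2) * n_resamples) - 1])

      lo, hi = bootstrap_delta_ci(scores_4k, scores_128k)
      print(f"95% CI for score delta: [{lo:+.3f}, {hi:+.3f}]")
      print("within margin of error" if lo <= 0 <= hi else "significant difference")
      ```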

       

      Use Cases:

      Enterprise use cases that would benefit from a large context window include the following:

      • RAG
      • Summarization
      • Code generation
      • Tool use
      • Advanced reasoning

      Out of Scope:

      For phase 2, creating an 8B 128k dataset or optimizing SDG (synthetic data generation) for 128k context windows is out of scope.

       

      Documentation Considerations:

      Document the supported large-context-window limits based on the measured effective context window size.

       

      Questions to Answer:

      • What are the optimal hyperparameters for fine-tuning a model for the ~64k context window? (see the grid sketch after this list)
      • Can we default to the 128k model, or are there circumstances in which we should default to previous student models?
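
      To make the first question actionable, the sweep can start from a small grid. A minimal sketch; every value below is an assumption to be validated against the existing 4k recipe, not a recommendation:

      ```python
      # Hypothetical starting grid for the ~64k fine-tuning sweep. Every value
      # is an assumption to be validated against the existing 4k recipe.
      from itertools import product

      grid = {
          "learning_rate": [1e-5, 2e-5],           # long-context runs often need a lower LR
          "gradient_accumulation_steps": [16, 32], # recovers the effective batch size
          "warmup_ratio": [0.03, 0.1],
          "max_seq_len": [65_536],
      }

      for lr, accum, warmup, seq_len in product(*grid.values()):
          print(f"run: lr={lr} accum={accum} warmup={warmup} seq_len={seq_len}")
      ```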

       

      Background and Strategic Fit:

      To support the enterprise use cases required by customers, we need at least a 64k effective context window.

       

      Customer Considerations:

      • These changes should be transparent to the user-facing CLI flow
