Red Hat Enterprise Linux AI / RHELAI-2869

[instructlab/instructlab] Investigate work required to support 128k context


      Upstream Reporter: Kim
      Upstream issue status: Open
      Upstream description:

      Dependent on Model Prod and SDG investigation

      Feature Overview:

      • Support the Granite 3.1 model with a 128k context window, fine-tuned on a new 8b ~64k dataset.

      Note: A 128k context window on an 8B model can adversely affect performance benchmarks, hence the initial focus on an ~64k effective context window.

      Goal:

      • Fine-tune the 128k model using a new 8b ~64k dataset (see the training sketch after this list)
      • Note: the new dataset requires clearance from legal
      • Identify the effective context window, which becomes the new supported context window
      • The expectation is a context window of at least 64k
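
      For orientation, a minimal sketch of what such a fine-tuning run could look like with the instructlab/training library. The API shape follows the upstream README; the model path, dataset path, and every numeric value below are placeholders, not settings prescribed by this issue:

      from instructlab.training import run_training, TorchrunArgs, TrainingArgs

      # Placeholder paths and hyperparameters; the ~64k max_seq_len is the
      # variable under investigation, everything else is illustrative.
      training_args = TrainingArgs(
          model_path="ibm-granite/granite-3.1-8b-base",  # placeholder model id
          data_path="path/to/64k-dataset.jsonl",
          ckpt_output_dir="checkpoints",
          data_output_dir="data/outputs",
          max_seq_len=65536,         # ~64k effective context under test
          max_batch_len=65536,       # token budget per GPU per step
          num_epochs=2,
          effective_batch_size=128,
          save_samples=250000,       # checkpoint cadence (upstream README default)
          learning_rate=2e-6,
          warmup_steps=25,
          random_seed=42,
      )

      torchrun_args = TorchrunArgs(
          nnodes=1,           # single machine
          nproc_per_node=8,   # GPUs per machine
          node_rank=0,
          rdzv_id=123,
          rdzv_endpoint="127.0.0.1:12345",
      )

      run_training(torch_args=torchrun_args, train_args=training_args)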

      Requirements:

      • Fine-tune the 128k model with a new 8b ~64k dataset
      • Validate and document the effective context window (see the probe sketch after this list)
      • Identify any deviation in the performance of the final model
      • To ship as GA, any deviation should be within the margin of error
      • Identify the optimal batch size for training
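
      One way to validate the effective context window is a needle-in-a-haystack probe against a locally served checkpoint, e.g. the OpenAI-compatible endpoint that ilab model serve exposes. The endpoint, model name, and words-to-tokens ratio below are all assumptions for the sketch:

      from openai import OpenAI

      client = OpenAI(base_url="http://localhost:8000/v1", api_key="none")

      NEEDLE = "The magic number is 48613."
      FILLER = "The quick brown fox jumps over the lazy dog. "  # ~10 tokens per repeat

      def needle_found(n_repeats: int, depth: float) -> bool:
          """Bury NEEDLE at relative position `depth` (0..1) and ask for it back."""
          before = int(n_repeats * depth)
          haystack = FILLER * before + NEEDLE + " " + FILLER * (n_repeats - before)
          resp = client.chat.completions.create(
              model="granite-3.1-8b",  # placeholder: whatever name the server registers
              messages=[{"role": "user",
                         "content": haystack + "\nWhat is the magic number?"}],
              max_tokens=16,
          )
          return "48613" in (resp.choices[0].message.content or "")

      # Sweep toward the 64k target; the longest length at which retrieval still
      # succeeds at every depth approximates the effective context window.
      for n_repeats in (800, 1600, 3200, 4800, 6000):
          hits = [needle_found(n_repeats, d) for d in (0.1, 0.5, 0.9)]
          print(f"~{n_repeats * 10} tokens: {hits}")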

      Done - Acceptance Criteria:

      [ ] InstructLab can fine-tune the 128k model using a new 8b ~64k dataset
      [ ] Document the effective context window of the resulting model
      [ ] Evaluate and compare the performance of the final model to a 4k fine-tuned model
      [ ] Document and use optimal batch size during training, if required (see the helper below)
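
      On the batch-size item: at ~64k tokens per sample, a per-GPU micro-batch of 1 is often all that fits in memory, so the effective batch size is realized through gradient accumulation. A small generic helper; the numbers are illustrative, not recommendations from this issue:

      def grad_accum_steps(effective_batch: int, micro_batch: int, world_size: int) -> int:
          """Accumulation steps so micro_batch * world_size * steps == effective_batch."""
          per_optimizer_step = micro_batch * world_size
          if effective_batch % per_optimizer_step:
              raise ValueError("effective_batch must divide evenly across GPUs and micro-batches")
          return effective_batch // per_optimizer_step

      # Example: 8 GPUs, one ~64k-token sample per GPU per forward pass,
      # and a target effective batch of 128 samples -> 16 accumulation steps.
      print(grad_accum_steps(effective_batch=128, micro_batch=1, world_size=8))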

      Use Cases:

      Enterprise use cases that would benefit from a large context window include the following:

      • RAG
      • Summarization
      • Code generation
      • Tool use
      • Advanced reasoning

      Out of Scope:

      • For phase 2, creating an 8b 128k dataset and any SDG optimizations for 128k context windows are out of scope.

      Questions to Answer:

      • What are the optimal hyperparameters for fine-tuning a model for the ~64k context window?
      • Can we default to the 128k model, or are there circumstances in which we should default to previous student models? (See the config check below.)
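
      On the second question: the advertised and effective context windows are different things, which is precisely what this issue sets out to measure. The advertised limit can at least be read straight from the model config; a sketch using Hugging Face transformers, with a placeholder model id:

      from transformers import AutoConfig

      # Placeholder id: substitute the actual Granite 3.1 8B checkpoint under evaluation.
      cfg = AutoConfig.from_pretrained("ibm-granite/granite-3.1-8b-instruct")
      print(cfg.max_position_embeddings)         # advertised maximum context length
      print(getattr(cfg, "rope_scaling", None))  # RoPE scaling, if the config sets one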

      Upstream URL: https://github.com/instructlab/instructlab/issues/2858

              Assignee: Unassigned
              Labels: upstream-sync