
    • Type: Epic
    • Resolution: Done
    • Priority: Major
    • Epic Name: RHOAI Model Serving CPT Q2 2024
    • Status: To Do
    • Progress: 0% To Do, 0% In Progress, 100% Done

      Epic Goal

      Performance testing for RHOAI model serving is an ongoing effort including running the CPT, analyzing the results, expanding our test coverage, and iterating on the tools involved. This epic is meant to capture the related work we intend to complete during Q2 2024.

      • Continuous performance testing of RHOAI model serving stack for release-to-release regression analysis
      • Improvements to the automated performance testing pipeline for RHOAI model serving stack and the included tools (llm-load-test, topsail)
      • Enhancements to the test coverage: new models, new runtimes, new hardware configurations
      • Performance experiments with LLM model serving on RHOAI, with the goal of gathering data which can be used to guide customers on sizing and hardware/platform recommendations for different models

      Why is this important?

      • LLM model serving is currently a top priority for OpenShift AI and the company.
      • These workloads are performance-sensitive and require expensive hardware to run effectively. Many customers are interested in leveraging LLMs for their business use cases, but performance and cost efficiency are critical in doing so.
      • We need to catch any potential regressions in the RHOAI LLM model serving stack as early as possible.

      Scenarios


      Acceptance Criteria

      • We have completed our planned model serving performance testing for each RHOAI release (starting with 2.10)
      • All enhancement stories have been completed or moved to a follow-up epic

      Dependencies (internal and external)


      Previous Work (Optional):

      1. Performance assessment to support watsonx.ai rebase - PSAP-1261
      2. https://github.com/openshift-psap/llm-load-test
      3. https://github.com/openshift-psap/topsail/tree/main/projects/kserve

      Open questions:

      1. The current single-model tests take >5 hours to run. How can we add more models and runtime combinations without increasing this length to 12+ hours? Different test cases that we run on different frequencies? Relegate some test cases to only one-off experiments?
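      One possible answer to this question can be sketched as a partitioning of the model × runtime test matrix under a per-release time budget: the per-release regression run keeps only as many cases as fit the budget, and the remainder move to a slower cadence (weekly runs or one-off experiments). Everything below is illustrative, not the actual CPT configuration: the model and runtime names, the per-case duration estimate, and the 5-hour budget are assumptions.

      ```python
      from itertools import product

      # Illustrative inputs; real model/runtime names and durations would
      # come from the CPT configuration, not these placeholder values.
      MODELS = ["llama-2-7b", "flan-t5-xl", "mpt-7b"]
      RUNTIMES = ["tgis", "vllm"]
      EST_HOURS = 1.5           # assumed average duration of one model/runtime case
      RELEASE_BUDGET_HOURS = 5  # cap for the per-release regression run

      def partition_cases(models, runtimes, est_hours, budget_hours):
          """Greedily fill the per-release tier up to the time budget;
          everything else runs on a slower cadence (e.g. weekly)."""
          per_release, periodic = [], []
          used = 0.0
          for case in product(models, runtimes):
              if used + est_hours <= budget_hours:
                  per_release.append(case)
                  used += est_hours
              else:
                  periodic.append(case)
          return per_release, periodic

      per_release, periodic = partition_cases(
          MODELS, RUNTIMES, EST_HOURS, RELEASE_BUDGET_HOURS
      )
      print(f"per-release: {len(per_release)} cases (~{len(per_release) * EST_HOURS:.1f}h)")
      print(f"periodic:    {len(periodic)} cases")
      ```

      With these assumed numbers, 3 of the 6 combinations fit the per-release budget and the other 3 fall to the periodic tier, keeping the regression run near its current length while still covering the full matrix over time.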

            dagray@redhat.com David Gray