Red Hat Internal Developer Platform / RHIDP-10036

spike: Investigate using RHOAI Model Serving Runtimes in Software Templates

    • Type: Story
    • Resolution: Unresolved
    • Priority: Major
    • Labels: ai-templates

      Story (Required)

      As a developer consuming AI software templates, I would like the option to use and deploy models through OpenShift AI's Model Serving Runtime functionality.

      Background (Required)

      Currently, our software templates provide two options for deploying and using models: users can choose a pre-selected model and model server based on the sample app (e.g. vLLM + ibm-granite for LLMs, whisper.cpp for audio-to-text, etc.), or they can input the hostname of an existing model server.
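      For context, that choice might surface in a template as a parameter along these lines; this is a hypothetical sketch, and the property names and enum values are invented for illustration rather than taken from the actual template schemas.

```yaml
# Hypothetical sketch of the existing model-deployment choice as a
# Backstage software template parameter; property names and enum
# values are invented for illustration.
parameters:
  - title: Model Server
    required:
      - deploymentMethod
    properties:
      deploymentMethod:
        title: How should the model be deployed?
        type: string
        enum:
          - preselected      # sample-app default (e.g. vLLM + ibm-granite)
          - existingServer   # user supplies a model server hostname
        default: preselected
      modelServerHostname:
        title: Hostname of an existing model server
        type: string
```

      The spike would presumably add a third option here for serving-runtime-backed deployment.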

      OpenShift AI provides the option to deploy model servers through its model serving runtime functionality: https://docs.redhat.com/en/documentation/red_hat_openshift_ai_cloud_service/1/html-single/serving_models/index

      The https://github.com/rh-aiservices-bu/llm-on-openshift/tree/main/serving-runtimes repository provides some example serving runtimes, one of which is vLLM.
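      For orientation, a serving runtime is expressed as a KServe ServingRuntime custom resource. Below is a minimal sketch for vLLM; the image reference, arguments, and model format label are illustrative assumptions, not a maintained definition (see the repository above for real examples).

```yaml
# Minimal sketch of a vLLM ServingRuntime; the image, args, and
# supported model format are illustrative assumptions.
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: vllm-runtime
spec:
  supportedModelFormats:
    - name: pytorch                  # format label is an assumption
      autoSelect: true
  containers:
    - name: kserve-container
      image: quay.io/example/vllm-openai:latest  # hypothetical image
      args:
        - --model
        - /mnt/models                # KServe mounts the model here
      ports:
        - containerPort: 8000
          protocol: TCP
```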

      Out of scope

      <Defines what is not included in this story>

      Approach (Required)

      • Investigate how to utilize model serving runtimes for all of our currently supported model servers and models:
        • vLLM and Ollama for chatbot/codegen
        • whisper.cpp (or an equivalent model) for audio-to-text
        • facebook/detr-resnet-101 (or an equivalent) for object recognition
      • Investigate how OpenShift AI needs to be configured:
        • Is a standard RHOAI install sufficient?
        • What is needed to utilize serving runtimes (e.g. MinIO as a storage backend)?
        • Should MinIO be configured by us, or should it be required to be set up beforehand?
      • Investigate how to integrate model serving runtimes into our existing software templates (see the sketch after this list):
        • Is it feasible for all of our templates, or only a subset (e.g. the LLM templates)?
        • How should the user choose to deploy their models via model serving runtimes?
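      As a starting point for the integration question above, a rendered template could emit a KServe InferenceService that binds a model to a named serving runtime. This is a minimal sketch; the runtime name, the Backstage template variable, and the MinIO bucket path are illustrative assumptions.

```yaml
# Minimal sketch of an InferenceService that a rendered template
# could emit; the runtime name, templated app name, and storage URI
# are illustrative assumptions.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: ${{ values.app_name }}-model       # hypothetical template variable
spec:
  predictor:
    model:
      modelFormat:
        name: pytorch                      # must match the runtime's formats
      runtime: vllm-runtime                # the ServingRuntime sketched above
      storageUri: s3://models/ibm-granite  # hypothetical MinIO bucket path
```

      Note that storageUri pulling from S3-compatible storage is what makes the MinIO question above concrete: the endpoint and credentials have to come from somewhere, whether provisioned by the template or assumed to pre-exist.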

      Dependencies

      <Describes what this story depends on. Dependent Stories and EPICs should be linked to the story.>

      Acceptance Criteria (Required)

      <Describe edge cases to consider when implementing the story and defining tests>

      <Provides a required and minimum list of acceptance tests for this story. More is expected as the engineer implements this story>

      Documentation updates (design docs, release notes, etc.)
      Demo needed
      SOP required
      Education module update (filled by DEVHAS team only)
      R&D label required (filled by DEVHAS team only)

      Done Checklist

      Code is completed, reviewed, documented and checked in
      Unit and integration test automation have been delivered and are running cleanly in the continuous integration/staging/canary environment
      Continuous Delivery pipeline(s) can proceed with the new code included
      Customer facing documentation, API docs, design docs etc. are produced/updated, reviewed and published
      Acceptance criteria are met
      If the Grafana dashboard is updated, ensure the corresponding SOP is updated as well

      Assignee: Unassigned
      Reporter: John Collier (johnmcollier)
      Team: RHIDP - AI