Type: Story
Resolution: Unresolved
Priority: Major
Story Points: 8
Story (Required)
As a developer consuming AI software templates, I would like the option to deploy and use models through OpenShift AI's Model Serving Runtime functionality.
Background (Required)
Currently, our software templates provide two options for deploying and using models: users can either choose a pre-selected model and model server based on the sample app (e.g., vllm + ibm-granite for LLM, whisper.cpp for audio-to-text, etc.), or they can input an existing model server hostname.
OpenShift AI provides the option to deploy model servers through its model serving runtime functionality: https://docs.redhat.com/en/documentation/red_hat_openshift_ai_cloud_service/1/html-single/serving_models/index
https://github.com/rh-aiservices-bu/llm-on-openshift/tree/main/serving-runtimes provides some example serving runtimes, one of which is vllm.
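For orientation, the serving runtimes in that repository are KServe `ServingRuntime` custom resources. A minimal sketch of what a vLLM runtime might look like is below; the image reference and names are illustrative placeholders, not the actual resources from the repo:

```yaml
# Hypothetical ServingRuntime sketch for vLLM (names/image are placeholders;
# see the llm-on-openshift repo linked above for the real examples)
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  name: vllm-runtime
spec:
  supportedModelFormats:
    - name: pytorch
      autoSelect: true
  containers:
    - name: kserve-container
      image: quay.io/example/vllm-openai:latest  # placeholder image
      ports:
        - containerPort: 8080
          protocol: TCP
```

A model is then deployed against the runtime via an `InferenceService` that references it by name.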
Out of scope
<Defines what is not included in this story>
Approach (Required)
- Investigate how to utilize model serving runtimes for all of our currently supported model servers and models:
  - vllm and ollama for chatbot/codegen
  - whisper.cpp (or an equivalent model) for audio-to-text
  - facebook/detr-resnet-101 (or an equivalent) for object recognition
- Investigate how OpenShift AI needs to be configured:
  - Is a standard RHOAI install sufficient?
  - What is needed to utilize serving runtimes (e.g., Minio as a storage backend)?
  - Should Minio be configured by us, or should it be required to be set up beforehand?
- Investigate how to integrate model serving runtimes into our existing software templates:
  - Is it feasible for all of our templates, or only a subset (e.g., llm)?
  - How should the user choose to deploy their models via model serving runtimes?
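As one way to frame the last question, the user choice could surface as a Backstage software template parameter. A hypothetical sketch (all property names and option values are illustrative, not decided):

```yaml
# Hypothetical scaffolder template parameter sketch; names are illustrative
parameters:
  - title: Model Deployment
    required:
      - modelDeployment
    properties:
      modelDeployment:
        title: Model deployment method
        type: string
        enum:
          - sample-model-server
          - existing-hostname
          - rhoai-serving-runtime
        enumNames:
          - Pre-selected sample model server
          - Existing model server hostname
          - OpenShift AI model serving runtime
        default: sample-model-server
```

Whether this is a single dropdown or a separate template variant per method is part of what this story should determine.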
Dependencies
<Describes what this story depends on. Dependent Stories and EPICs should be linked to the story.>
Acceptance Criteria (Required)
<Describe edge cases to consider when implementing the story and defining tests>
<Provides a required and minimum list of acceptance tests for this story. More is expected as the engineer implements this story>
Documentation updates (design docs, release notes, etc.)
Demo needed
SOP required
Education module update (Filled by DEVHAS team only)
R&D label required (Filled by DEVHAS team only)
Done Checklist
Code is completed, reviewed, documented and checked in
Unit and integration test automation has been delivered and is running cleanly in the continuous integration/staging/canary environment
Continuous Delivery pipeline(s) is able to proceed with new code included
Customer facing documentation, API docs, design docs etc. are produced/updated, reviewed and published
Acceptance criteria are met
If the Grafana dashboard is updated, ensure the corresponding SOP is updated as well