Type: Story
Resolution: Done
Priority: Normal
Fix Version: 1.20.0-z
Bugs
- [Bug]: DSG Project creation disables label for modelmesh #744
- [Model Serving]: Enable GPU in Model Server configuration #703 (targeted for 1.21)
Requirement 1
P0: Users must be able to configure a server for the model
P1: Specify target platform configuration (e.g., compute resources: CPU, memory, GPU) for served models
Issues
Requirement 2
P0: Model storage. Users must be able to deploy a model stored in an S3 location
P0: Model frameworks: Users must be able to serve models based on a variety of frameworks
P0: Ability to serve models not developed in RHODS
- Using frameworks from A
- Stored in locations in [Model Serving]: Allow configuring the server #641
Issues
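The S3-deployment requirement above can be sketched as a KServe-style InferenceService manifest, since model serving in ODH/RHODS is commonly expressed that way. This is a minimal illustration, not the project's actual implementation; the name, bucket path, framework, and API version are assumptions.

```python
# Hypothetical sketch: a KServe-style InferenceService manifest for deploying
# a model stored in S3. The bucket path, framework name, and resource shape
# are illustrative placeholders, not values from the requirements above.
def make_inference_service(name: str, storage_uri: str, framework: str = "sklearn") -> dict:
    """Build a minimal InferenceService manifest as a Python dict."""
    if not storage_uri.startswith("s3://"):
        raise ValueError("expected an s3:// storage location")
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                "model": {
                    # "Model frameworks" requirement: the format is configurable
                    "modelFormat": {"name": framework},
                    # "Model storage" requirement: the model lives in S3
                    "storageUri": storage_uri,
                }
            }
        },
    }

svc = make_inference_service("my-model", "s3://models-bucket/path/to/model")
```

Swapping the `framework` argument (e.g., "onnx", "tensorflow") covers the variety-of-frameworks requirement without changing the manifest's shape.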
Requirement 3
P0: Ability to view list of deployed models for a project
- Ability to access endpoint
- Ability to view monitoring and performance metrics
P0: The system must indicate the health (i.e., whether they are up) of endpoints for deployed models
P1: Support multi-model serving; ability to serve multiple models on one server
- [Model Serving]: Support Visualization of the Model Server #649
- [Model Serving]: Support Visualization of the Deployed Model #657
Requirement 4
P0: Users must be able to easily retrieve the endpoint for a served model (to use for inference, either testing or incorporating into an app)
- P0: Users must be able to secure endpoints so they are not publicly available: Authentication & authorization capabilities
Issues
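Requirement 4 pairs endpoint retrieval with authentication. A minimal sketch of what a secured inference call looks like from the client side, assuming a bearer-token scheme; the endpoint URL, token, and payload shape are hypothetical, not part of the requirement.

```python
import json
import urllib.request

# Hypothetical sketch: building an authenticated inference request against a
# served model's endpoint. Requirement 4 only says the endpoint must not be
# publicly reachable; a Bearer token is one common way to satisfy that.
def build_inference_request(endpoint: str, token: str, inputs: list) -> urllib.request.Request:
    payload = json.dumps({"inputs": inputs}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=payload,
        headers={
            "Content-Type": "application/json",
            # auth & authz requirement: credentials accompany every call
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )

# Placeholder endpoint; a real one would be retrieved from the project UI.
req = build_inference_request(
    "https://example.invalid/v2/models/my-model/infer", "TOKEN", [[1.0, 2.0]]
)
```

The same request object works for quick testing (send it with `urllib.request.urlopen`) or as a template for incorporating the endpoint into an application.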
Requirement 5
P0: Ability to view global list of all deployed models (across all projects)
- Filtering / search capabilities
- Users view all models deployed within projects they have access to; admins view all
Issues
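The visibility rule in Requirement 5 (users see models in their projects, admins see everything) can be sketched as a simple filter. The model-record and membership structures here are illustrative assumptions, not the actual data model.

```python
# Hypothetical sketch of Requirement 5's visibility rule: regular users see
# deployed models only in projects they can access; admins see all of them.
def visible_models(models, user_projects, is_admin=False):
    """Filter deployed-model records by project access."""
    if is_admin:
        return list(models)
    return [m for m in models if m["project"] in user_projects]

# Illustrative records; real ones would come from the deployments store.
models = [
    {"name": "fraud", "project": "risk"},
    {"name": "churn", "project": "marketing"},
]
```

The P1 filtering/search capability would layer on top of this same filtered list (e.g., a substring match on the model name).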
Requirement 6
P1: Ability to delete a model
Issues
Requirement 7
P0: Manually add a new version for served model & deploy (replace)
P0: Edit model server
P1: Deploy a new version of the model that coexists with the previous one, yielding multiple deployed endpoints (TODO: review)
Issues
Requirement 8 (Targeted for 1.21)
P0: Inference performance metrics. Users must be able to access performance metrics for all deployed models
- P0: Inference performance - latency (avg. time to process 1 input)
- P0: Target metrics for v1:
- Avg. response time over a period of time (e.g., the last 24 hours, or the last week/month to gauge trends over time)
- Number of requests over defined period of time (including option for all time)
Issues
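The v1 metrics above (average response time and request count over a window, including all time) reduce to a simple aggregation. A sketch under the assumption that latency samples are (timestamp, seconds) pairs; in a real deployment these would come from the serving runtime's metrics store, not an in-memory list.

```python
from datetime import datetime, timedelta

# Hypothetical sketch of Requirement 8's v1 metrics: request count and
# average latency over a time window; window=None means "all time".
def window_metrics(samples, now, window=None):
    """Return (request_count, avg_latency_seconds) for samples in the window."""
    if window is not None:
        samples = [(t, lat) for t, lat in samples if now - t <= window]
    if not samples:
        return 0, 0.0
    return len(samples), sum(lat for _, lat in samples) / len(samples)

# Illustrative samples: two requests inside the last 24 hours, one outside.
now = datetime(2023, 1, 2)
samples = [
    (now - timedelta(hours=1), 0.20),
    (now - timedelta(hours=2), 0.40),
    (now - timedelta(days=3), 1.00),  # falls outside a 24-hour window
]
```

Calling `window_metrics(samples, now, timedelta(hours=24))` yields the last-24-hours view; passing `timedelta(weeks=1)` or omitting the window covers the week/month and all-time cases.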
Clones: RHODS-4620 UX for Serving Models (Closed)
Is cloned by: RHODS-4622 UI back end for Serving Models in ODH core (Closed)