Red Hat OpenShift AI Strategic Project / RHOAISTRAT-50

Near Edge deployment within OpenShift AI


    • Type: Feature
    • Resolution: Unresolved
    • Priority: Major
    • Edge
    • RHOAISTRAT-73: OpenShift AI supports the ability to easily deploy any model to any location including edge locations
    • Progress: 49%

      Note: This Feature is currently being refined in https://docs.google.com/document/d/1_8Z4YNn4ua7MLGGp124J-e6_XdnjPZSA-9U2iG-h6ug.

      Feature Overview

Enable MLOps engineers using RHOAI to successfully deploy a model at the near edge, with confidence that it is working properly, that its predictions are correct, and that it is always available.

      Goals

The goal is to provide fully supported model serving capabilities in a distributed topology, specifically with SNO and MicroShift footprints. This means a containerized model, with all the dependencies needed for serving, running in an environment with moderately constrained resources.

      Requirements

      Environment

The target OCP footprints for this capability are:

      • Single Node OpenShift (must-have, based on customer request)
      • MicroShift (nice to have)

      Model serving

      The minimal capabilities for serving the model are:

      1. An operator that fits the desired footprint and installs and configures the components needed at the near-edge node (e.g. ArgoCD for edge, ACM for edge, monitoring, observability, a public API service).
      2. [P0] A supported model runtime based on OpenVINO with PyTorch and TensorFlow.
      3. [P0] Automated builds of the containerized model using OpenShift Pipelines, also supported at the edge node.
      4. Limited management capabilities:
         • [P0] Support for model upgrades: when the desired state of the model changes in git, pull the target version from a local Quay repository and reconcile the desired state against the local k8s API of MicroShift or SNO.
         • Support for model upgrades in environments with limited connectivity (network proxies, disconnected periods, low bandwidth for transferring a container image).
      5. [P0] A public API to retrieve general model information, availability, health, and logs (see the sketch after this list).
      6. [P0] A public API at the core for MLOps engineers to interact with the CI pipelines.
      7. [P0] A public API at the near edge to interact with the management and monitoring capabilities.
      8. [P0] Support for exporting Prometheus metrics for latency and availability of the service (an exporter sketch follows the lessons-learned list in the Background section).
      9. Zero-downtime upgrades (nice to have).
      10. Resource consumption of all the components at the edge should fit ABB resource constraints.
      11. Support for forwarding the metrics collected at the edge node to the core location (nice to have).
      12. Visualization of the metrics in a Prometheus dashboard (nice to have).
      13. Other monitoring metrics (nice to have).
      14. Resource consumption should be as minimal as possible (nice to have).
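      One plausible building block for the [P0] public API items is the TensorFlow-Serving-compatible REST interface that OpenVINO Model Server already exposes. The Python sketch below shows how a client at the near edge could check model availability and request a prediction; the service URL and the model name "defect-detector" are hypothetical placeholders, not part of this feature's agreed design.

        import requests

        # Placeholder values for illustration; the real service location and
        # model name would come from the deployment.
        OVMS_URL = "http://localhost:8080"
        MODEL_NAME = "defect-detector"

        def model_status() -> dict:
            """Ask the server which model versions are loaded and whether they are AVAILABLE."""
            resp = requests.get(f"{OVMS_URL}/v1/models/{MODEL_NAME}", timeout=5)
            resp.raise_for_status()
            return resp.json()

        def predict(instances: list) -> dict:
            """Send a row-format inference request to the :predict endpoint."""
            resp = requests.post(
                f"{OVMS_URL}/v1/models/{MODEL_NAME}:predict",
                json={"instances": instances},
                timeout=5,
            )
            resp.raise_for_status()
            return resp.json()

        if __name__ == "__main__":
            print(model_status())

      A thin management API at the near edge could wrap endpoints like these to add the logs and general-information pieces the requirements call for.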

      Background

      Lessons learned from customers:

      • Deploying models at the near edge is preferred using a GitOps approach.
      • Serving models at the edge is best done with a lightweight model server that runs in an immutable container.
      • Sending the performance metrics back to the core RHOAI instance to be visualized in the dashboard is a differentiator (see the exporter sketch below).
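      As a hedged illustration of the metrics lesson above and of requirement 8, the following sketch uses the prometheus_client library to show how a small sidecar at the edge node could export latency and availability metrics for local scraping, with optional forwarding to the core. The metric names, port, and probed endpoint are assumptions for illustration only.

        import time

        import requests
        from prometheus_client import Gauge, Histogram, start_http_server

        # Illustrative metric names; the real naming scheme is not decided here.
        LATENCY = Histogram(
            "model_probe_latency_seconds",
            "Latency of health probes against the local model server",
        )
        AVAILABLE = Gauge(
            "model_available",
            "1 if the model server answered the last probe, 0 otherwise",
        )

        # Placeholder endpoint, reusing the status URL from the earlier sketch.
        PROBE_URL = "http://localhost:8080/v1/models/defect-detector"

        def probe() -> None:
            """Run one availability probe and record its latency."""
            start = time.monotonic()
            try:
                requests.get(PROBE_URL, timeout=5).raise_for_status()
                AVAILABLE.set(1)
            except requests.RequestException:
                AVAILABLE.set(0)
            finally:
                LATENCY.observe(time.monotonic() - start)

        if __name__ == "__main__":
            # Expose /metrics on port 9100 for a local Prometheus to scrape; that
            # instance could then remote-write the series to the core location.
            start_http_server(9100)
            while True:
                probe()
                time.sleep(30)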

      The user stories for this epic can be found here:

      https://docs.google.com/document/d/16_jl5yk3wDLcmxy3PXtw0sI42Y19T_jbfI5tHn5ZvIY/edit#heading=h.1dc7etmadurz

      Current mockups

      https://docs.google.com/presentation/d/1W9wHeASZsxz4W1UzyN-HfQCqeMhI0dm4IpyxnPBALLo/edit#slide=id.g29607e927c0_0_225

      What is the near edge?

      Typically conventional server hardware or similar—the same that might be deployed in a data center—at the edge of the network. These powerful computers run full-scale server operating systems (typically Linux or Windows) and can be treated in the same way as any other cloud server. If they have access to AI-specific acceleration, it’s likely in the form of GPUs. Some edge servers are sold in ruggedized form factors that are better suited to industrial settings (like a factory floor) than their data center dwelling equivalents.

      The power of edge servers means that they can provide many of the benefits of cloud compute while maintaining the security, privacy, and convenience that comes with keeping data on-site. For some applications, they can provide the best of both worlds—high-capability hardware, low latency, reduced risk of data leakage, and economic use of bandwidth.

      In the context of the Red Hat portfolio, near-edge computing means running workloads in a distributed topology where:

      • Resources are constrained.
      • Some flavor of k8s is available.

      Non-goals

      • Device management
      • Non-OCP environments.
      • The feature should be available from within the OpenShift UI; all the required components for automating, deploying, serving, monitoring, and managing the fleet of edge servers should be integrated with the platform.
      • Multi-tenancy: none of the multi-tenancy features will be available for MicroShift.
      • High availability and upscaling: none of the HA or upscaling features will be available for MicroShift.
      • Multi-cluster management: GitOps on MicroShift will not support managing multiple clusters from a single instance; multi-cluster support severely affects the footprint of any GitOps instance.

       

      This feature is something we want to showcase for Summit 2024. 

       

       

            Landon LaSmith (llasmith@redhat.com)
            Myriam Fentanes (mfentane@redhat.com)
            Votes: 0
            Watchers: 16