- Bug
- Resolution: Done
- Blocker
- Model Serving Sprint 2.9-2
- Testable
Overview
Within BAM and watsonx.ai, Raw Deployments need to be fronted by a routing component. Currently, the FMaaS/Rust router (and Caikit) client-side load balance and proxy requests across a model deployment's pods/replicas. To do so, they use a headless Service that sits between them and the replicas, query it for the addresses of the physical pods, and round-robin requests across those pods.
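The client-side balancing described above can be sketched as follows: resolve the headless Service's DNS name to the individual pod IPs, then cycle through them one request at a time. This is a minimal illustration, not the actual router code; the function names and addresses are hypothetical.

```python
import itertools
import socket

def resolve_pod_addresses(headless_service_dns: str, port: int):
    """Resolve a headless Service's DNS name to individual pod IPs.

    With clusterIP: None, cluster DNS returns one A record per ready pod
    instead of a single virtual IP, which is what makes client-side
    load balancing possible.
    """
    infos = socket.getaddrinfo(headless_service_dns, port,
                               proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

def round_robin(addresses):
    """Cycle through the resolved pod addresses, one per request."""
    return itertools.cycle(addresses)

# Example with already-resolved (illustrative) pod IPs:
pods = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rr = round_robin(pods)
targets = [next(rr) for _ in range(5)]
# targets == ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.1", "10.0.0.2"]
```

If the Service is *not* headless, `resolve_pod_addresses` would return a single cluster IP, and every request would land behind kube-proxy's connection-level balancing instead, which is the behavior the Issue below describes.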
Issue
When a model deployment is scaled to more than one pod without the Service being configured as headless, all requests flow to only the first pod in the scaled deployment.
Acceptance Criteria
As part of the Raw mode deployment process (CR submission), there needs to be a way to configure whether the resultant Service has a cluster IP (supported today) or a cluster IP of None (headless, not supported today).
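The difference between the two Service configurations can be shown as plain Kubernetes manifests. These are illustrative sketches only; the names, labels, and port are hypothetical and the actual Service is generated by the KServe controller, not authored by hand.

```yaml
# Supported today: ClusterIP Service. DNS resolves to one virtual IP,
# so kube-proxy balances connections, not individual requests.
apiVersion: v1
kind: Service
metadata:
  name: my-model-predictor        # hypothetical name
spec:
  selector:
    app: my-model-predictor
  ports:
    - port: 8080
      targetPort: 8080
---
# Requested here: headless Service. clusterIP: None makes DNS return
# one A record per ready pod, enabling client-side round-robin.
apiVersion: v1
kind: Service
metadata:
  name: my-model-predictor
spec:
  clusterIP: None
  selector:
    app: my-model-predictor
  ports:
    - port: 8080
      targetPort: 8080
```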
- split to
  - RHOAIENG-5077 [Follow-up-upstream tracker] Routing and Headless Service Support in KServe Raw Mode Deployment - In Progress
- links to
  - RHBA-2024:129615 RHOAI 2.8.1 - Red Hat OpenShift AI