Red Hat OpenShift AI Engineering / RHOAIENG-5439

[2.9.0] Routing and Headless Service Support in KServe Raw Mode Deployment


    • 1
    • Model Serving Sprint 2.9-2
    • Testable

      Overview
      Within BAM and watsonx.ai, Raw Deployments need to be fronted by a routing component. Currently, the FMaaS/Rust router (and Caikit) client-side load-balances and proxies requests across a model deployment's pods/replicas. To do so, it uses a headless Service that sits between itself and the replicas, resolves the addresses of the individual pods, and round-robins requests across them.
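      The client-side round-robin described above can be sketched as follows. This is a minimal illustration, not the actual router implementation: in the real setup the address list would come from resolving the headless Service's DNS name (which returns one A record per ready pod), while here a static list of hypothetical pod IPs stands in for that lookup.

```python
from itertools import cycle

def make_round_robin(addresses):
    """Return a callable that yields pod addresses in round-robin order.

    In practice the router would refresh `addresses` by resolving the
    headless Service's DNS name (e.g. via socket.getaddrinfo); a headless
    Service returns the pod IPs directly rather than a single virtual IP.
    """
    it = cycle(addresses)
    return lambda: next(it)

# Hypothetical pod IPs behind a headless Service
pods = ["10.128.0.11", "10.128.0.12", "10.128.0.13"]
next_pod = make_round_robin(pods)
picks = [next_pod() for _ in range(6)]
# Each pod is selected in turn, wrapping around after the last one.
```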

      Issue

      When a model deployment is scaled to more than a single pod without the Service being configured as headless, all requests flow to only the first pod in the scaled deployment.
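      On the Kubernetes side, the difference comes down to a single field: setting `clusterIP: None` makes a Service headless, so its DNS name resolves to the individual pod IPs rather than to one virtual IP that the client cannot see past. A minimal sketch of such a Service (the names and ports are illustrative, not taken from an actual deployment):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-model-predictor        # illustrative name
spec:
  clusterIP: None                 # headless: DNS returns the pod IPs directly
  selector:
    app: my-model-predictor       # matches the deployment's pod labels
  ports:
    - port: 8080
      targetPort: 8080
```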

      Acceptance Criteria

      As part of the Raw mode deployment process (CR submission), there needs to be a way to configure whether the resultant Service has a cluster IP (which is supported today) or a cluster IP of None (headless, which is not supported today).
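      One possible shape for this configuration, purely as an illustration: an annotation on the InferenceService CR that the controller translates into `clusterIP: None` on the generated Service. The headless annotation name below is hypothetical and does not exist in KServe today; the `deploymentMode` annotation and the overall CR shape follow the existing KServe API.

```yaml
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: my-model                  # illustrative name
  annotations:
    serving.kserve.io/deploymentMode: "RawDeployment"
    serving.kserve.io/headless-service: "true"   # hypothetical flag for this feature
spec:
  predictor:
    model:
      modelFormat:
        name: caikit              # illustrative runtime
      storageUri: "pvc://models/my-model"        # illustrative location
```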

            rhn-support-fspolti Filippe Spolti
            rhn-support-tibrahim Taneem Ibrahim
            Tarun Kumar Tarun Kumar
            RHOAI Model Server and Serving Metrics
            Votes: 0
            Watchers: 2
