Type: Bug
Priority: Major
Resolution: Done
Affects Version: RHODS_1.27.0_GA
Fix Version: 1.28.0
Target Release: RHODS 1.28
Description of problem:
Following up from https://issues.redhat.com/browse/RHODS-6529, we should now be able to deploy OVMS runtimes that request (and use) GPUs.
In the current 1.27 build we are able to request a GPU for the runtime via the UI (assuming GPUs are available in the cluster), but the ServingRuntime that gets created does not include the environment variable that forces the model to be served on the GPU device, i.e. (from https://github.com/opendatahub-io/modelmesh-runtime-adapter/pull/15):
spec:
  builtInAdapter:
    env:
      - name: OVMS_FORCE_TARGET_DEVICE
        value: NVIDIA
This is the full ServingRuntime definition created when requesting 1 GPU:
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  annotations:
    enable-auth: 'false'
    enable-route: 'true'
    opendatahub.io/template-display-name: OpenVINO Model Server
    opendatahub.io/template-name: ovms
    openshift.io/display-name: ovms-test
  resourceVersion: '585132'
  name: ovms-test
  uid: 2a220f40-6988-4b24-b1f1-f39f9ba810f8
  creationTimestamp: '2023-05-18T15:27:09Z'
  generation: 1
  managedFields:
    - apiVersion: serving.kserve.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:enable-auth': {}
            'f:enable-route': {}
            'f:opendatahub.io/template-display-name': {}
            'f:opendatahub.io/template-name': {}
            'f:openshift.io/display-name': {}
          'f:labels':
            .: {}
            'f:name': {}
            'f:opendatahub.io/dashboard': {}
        'f:spec':
          'f:builtInAdapter':
            .: {}
            'f:memBufferBytes': {}
            'f:modelLoadingTimeoutMillis': {}
            'f:runtimeManagementPort': {}
            'f:serverType': {}
          'f:multiModel': {}
          'f:containers': {}
          'f:protocolVersions': {}
          'f:grpcEndpoint': {}
          'f:supportedModelFormats': {}
          .: {}
          'f:replicas': {}
          'f:grpcDataEndpoint': {}
      manager: unknown
      operation: Update
      time: '2023-05-18T15:27:09Z'
  namespace: test
  labels:
    name: ovms-test
    opendatahub.io/dashboard: 'true'
spec:
  builtInAdapter:
    memBufferBytes: 134217728
    modelLoadingTimeoutMillis: 90000
    runtimeManagementPort: 8888
    serverType: ovms
  containers:
    - args:
        - '--port=8001'
        - '--rest_port=8888'
        - '--config_path=/models/model_config_list.json'
        - '--file_system_poll_wait_seconds=0'
        - '--grpc_bind_address=127.0.0.1'
        - '--rest_bind_address=127.0.0.1'
      image: >-
        quay.io/opendatahub/openvino_model_server@sha256:20dbfbaf53d1afbd47c612d953984238cb0e207972ed544a5ea662c2404f276d
      name: ovms
      resources:
        limits:
          cpu: '2'
          memory: 8Gi
          nvidia.com/gpu: 1
        requests:
          cpu: '1'
          memory: 4Gi
          nvidia.com/gpu: 1
  grpcDataEndpoint: 'port:8001'
  grpcEndpoint: 'port:8085'
  multiModel: true
  protocolVersions:
    - grpc-v1
  replicas: 1
  supportedModelFormats:
    - autoSelect: true
      name: openvino_ir
      version: opset1
    - autoSelect: true
      name: onnx
      version: '1'
Prerequisites (if any, like setup, operators/versions):
RHODS 1.27 RC (iib:498989)
Steps to Reproduce:
- Provision GPU Node
- Install Nvidia GPU Add-On
- Create Data Science Project
- Configure Model Server (with at least 1 GPU)
- Deploy a model
- Look at the ServingRuntime definition for the Model Server in the DSP namespace (see the sketch after this list)
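One way to check for the missing variable directly (a minimal sketch; the runtime name ovms-test and namespace test come from the definition above and will differ per cluster):

# Should print the builtInAdapter env list; on the affected build it returns nothing
oc get servingruntime ovms-test -n test -o jsonpath='{.spec.builtInAdapter.env}'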
Actual results:
The created ServingRuntime has no OVMS_FORCE_TARGET_DEVICE env var forcing the model onto the NVIDIA device
Expected results:
The ServingRuntime contains OVMS_FORCE_TARGET_DEVICE: NVIDIA under spec.builtInAdapter.env
Reproducibility (Always/Intermittent/Only Once):
Always
Build Details:
Workaround:
Add the env var manually to the ServingRuntime and let the model server redeploy (see the sketch below)
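A sketch of applying the workaround with a JSON patch (runtime name and namespace taken from the definition above; adjust for your project):

# Adds the env list with the force flag; the model server pods redeploy afterwards
oc patch servingruntime ovms-test -n test --type=json \
  -p '[{"op": "add", "path": "/spec/builtInAdapter/env", "value": [{"name": "OVMS_FORCE_TARGET_DEVICE", "value": "NVIDIA"}]}]'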
Additional info: