Type: Bug
Priority: Major
Resolution: Done
Affects Version: RHODS_1.27.0_GA
Fix Version: 1.28.0
Target Release: RHODS 1.28
Description of problem:
Following up from https://issues.redhat.com/browse/RHODS-6529, we should now be able to deploy OVMS runtimes that request (and use) GPUs.
In the current 1.27 build we are able to request a GPU for the runtime via the UI (assuming GPUs are available in the cluster), but the ServingRuntime that gets created does not include the environment variable that forces the model to be served on the GPU device, i.e. (from https://github.com/opendatahub-io/modelmesh-runtime-adapter/pull/15):
spec:
  builtInAdapter:
    env:
      - name: OVMS_FORCE_TARGET_DEVICE
        value: NVIDIA
This is the full ServingRuntime definition created when requesting 1 GPU:
apiVersion: serving.kserve.io/v1alpha1
kind: ServingRuntime
metadata:
  annotations:
    enable-auth: 'false'
    enable-route: 'true'
    opendatahub.io/template-display-name: OpenVINO Model Server
    opendatahub.io/template-name: ovms
    openshift.io/display-name: ovms-test
  resourceVersion: '585132'
  name: ovms-test
  uid: 2a220f40-6988-4b24-b1f1-f39f9ba810f8
  creationTimestamp: '2023-05-18T15:27:09Z'
  generation: 1
  managedFields:
    - apiVersion: serving.kserve.io/v1alpha1
      fieldsType: FieldsV1
      fieldsV1:
        'f:metadata':
          'f:annotations':
            .: {}
            'f:enable-auth': {}
            'f:enable-route': {}
            'f:opendatahub.io/template-display-name': {}
            'f:opendatahub.io/template-name': {}
            'f:openshift.io/display-name': {}
          'f:labels':
            .: {}
            'f:name': {}
            'f:opendatahub.io/dashboard': {}
        'f:spec':
          'f:builtInAdapter':
            .: {}
            'f:memBufferBytes': {}
            'f:modelLoadingTimeoutMillis': {}
            'f:runtimeManagementPort': {}
            'f:serverType': {}
          'f:multiModel': {}
          'f:containers': {}
          'f:protocolVersions': {}
          'f:grpcEndpoint': {}
          'f:supportedModelFormats': {}
          .: {}
          'f:replicas': {}
          'f:grpcDataEndpoint': {}
      manager: unknown
      operation: Update
      time: '2023-05-18T15:27:09Z'
  namespace: test
  labels:
    name: ovms-test
    opendatahub.io/dashboard: 'true'
spec:
  builtInAdapter:
    memBufferBytes: 134217728
    modelLoadingTimeoutMillis: 90000
    runtimeManagementPort: 8888
    serverType: ovms
  containers:
    - args:
        - '--port=8001'
        - '--rest_port=8888'
        - '--config_path=/models/model_config_list.json'
        - '--file_system_poll_wait_seconds=0'
        - '--grpc_bind_address=127.0.0.1'
        - '--rest_bind_address=127.0.0.1'
      image: >-
        quay.io/opendatahub/openvino_model_server@sha256:20dbfbaf53d1afbd47c612d953984238cb0e207972ed544a5ea662c2404f276d
      name: ovms
      resources:
        limits:
          cpu: '2'
          memory: 8Gi
          nvidia.com/gpu: 1
        requests:
          cpu: '1'
          memory: 4Gi
          nvidia.com/gpu: 1
  grpcDataEndpoint: 'port:8001'
  grpcEndpoint: 'port:8085'
  multiModel: true
  protocolVersions:
    - grpc-v1
  replicas: 1
  supportedModelFormats:
    - autoSelect: true
      name: openvino_ir
      version: opset1
    - autoSelect: true
      name: onnx
      version: '1'
Prerequisites (if any, like setup, operators/versions):
RHODS 1.27 RC (iib:498989)
Steps to Reproduce:
- Provision GPU Node
- Install Nvidia GPU Add-On
- Create Data Science Project
- Configure Model Server (with at least 1 GPU)
- Deploy a model
- Look at the ServingRuntime definition for the Model Server in the DSP namespace (see the sketch after this list)
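One way to check for the missing variable directly (a minimal sketch; the runtime name ovms-test and namespace test come from the definition above and will differ per cluster):

# Should print the builtInAdapter env list; on the affected build it returns nothing
oc get servingruntime ovms-test -n test -o jsonpath='{.spec.builtInAdapter.env}'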
Actual results:
The created ServingRuntime has no OVMS_FORCE_TARGET_DEVICE env var forcing the model onto the NVIDIA device
Expected results:
The ServingRuntime contains OVMS_FORCE_TARGET_DEVICE: NVIDIA under spec.builtInAdapter.env
Reproducibility (Always/Intermittent/Only Once):
Always
Build Details:
Workaround:
Add the env var manually to the ServingRuntime and let the model server redeploy (see the sketch below)
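A sketch of applying the workaround with a JSON patch (runtime name and namespace taken from the definition above; adjust for your project):

# Adds the env list with the force flag; the model server pods redeploy afterwards
oc patch servingruntime ovms-test -n test --type=json \
  -p '[{"op": "add", "path": "/spec/builtInAdapter/env", "value": [{"name": "OVMS_FORCE_TARGET_DEVICE", "value": "NVIDIA"}]}]'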
Additional info: