Red Hat OpenShift Data Science / RHODS-8796

OpenVINO Model Server runtime does not have the required flag to force GPU usage


      == OpenVINO Model Server runtime did not have the required flag to force GPU usage
      OpenShift Data Science includes the OpenVINO Model Server (OVMS) model-serving runtime by default. When you configured a new model server and chose this runtime, the *Configure model server* dialog enabled you to specify a number of GPUs to use with the model server. However, when you finished configuring the model server and deployed models from it, the model server did not actually use any GPUs. This issue is now fixed and the model server uses the GPUs.

      Description of problem:

      Following up from https://issues.redhat.com/browse/RHODS-6529, we should now be able to deploy OVMS runtimes that request (and use) GPUs.
In the current 1.27 build, we can request a GPU for the runtime via the UI (assuming GPUs are available in the cluster), but the ServingRuntime that gets created does not have the required flag to force the model to be served on the GPU device, i.e. (from https://github.com/opendatahub-io/modelmesh-runtime-adapter/pull/15):

      spec:
        builtInAdapter:
          env:
            - name: OVMS_FORCE_TARGET_DEVICE
              value: NVIDIA 

      This is the full ServingRuntime def created when requesting 1 GPU:

       

      apiVersion: serving.kserve.io/v1alpha1
      kind: ServingRuntime
      metadata:
        annotations:
          enable-auth: 'false'
          enable-route: 'true'
          opendatahub.io/template-display-name: OpenVINO Model Server
          opendatahub.io/template-name: ovms
          openshift.io/display-name: ovms-test
        resourceVersion: '585132'
        name: ovms-test
        uid: 2a220f40-6988-4b24-b1f1-f39f9ba810f8
        creationTimestamp: '2023-05-18T15:27:09Z'
        generation: 1
        managedFields:
          - apiVersion: serving.kserve.io/v1alpha1
            fieldsType: FieldsV1
            fieldsV1:
              'f:metadata':
                'f:annotations':
                  .: {}
                  'f:enable-auth': {}
                  'f:enable-route': {}
                  'f:opendatahub.io/template-display-name': {}
                  'f:opendatahub.io/template-name': {}
                  'f:openshift.io/display-name': {}
                'f:labels':
                  .: {}
                  'f:name': {}
                  'f:opendatahub.io/dashboard': {}
              'f:spec':
                'f:builtInAdapter':
                  .: {}
                  'f:memBufferBytes': {}
                  'f:modelLoadingTimeoutMillis': {}
                  'f:runtimeManagementPort': {}
                  'f:serverType': {}
                'f:multiModel': {}
                'f:containers': {}
                'f:protocolVersions': {}
                'f:grpcEndpoint': {}
                'f:supportedModelFormats': {}
                .: {}
                'f:replicas': {}
                'f:grpcDataEndpoint': {}
            manager: unknown
            operation: Update
            time: '2023-05-18T15:27:09Z'
        namespace: test
        labels:
          name: ovms-test
          opendatahub.io/dashboard: 'true'
      spec:
        builtInAdapter:
          memBufferBytes: 134217728
          modelLoadingTimeoutMillis: 90000
          runtimeManagementPort: 8888
          serverType: ovms
        containers:
          - args:
              - '--port=8001'
              - '--rest_port=8888'
              - '--config_path=/models/model_config_list.json'
              - '--file_system_poll_wait_seconds=0'
              - '--grpc_bind_address=127.0.0.1'
              - '--rest_bind_address=127.0.0.1'
            image: >-
              quay.io/opendatahub/openvino_model_server@sha256:20dbfbaf53d1afbd47c612d953984238cb0e207972ed544a5ea662c2404f276d
            name: ovms
            resources:
              limits:
                cpu: '2'
                memory: 8Gi
                nvidia.com/gpu: 1
              requests:
                cpu: '1'
                memory: 4Gi
                nvidia.com/gpu: 1
        grpcDataEndpoint: 'port:8001'
        grpcEndpoint: 'port:8085'
        multiModel: true
        protocolVersions:
          - grpc-v1
        replicas: 1
        supportedModelFormats:
          - autoSelect: true
            name: openvino_ir
            version: opset1
          - autoSelect: true
            name: onnx
            version: '1'
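As a quick sanity check, a short script can confirm whether a dumped ServingRuntime spec carries the force-device flag. This is a sketch, not part of the report; `has_force_device_flag` is a hypothetical helper, and the trimmed dict mirrors the buggy spec above (in practice you would load the full YAML with `yaml.safe_load` from `oc get servingruntime -o yaml`):

```python
# Check a ServingRuntime spec dict for the OVMS force-device env var.
def has_force_device_flag(spec):
    """True if spec.builtInAdapter.env sets OVMS_FORCE_TARGET_DEVICE."""
    env = spec.get("builtInAdapter", {}).get("env", [])
    return any(e.get("name") == "OVMS_FORCE_TARGET_DEVICE" for e in env)

# Trimmed from the ServingRuntime dump above: note there is no `env` key.
buggy_spec = {
    "builtInAdapter": {
        "memBufferBytes": 134217728,
        "modelLoadingTimeoutMillis": 90000,
        "runtimeManagementPort": 8888,
        "serverType": "ovms",
    },
}
print(has_force_device_flag(buggy_spec))  # → False: the flag is missing

# After adding the env entry from the linked PR, the check passes:
buggy_spec["builtInAdapter"]["env"] = [
    {"name": "OVMS_FORCE_TARGET_DEVICE", "value": "NVIDIA"}
]
print(has_force_device_flag(buggy_spec))  # → True
```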
       

      Prerequisites (if any, like setup, operators/versions):

      RHODS 1.27 RC (iib:498989)

      Steps to Reproduce

1. Provision a GPU node
2. Install the NVIDIA GPU add-on
3. Create a Data Science Project
4. Configure a model server (with at least 1 GPU)
5. Deploy a model
6. Inspect the ServingRuntime definition for the model server in the Data Science Project namespace

      Actual results:

The ServingRuntime does not contain the OVMS_FORCE_TARGET_DEVICE environment variable that forces serving on NVIDIA devices.

      Expected results:

The ServingRuntime contains the OVMS_FORCE_TARGET_DEVICE environment variable (set to NVIDIA) under spec.builtInAdapter.

      Reproducibility (Always/Intermittent/Only Once):

      Always

      Build Details:

      Workaround:

Add the OVMS_FORCE_TARGET_DEVICE environment variable to the ServingRuntime manually and let the model server redeploy.
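Concretely, a sketch of the workaround using the example resource above: edit the ServingRuntime (e.g. `oc edit servingruntime ovms-test -n test`) and add the env entry from the linked modelmesh-runtime-adapter PR under spec.builtInAdapter:

```yaml
spec:
  builtInAdapter:
    env:
      - name: OVMS_FORCE_TARGET_DEVICE
        value: NVIDIA
```

Saving the change triggers a redeploy of the model server pod, after which models are served on the GPU.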

      Additional info:

        Lucas Fernandez Aragon (lferrnan@redhat.com)
        Luca Giorgi (rhn-support-lgiorgi)