Red Hat OpenShift Data Science / RHODS-7135

Model Serving v2 - monitoring and metrics requirement


    • Model Monitoring and Metrics - Model Serving v2
    • 67% To Do, 11% In Progress, 22% Done

      Requirements doc (covered as part of the broader Model Serving v2 requirements): https://docs.google.com/document/d/1TXLEyzpYX6inMHOlaUW8VbMxMvQ5Y9TEzHM87G9_1Yk/edit?usp=sharing

       

      From Jeff: It would be good to get a basic metric that provides insight into whether customers are using the feature. For example, the number of deployed models at the cluster level.
      We just need to determine which metric would work for us and add it to the RHODS rules at https://github.com/red-hat-data-services/odh-manifests/blob/master/monitoring/base/rhods-rules.yaml
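
As a rough illustration of the kind of usage metric Jeff describes, the sketch below counts deployed models across the cluster and renders the result as a single gauge in the Prometheus text exposition format. The `DeployedModel` record, the sample data, and the metric name `rhods_deployed_models` are all hypothetical; the actual metric and where it comes from are still to be determined.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DeployedModel:
    """Hypothetical record for one deployed model; not a real RHODS type."""
    namespace: str
    name: str


def count_deployed_models(models):
    """Return the cluster-wide number of distinct (namespace, name) pairs."""
    return len({(m.namespace, m.name) for m in models})


def render_gauge(metric_name, help_text, value):
    """Render one gauge sample in the Prometheus text exposition format."""
    return (
        f"# HELP {metric_name} {help_text}\n"
        f"# TYPE {metric_name} gauge\n"
        f"{metric_name} {value}\n"
    )


# Made-up sample data: the same model name in two namespaces counts twice.
models = [
    DeployedModel("project-a", "fraud-detector"),
    DeployedModel("project-a", "churn"),
    DeployedModel("project-b", "fraud-detector"),
]

exposition = render_gauge(
    "rhods_deployed_models",  # hypothetical metric name
    "Number of models deployed across the cluster.",
    count_deployed_models(models),
)
print(exposition)
```

A cluster-level count like this says whether the feature is in use at all without exposing anything about individual models, which seems to match the intent of the request.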
      Part of the broader R11:
      Inference performance metrics. Users must be able to access performance metrics for all deployed models:

      1. P0: Avg. response time over a period of time (e.g. last 24 hours or last week/month, to gauge trends over time) at the individual model level
      2. P0: Number of requests over a defined period of time (including an option for all time) at the individual model level
      3. P0: Ability to view metrics at both the individual model and model server levels
      4. P0: CPU/GPU/memory utilization
      5. P0: Configurable alerts based on defined thresholds:
         • Avg. response time
         • CPU/GPU/memory utilization
         • Number of requests (e.g. above or below a certain threshold)
         • TBD: number of errors / failures in a defined time period
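
To make the per-model requirements above concrete, here is a minimal sketch of how average response time, request count over a window (or all time), and a threshold check could be computed from raw request records. The `InferenceRequest` record, the function names, and the sample values are assumptions for illustration only; in practice these figures would more likely come from the monitoring stack than from application code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class InferenceRequest:
    """Hypothetical record of one served inference request."""
    model: str
    timestamp: datetime
    response_ms: float


def in_window(requests, model, now, window=None):
    """Select one model's requests; window=None means all time."""
    return [
        r for r in requests
        if r.model == model
        and (window is None or now - window <= r.timestamp <= now)
    ]


def request_count(requests, model, now, window=None):
    """Number of requests for a model within the window."""
    return len(in_window(requests, model, now, window))


def avg_response_ms(requests, model, now, window=None):
    """Mean response time in ms, or None if there were no requests."""
    selected = in_window(requests, model, now, window)
    if not selected:
        return None
    return sum(r.response_ms for r in selected) / len(selected)


def breaches(value, lower=None, upper=None):
    """True if value falls below `lower` or rises above `upper`."""
    if value is None:
        return False
    return (lower is not None and value < lower) or (
        upper is not None and value > upper
    )


# Made-up sample data.
now = datetime(2023, 3, 1, 12, 0)
requests = [
    InferenceRequest("fraud-detector", now - timedelta(hours=1), 120.0),
    InferenceRequest("fraud-detector", now - timedelta(hours=5), 80.0),
    InferenceRequest("fraud-detector", now - timedelta(days=3), 400.0),
    InferenceRequest("churn", now - timedelta(hours=2), 50.0),
]
last_24h = timedelta(hours=24)

avg_24h = avg_response_ms(requests, "fraud-detector", now, last_24h)
count_24h = request_count(requests, "fraud-detector", now, last_24h)
count_all = request_count(requests, "fraud-detector", now)
alert = breaches(avg_24h, upper=90.0)  # hypothetical 90 ms threshold
```

The same `breaches` check covers both the "above" and "below" cases for request counts, so one alert definition with optional bounds may be enough for the listed alert types.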

              Assignee: Unassigned
              Reporter: Vedant Mahabaleshwarkar (vmahabal@redhat.com)