Performance and Scale for AI Platforms / PSAP-374

Provide metrics/alerts for the GPU operator

      OCP/Telco Definition of Done
      Epic Template descriptions and documentation.

      Epic Goal

      • Improve the stability and maintainability of the GPU Operator by adding metrics and alerts

      Why is this important?

      • Long-term stability and maintainability of the operator

      Acceptance Criteria

      • CI - MUST be running successfully with automated tests.
      • Release Technical Enablement - Provide necessary release enablement details and documents.

      Done Checklist

      • CI - CI is running, tests are automated and merged.
      • Release Enablement: <link to Feature Enablement Presentation>
      • DEV - Upstream code and tests merged: <link to meaningful PR or GitHub Issue>
      • DEV - Upstream documentation merged: <link to meaningful PR or GitHub Issue>
      • DEV - Downstream build attached to advisory: <link to errata>
      • QE - Test plans in Polarion: <link or reference to Polarion>
      • QE - Automated tests merged: <link or reference to automated tests>
      • DOC - Downstream documentation merged: <link to meaningful PR>

              Kevin Pouget (kpouget2)
              Gaurav Singh (gausingh@redhat.com)
              Walid Abouhamad