      Milestones:

      1. Decouple benchmark execution from code

      Allow benchmarks to be launched independently from the codebase, enabling flexible and modular execution.
      (Note: tracked under a separate Initiative)

      2. Alignment with PSAP on execution approach

      Coordinate with the PSAP team to agree on how they will run their benchmarks through our platform.

      4. Benchmark Observability Dashboard

      Implement a dashboard to display benchmark execution status: what's running, for how long, success/failure, etc.
      (Design already exists; the execution logic still needs to be defined and implemented.) This milestone also includes a cost-tracking dashboard for internal visibility.
      The dashboard should support filtering by model, team, run configuration, and timeframe for analysis.
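
      Below is a minimal sketch of the filtering logic such a dashboard could apply, assuming a simple per-run record; the BenchmarkRun fields and the filter parameters are illustrative, not an existing schema.

{code:python}
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class BenchmarkRun:
    run_id: str
    model: str
    team: str
    config: str
    status: str                # e.g. "running", "succeeded", "failed"
    started_at: datetime
    duration_s: Optional[float] = None
    cost_usd: Optional[float] = None

def filter_runs(runs, model=None, team=None, config=None, since=None, until=None):
    """Return only the runs matching the optional dashboard filters."""
    selected = []
    for run in runs:
        if model and run.model != model:
            continue
        if team and run.team != team:
            continue
        if config and run.config != config:
            continue
        if since and run.started_at < since:
            continue
        if until and run.started_at > until:
            continue
        selected.append(run)
    return selected
{code}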

      5. Expand E2E coverage for new features

      Update and extend E2E test coverage to include the new features introduced in this phase.

      6. PSAP POC

      Run a proof-of-concept with the PSAP team to validate integration and performance on their workloads.

      7. Internal presentation

      Present the outcome and progress to Tom, Sherrard, and Liora, and follow up with an internal email summary.

      8. Documentation Package for Benchmarking

      Prepare a complete documentation hub covering:

      • How to run benchmarks on the system
      • System architecture overview
      • Common issues and troubleshooting (Q&A)
      • Best practices and known limitations

      9. Automated Model Onboarding Flow

      External users will be able to request model onboarding via an automated workflow.
      The flow will generate a pull request (PR) for internal review and pause until the PR is approved. Once the PR is merged, the system will automatically notify the user — via email or Slack — that the model has been successfully added and is now available for use.

      Long-term, this process may evolve into a fully automated pipeline.
      We may choose to implement the long-term version from day one — this requires alignment and decision-making with Aviran.
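
      A minimal sketch of how this flow could be wired up, assuming the model registry lives in a GitHub repository and notifications go out through a Slack incoming webhook; the repository path, webhook URL, and helper names below are placeholders, not existing integrations.

{code:python}
import time
import requests

GITHUB_API = "https://api.github.com/repos/EXAMPLE_ORG/EXAMPLE_REPO"   # placeholder repo
SLACK_WEBHOOK = "https://hooks.slack.com/services/EXAMPLE"             # placeholder webhook

def open_onboarding_pr(model_name: str, token: str) -> int:
    """Open a PR that adds the requested model to the registry."""
    resp = requests.post(
        f"{GITHUB_API}/pulls",
        headers={"Authorization": f"token {token}"},
        json={
            "title": f"Onboard model: {model_name}",
            "head": f"onboard/{model_name}",   # branch prepared by the workflow
            "base": "main",
        },
    )
    resp.raise_for_status()
    return resp.json()["number"]

def pr_is_merged(pr_number: int, token: str) -> bool:
    """GitHub answers 204 on this endpoint once the PR has been merged."""
    resp = requests.get(
        f"{GITHUB_API}/pulls/{pr_number}/merge",
        headers={"Authorization": f"token {token}"},
    )
    return resp.status_code == 204

def notify_user(model_name: str) -> None:
    """Tell the requester (here via Slack) that the model is available."""
    requests.post(SLACK_WEBHOOK, json={"text": f"Model '{model_name}' has been onboarded and is ready to use."})

def onboard(model_name: str, token: str, poll_interval_s: int = 300) -> None:
    pr_number = open_onboarding_pr(model_name, token)
    while not pr_is_merged(pr_number, token):   # pause until the PR is approved and merged
        time.sleep(poll_interval_s)
    notify_user(model_name)
{code}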

      10. (Optional) UI/UX for Model Onboarding & Execution Management

      Design and implement a user-friendly UI/UX interface for managing model onboarding and benchmark execution.
      Instead of relying on manual database entries or CLI-based workflows, users will be able to:

      • Submit new models directly through the UI.
      • Track the onboarding and approval status of each model.
      • Trigger benchmark runs from the interface.
      • Monitor run status, runtime logs, cost, and historical results in real time.

      This phase aims to improve accessibility for non-technical users and bring visibility and control to the model benchmarking lifecycle.

      Note: This entire phase could potentially evolve into a standalone epic, as it involves full productization of user-facing onboarding and execution flows.
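
      A minimal sketch of the backend endpoints such a UI could sit on top of, written with FastAPI; the route names, payload fields, and in-memory stores are illustrative assumptions, not the actual service design.

{code:python}
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
MODELS: dict = {}   # model name -> onboarding record (stand-in for the real database)
RUNS: dict = {}     # run id -> run record

class ModelRequest(BaseModel):
    name: str
    source_url: str

@app.post("/models")
def submit_model(req: ModelRequest):
    """Submit a new model through the UI instead of a manual database entry."""
    MODELS[req.name] = {"status": "pending_review", "source_url": req.source_url}
    return MODELS[req.name]

@app.get("/models/{name}")
def model_status(name: str):
    """Track the onboarding and approval status of a model."""
    return MODELS.get(name, {"status": "unknown"})

@app.post("/models/{name}/runs")
def trigger_run(name: str):
    """Trigger a benchmark run from the interface."""
    run_id = f"{name}-{len(RUNS) + 1}"
    RUNS[run_id] = {"status": "queued", "model": name}
    return {"run_id": run_id, **RUNS[run_id]}

@app.get("/runs/{run_id}")
def run_status(run_id: str):
    """Monitor run status, runtime, cost, and historical results."""
    return RUNS.get(run_id, {"status": "unknown"})
{code}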

       
