      Milestones:

      1. Decouple benchmark execution from code

      Allow benchmarks to be launched independently from the codebase, enabling flexible and modular execution.
      (Note: tracked under a separate Initiative)

      2. Alignment with PSAP on execution approach

      Coordinate with the PSAP team to agree on how they will run their benchmarks through our platform.

      4. Benchmark Observability Dashboard

      Implement a dashboard to display benchmark execution status: what's running, for how long, success/failure, etc.
      (Design already exists; the execution logic still needs to be defined and implemented.) This milestone also includes a cost-tracking dashboard for internal visibility.
      The dashboard should support filtering by model, team, run configuration, and timeframe for analysis.
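
      Below is a minimal sketch of the filtering logic such a dashboard could apply, assuming a simple per-run record; the BenchmarkRun fields and the filter parameters are illustrative, not an existing schema.

{code:python}
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

@dataclass
class BenchmarkRun:
    run_id: str
    model: str
    team: str
    config: str
    status: str                # e.g. "running", "succeeded", "failed"
    started_at: datetime
    duration_s: Optional[float] = None
    cost_usd: Optional[float] = None

def filter_runs(runs, model=None, team=None, config=None, since=None, until=None):
    """Return only the runs matching the optional dashboard filters."""
    selected = []
    for run in runs:
        if model and run.model != model:
            continue
        if team and run.team != team:
            continue
        if config and run.config != config:
            continue
        if since and run.started_at < since:
            continue
        if until and run.started_at > until:
            continue
        selected.append(run)
    return selected
{code}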

      5. Expand E2E coverage for new features

      Update and extend E2E test coverage to include the new features introduced in this phase.

      6. PSAP POC

      Run a proof-of-concept with the PSAP team to validate integration and performance on their workloads.

      7. Internal presentation

      Present the outcome and progress to Tom, Sherrard, and Liora, and follow up with an internal email summary.

      8. Documentation Package for Benchmarking

      Prepare a complete documentation hub covering:

      • How to run benchmarks on the system
      • System architecture overview
      • Common issues and troubleshooting (Q&A)
      • Best practices and known limitations

      9. Automated Model Onboarding Flow

      External users will be able to request model onboarding via an automated workflow.
      The flow will generate a pull request (PR) for internal review and pause until the PR is approved. Once the PR is merged, the system will automatically notify the user — via email or Slack — that the model has been successfully added and is now available for use.

      Long-term, this process may evolve into a fully automated pipeline.
      We may choose to implement the long-term version from day one — this requires alignment and decision-making with Aviran.
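
      A minimal sketch of how this flow could be wired up, assuming the model registry lives in a GitHub repository and notifications go out through a Slack incoming webhook; the repository path, webhook URL, and helper names below are placeholders, not existing integrations.

{code:python}
import time
import requests

GITHUB_API = "https://api.github.com/repos/EXAMPLE_ORG/EXAMPLE_REPO"   # placeholder repo
SLACK_WEBHOOK = "https://hooks.slack.com/services/EXAMPLE"             # placeholder webhook

def open_onboarding_pr(model_name: str, token: str) -> int:
    """Open a PR that adds the requested model to the registry."""
    resp = requests.post(
        f"{GITHUB_API}/pulls",
        headers={"Authorization": f"token {token}"},
        json={
            "title": f"Onboard model: {model_name}",
            "head": f"onboard/{model_name}",   # branch prepared by the workflow
            "base": "main",
        },
    )
    resp.raise_for_status()
    return resp.json()["number"]

def pr_is_merged(pr_number: int, token: str) -> bool:
    """GitHub answers 204 on this endpoint once the PR has been merged."""
    resp = requests.get(
        f"{GITHUB_API}/pulls/{pr_number}/merge",
        headers={"Authorization": f"token {token}"},
    )
    return resp.status_code == 204

def notify_user(model_name: str) -> None:
    """Tell the requester (here via Slack) that the model is available."""
    requests.post(SLACK_WEBHOOK, json={"text": f"Model '{model_name}' has been onboarded and is ready to use."})

def onboard(model_name: str, token: str, poll_interval_s: int = 300) -> None:
    pr_number = open_onboarding_pr(model_name, token)
    while not pr_is_merged(pr_number, token):   # pause until the PR is approved and merged
        time.sleep(poll_interval_s)
    notify_user(model_name)
{code}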

      10. (Optional) UI/UX for Model Onboarding & Execution Management

      Design and implement a user-friendly UI/UX interface for managing model onboarding and benchmark execution.
      Instead of relying on manual database entries or CLI-based workflows, users will be able to:

      • Submit new models directly through the UI.
      • Track the onboarding and approval status of each model.
      • Trigger benchmark runs from the interface.
      • Monitor run status, runtime logs, cost, and historical results in real time.

      This phase aims to improve accessibility for non-technical users and bring visibility and control to the model benchmarking lifecycle.

      Note: This entire phase could potentially evolve into a standalone epic, as it involves full productization of user-facing onboarding and execution flows.
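
      A minimal sketch of the backend endpoints such a UI could sit on top of, written with FastAPI; the route names, payload fields, and in-memory stores are illustrative assumptions, not the actual service design.

{code:python}
from fastapi import FastAPI
from pydantic import BaseModel

app = FastAPI()
MODELS: dict = {}   # model name -> onboarding record (stand-in for the real database)
RUNS: dict = {}     # run id -> run record

class ModelRequest(BaseModel):
    name: str
    source_url: str

@app.post("/models")
def submit_model(req: ModelRequest):
    """Submit a new model through the UI instead of a manual database entry."""
    MODELS[req.name] = {"status": "pending_review", "source_url": req.source_url}
    return MODELS[req.name]

@app.get("/models/{name}")
def model_status(name: str):
    """Track the onboarding and approval status of a model."""
    return MODELS.get(name, {"status": "unknown"})

@app.post("/models/{name}/runs")
def trigger_run(name: str):
    """Trigger a benchmark run from the interface."""
    run_id = f"{name}-{len(RUNS) + 1}"
    RUNS[run_id] = {"status": "queued", "model": name}
    return {"run_id": run_id, **RUNS[run_id]}

@app.get("/runs/{run_id}")
def run_status(run_id: str):
    """Monitor run status, runtime, cost, and historical results."""
    return RUNS.get(run_id, {"status": "unknown"})
{code}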

       
