AIPCC-2060

Redesign RH-Benchmark Report Generation for Scalable Collaboration


      Objective

      Currently, when machine learning engineers (MLEs) work on the same report in J-Benchmark, they must be on the same Git branch. This constraint causes inefficiencies, limits parallel work, and complicates collaboration. The goal of this epic is to define and implement a new approach for generating reports that:

      • Eliminates this Git-based limitation
      • Enables simple and flexible scheduling of benchmarks
      • Improves collaboration and parallel development
      • Maintains scalability and ease of maintenance

      Background

      • Reports in J-Benchmark are currently defined using a class-based system in code (see the sketch after this list).
      • This ties report development directly to the Git branch, making collaboration slow and error-prone.
      • There's no easy way for multiple MLEs to work on the same report independently or schedule report runs flexibly.
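      For context, a minimal sketch of what a class-based report definition can look like; the names below are illustrative, not the actual J-Benchmark API:

      ```python
      # Hypothetical example of the class-per-report pattern. Because the
      # report lives in code, editing it means committing to a Git branch,
      # and collaborators must share that branch.
      class BenchmarkReport:
          """Base class: every report is a Python class checked into Git."""
          name: str = ""
          models: list[str] = []
          metrics: list[str] = []

      class LlamaLatencyReport(BenchmarkReport):
          # Two MLEs iterating on this report at the same time must work
          # on the same branch, which is the constraint this epic removes.
          name = "llama-latency"
          models = ["meta-llama/Llama-3.1-8B-Instruct"]
          metrics = ["ttft", "throughput"]
      ```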

      Key Phases

      Phase 1: Discover How MLEs Collaborate Today

      • Interview or shadow MLEs working on shared reports
      • Identify:
        • Specific collaboration pain points
        • Git-related limitations
        • Workarounds currently used
      • Output: Summary of current workflows and challenges

      Phase 2: Document Pros and Cons of the Class-Based Approach

      • Describe the current class-based system used for report definitions
      • List its strengths and weaknesses across:
        • Maintainability
        • Performance
        • Flexibility
        • Collaboration
      • Emphasize the branching constraint as a core blocker
      • Output: Structured pros/cons document

      Phase 3: Propose Alternative Solutions

      • Provide a list of viable alternatives to the current method (Option 3 is sketched after this list):
        • Option 1: Improve the class-based system
        • Option 2: Move to a database-driven approach
        • Option 3: Use a hybrid approach (code + config)
        • Option 4: Add your own!
      • No need to decide yet; these are examples for discussion.
      • Output: Short write-up listing the pros, cons, and trade-offs of each direction
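      To make Option 3 concrete, a minimal sketch; the schema and field names are assumptions for discussion, not a decided format:

      ```python
      # Sketch of Option 3 (code + config), illustrative only: the report
      # *definition* becomes data that can live outside any Git branch,
      # while the runner stays in code.
      import yaml  # pip install pyyaml

      REPORT_SPEC = """
      name: llama-latency
      models:
        - meta-llama/Llama-3.1-8B-Instruct
      metrics: [ttft, throughput]
      schedule: "0 6 * * *"   # cron-style field enables flexible scheduling
      """

      def load_report(spec_text: str) -> dict:
          """Parse and minimally check a report spec."""
          spec = yaml.safe_load(spec_text)
          for key in ("name", "models", "metrics"):
              if key not in spec:
                  raise ValueError(f"report spec is missing required field: {key}")
          return spec

      if __name__ == "__main__":
          print(load_report(REPORT_SPEC))
      ```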

      Phase 4: Present to R&D for Discussion and Decision

      • Present the current state, issues, and options to the R&D team
      • Invite suggestions for additional solutions
      • Emphasize the goal: make benchmark scheduling and report iteration simple and efficient
      • Facilitate an open discussion and alignment on next steps
      • Output: Preferred direction selected and documented

      Phase 5: Implement the Approved Approach

      • Output: Implementation plan + working solution

      Expected Deliverables

      • Collaboration workflow summary
      • Class-based approach pros/cons
      • List of possible alternative approaches
      • R&D presentation and selected path
      • Implementation plan + working solution

      Notes:

      1. Please make sure to design the separation so that it supports triggering benchmarks in both workflows (see the sketch below):

      • The standard flow, where the model spins up at execution time
      • The alternate flow, where the model is already running and the user simply provides a URL + API key
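      A minimal Python sketch of one way to express this separation; every name here (ModelEndpoint, ManagedModel, ExternalModel, run_benchmark) is hypothetical, not an existing API:

      ```python
      # Illustrative only: the runner depends on "how do I reach a serving
      # model", so both triggering workflows plug in without changing it.
      from typing import Protocol

      class ModelEndpoint(Protocol):
          def resolve(self) -> tuple[str, str]:
              """Return (base_url, api_key) for a model ready to serve."""
              ...

      class ManagedModel:
          """Standard flow: the model spins up at execution time."""
          def __init__(self, model_id: str) -> None:
              self.model_id = model_id

          def resolve(self) -> tuple[str, str]:
              # Placeholder: deploy the model, block until it is ready,
              # then return its serving URL and credentials.
              raise NotImplementedError("deployment is out of scope here")

      class ExternalModel:
          """Alternate flow: the model is already running; the user
          supplies a URL + API key and deployment is skipped."""
          def __init__(self, base_url: str, api_key: str) -> None:
              self.base_url, self.api_key = base_url, api_key

          def resolve(self) -> tuple[str, str]:
              return self.base_url, self.api_key

      def run_benchmark(endpoint: ModelEndpoint) -> None:
          base_url, api_key = endpoint.resolve()
          print(f"benchmarking against {base_url}")  # real runner issues requests here

      # Alternate flow: no deployment, just point at a running server.
      run_benchmark(ExternalModel("https://models.example.com/v1", api_key="demo-key"))
      ```

      One possible benefit of this split is that scheduling and model lifecycle stay independent concerns.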

      2. Please review how llm-d-benchmark implemented this capability. Beyond the basic decomposition, there are a few additional components that need to be addressed:

      • Ensure that models can be benchmarked without having to be “pushed” into the codebase, just as llm-d-benchmark does.

      • To extend on the point above: we need to support running benchmarks without adding models to the code. However, if we take this approach, keep in mind that some validations we run behind the scenes won’t trigger, and benchmark integrity might be compromised.

        Similarly, even when a model is defined in code, we should still provide the option to run it with or without validation (see the sketch below).
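      A minimal sketch of how optional validation and ad-hoc (not-in-code) models could fit together; all names are hypothetical:

      ```python
      # Sketch for note 2: ad-hoc model specs never enter the codebase,
      # and the behind-the-scenes validations become an explicit, visible
      # opt-out even for models that are defined in code.
      from dataclasses import dataclass

      @dataclass
      class ModelSpec:
          model_id: str
          registered: bool = False  # True when the model is defined in code

      def validate(spec: ModelSpec) -> None:
          # Stand-in for the validations the note mentions; today the
          # real checks only exist for registered models.
          print(f"validating {spec.model_id}")

      def run_benchmark(spec: ModelSpec, validate_model: bool = True) -> None:
          if validate_model:
              validate(spec)
          elif not spec.registered:
              # The integrity caveat from the note: nothing has vouched
              # for this model, so flag the run rather than failing it.
              print(f"WARNING: unvalidated ad-hoc run for {spec.model_id}")
          print(f"benchmarking {spec.model_id}")

      # Ad-hoc model, no code change, validation explicitly skipped:
      run_benchmark(ModelSpec("my-org/experimental-model"), validate_model=False)

      # Registered model, run without validation on request:
      run_benchmark(ModelSpec("llama-3-8b", registered=True), validate_model=False)
      ```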

      The goal here is to make it easy for MLEs to run benchmarks without “overthinking” the process.
