-
Epic
-
Resolution: Unresolved
-
Major
-
Stress test the Kubeflow & Kueue operators
-
MLOps, RHOAI, Training
-
40% To Do, 0% In Progress, 60% Done
Epic Goal
- Design, implement and run a stress test for the Kubeflow training operator
- Integrate the Kubeflow training operator stress test in a continuous performance testing pipeline for regression analyses
- Design, implement and run a stress test for the Kueue scheduler
- Integrate the Kueue scheduler stress test in a continuous performance testing pipeline for regression analyses
The current focus is on the Kueue scheduler.
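As a starting point for the Kueue stress test design, the sketch below generates a batch of suspended Kubernetes Jobs that point at a Kueue LocalQueue through the `kueue.x-k8s.io/queue-name` label. It is only an illustration, not the TOPSAIL implementation: the queue name `stress-queue`, the namespace `kueue-stress`, the `busybox` image, the job naming scheme and the resource requests are all assumptions chosen for the example.

```python
import json


def make_job(index, queue_name="stress-queue", namespace="kueue-stress"):
    """Build one batch/v1 Job manifest for Kueue to admit.

    queue_name and namespace are hypothetical placeholders; they must match
    a LocalQueue that exists in the target cluster.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {
            "name": f"stress-job-{index:05d}",
            "namespace": namespace,
            # Label that tells Kueue which LocalQueue should manage the Job.
            "labels": {"kueue.x-k8s.io/queue-name": queue_name},
        },
        "spec": {
            # Created suspended; Kueue unsuspends the Job once quota is available.
            "suspend": True,
            "parallelism": 1,
            "completions": 1,
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [
                        {
                            "name": "sleep",
                            "image": "busybox",  # placeholder image (assumption)
                            "command": ["sleep", "10"],
                            # Small requests so many Jobs fit in the quota (assumption).
                            "resources": {
                                "requests": {"cpu": "100m", "memory": "64Mi"}
                            },
                        }
                    ],
                }
            },
        },
    }


def make_jobs(count, **kwargs):
    """Build `count` Job manifests with sequential names."""
    return [make_job(i, **kwargs) for i in range(count)]


def to_json_list(jobs):
    """Wrap the manifests in a v1 List so they can be applied in one call."""
    return json.dumps({"apiVersion": "v1", "kind": "List", "items": jobs}, indent=2)
```

The output of `to_json_list(make_jobs(1000))` could be written to a file and submitted with `kubectl apply -f`; the time between submission and admission of the last Job is one candidate metric for the regression analysis.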
Why is this important?
- These components are getting integrated into RHOAI.
- They are critical for the efficiency of the distributed workload components.
Deadlines / timeframe
- Kubeflow training operator: final build by the end of June
- Kueue: due to be GA for Summit / RHOAI 2.10
Previous Work (Optional):
Discussions
Deliverables
1. Integrate a Kueue stress test in TOPSAIL | In Progress | Unassigned