Story
Resolution: Unresolved
Major
As a TRT member
I want individual job runs within aggregated jobs to be retried when they fail within 30 minutes of starting
So that I am more likely to get real signal and, possibly, successful aggregation runs
Implementation Notes
This functionality could live either in the aggregator itself or in the prpqr_reconciler. Wherever it goes, it will need to watch the job runs and retry any that fail within 30 minutes of starting. The old runs will need their annotations adjusted so that the aggregator no longer tracks them.
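A minimal sketch of the detection step, assuming prowjobs are handled as plain dictionaries with the usual `status.state`, `status.startTime` and `status.completionTime` fields. The annotation keys `example.com/aggregation-id` and `example.com/retry-of` are hypothetical placeholders, not the keys the aggregator actually uses, and treating only the `failure` state as retryable is also an assumption.

```python
import copy
from datetime import datetime, timedelta

RETRY_WINDOW = timedelta(minutes=30)

# Hypothetical annotation keys, used only for illustration.
TRACKING_KEY = "example.com/aggregation-id"  # marks a run the aggregator tracks
RETRY_OF_KEY = "example.com/retry-of"        # marks a run that is itself a retry


def parse_time(value):
    """Parse an RFC 3339 UTC timestamp such as 2024-01-01T00:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")


def failed_early(prowjob, window=RETRY_WINDOW):
    """True if the run finished in the failure state within the window."""
    status = prowjob.get("status", {})
    if status.get("state") != "failure":
        return False
    started = status.get("startTime")
    completed = status.get("completionTime")
    if not started or not completed:
        return False
    return parse_time(completed) - parse_time(started) <= window


def should_retry(prowjob):
    """Retry only tracked runs that failed early and are not already retries."""
    annotations = prowjob.get("metadata", {}).get("annotations") or {}
    tracked = TRACKING_KEY in annotations
    already_a_retry = RETRY_OF_KEY in annotations
    return tracked and not already_a_retry and failed_early(prowjob)


def untrack(prowjob):
    """Return a copy of the run without the (hypothetical) tracking annotation."""
    updated = copy.deepcopy(prowjob)
    metadata = updated.setdefault("metadata", {})
    annotations = metadata.get("annotations") or {}
    annotations.pop(TRACKING_KEY, None)
    metadata["annotations"] = annotations
    return updated
```

Checking for the retry marker is what limits each run to a single retry in this sketch: a retried run carries the marker, so it is never retried again.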
The retry itself should be as simple as submitting the same prowjob again with the status fields (and maybe some UUIDs) reset/removed.
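The resubmission could look roughly like the sketch below, which again treats a prowjob as a plain dictionary. It drops `status`, clears the server-populated metadata fields, and assigns a fresh UUID as the name. The `example.com/retry-of` annotation is a hypothetical marker for illustration, and the exact set of fields that must be reset is an assumption that would need checking against a real cluster.

```python
import copy
import uuid

# Hypothetical annotation key recording which run this one retries.
RETRY_OF_KEY = "example.com/retry-of"

# Standard Kubernetes object metadata fields that the API server populates.
SERVER_SET_METADATA = (
    "uid",
    "resourceVersion",
    "creationTimestamp",
    "generation",
    "managedFields",
)


def build_retry(prowjob):
    """Return a new manifest with the same spec, ready to be submitted again."""
    retry = copy.deepcopy(prowjob)
    retry.pop("status", None)  # the retry starts with no recorded state

    metadata = retry.setdefault("metadata", {})
    for field in SERVER_SET_METADATA:
        metadata.pop(field, None)

    original_name = metadata.get("name", "")
    metadata["name"] = str(uuid.uuid4())  # fresh identifier for the new run

    annotations = metadata.get("annotations") or {}
    annotations[RETRY_OF_KEY] = original_name
    metadata["annotations"] = annotations
    return retry
```

Because the copy keeps the original annotations, the retry would stay tracked by the aggregator as long as the original still carries its tracking annotation when the copy is made.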
Acceptance Criteria
- Job runs tracked by an aggregated job are retried (once) when they fail within 30 minutes of being started
- The original runs are not considered during aggregation, only the retries are
- is related to: SHIPSTRAT-3 A successful nightly most nights
- Refinement