Type: Story
Resolution: Unresolved
Priority: Major
Fix Version/s: None
Affects Version/s: None
Labels: None
Activity Type: Future Sustainability
Blocked: False
Blocked Reason: None
Ready: False
Epic Link: None
Story Points: None
Target Version: None
Release Blocker: None
Sprint: None

As a TRT member,
I want individual job runs within aggregated jobs to be retried when they fail within 30 minutes of starting,
so that I am more likely to get real signal and, possibly, successful aggregation runs.

Implementation Notes

This functionality could live either in the aggregator itself or in the prpqr_reconciler. Wherever it goes, it will need to watch the job runs and retry any that fail within 30 minutes of starting. The annotations on the original runs will have to be adjusted so that the aggregator no longer tracks them.
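A minimal sketch of the "failed within 30 minutes" check, written in Python for illustration only. It assumes each run exposes a status dict with `state`, `startTime`, and `completionTime` keys mirroring the ProwJob resource; the set of states that count as a failure, the inclusive 30-minute boundary, and the helper names are assumptions rather than anything decided in this ticket.

```python
from datetime import datetime, timedelta

RETRY_WINDOW = timedelta(minutes=30)
# Assumption: which terminal states should trigger a retry.
FAILED_STATES = {"failure", "error"}


def parse_time(value):
    """Parse an RFC 3339 timestamp such as 2025-11-21T19:17:00Z."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def failed_early(status, window=RETRY_WINDOW):
    """Return True if the run ended in a failed state within `window` of starting."""
    if status.get("state") not in FAILED_STATES:
        return False
    started = status.get("startTime")
    completed = status.get("completionTime")
    if not started or not completed:
        return False  # still running, or timing data is missing
    return parse_time(completed) - parse_time(started) <= window
```

Whichever component owns the retry could call a check like this each time it observes a status change on a tracked run.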

The retry itself should be as simple as submitting the same prowjob again with the status fields (and maybe some UUIDs) reset or removed.
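As a rough illustration of that resubmission step, the sketch below treats a prowjob as a plain dict in the shape of a Kubernetes manifest. It assumes the job name is a UUID that must be regenerated, and it records the original name under a hypothetical annotation key (`example.invalid/retry-of`); neither detail is confirmed by this ticket, and the real list of fields to reset may differ.

```python
import copy
import uuid

# Metadata fields that the API server sets and that must not be resubmitted.
SERVER_SET_METADATA = ("uid", "resourceVersion", "creationTimestamp",
                       "generation", "managedFields")
# Hypothetical annotation key linking a retry back to the run it replaces.
RETRY_OF_ANNOTATION = "example.invalid/retry-of"


def build_retry(prowjob):
    """Return a copy of `prowjob` that is ready to be submitted as a new run."""
    retry = copy.deepcopy(prowjob)   # leave the original object untouched
    retry.pop("status", None)        # reset all status fields
    metadata = retry.setdefault("metadata", {})
    for field in SERVER_SET_METADATA:
        metadata.pop(field, None)
    original_name = metadata.get("name", "")
    metadata["name"] = str(uuid.uuid4())  # assumption: names are fresh UUIDs
    metadata.setdefault("annotations", {})[RETRY_OF_ANNOTATION] = original_name
    return retry
```

The spec is copied unchanged, so the retry runs the same job with the same parameters as the original.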

Acceptance Criteria

Job runs tracked by aggregated jobs are retried (once) when they fail within 30 minutes of being started
The original runs are not considered during aggregation; only the retries are
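One way to picture both criteria together is sketched below, under loud assumptions: the aggregator is taken to find its runs through an annotation, and both annotation keys here (`example.invalid/aggregation-id` and `example.invalid/retried`) are hypothetical placeholders, not the keys the aggregator actually uses.

```python
# Hypothetical annotation keys, for illustration only.
TRACKED = "example.invalid/aggregation-id"
RETRIED = "example.invalid/retried"


def annotations(run):
    """Return the run's annotation dict, creating it if absent."""
    return run.setdefault("metadata", {}).setdefault("annotations", {})


def can_retry(run):
    """A run may be retried only if it is not itself a retry (retry once)."""
    return annotations(run).get(RETRIED) != "true"


def supersede(original, retry):
    """Move aggregation tracking from the original run to its retry."""
    aggregation_id = annotations(original).pop(TRACKED, None)
    if aggregation_id is not None:
        annotations(retry)[TRACKED] = aggregation_id
    annotations(retry)[RETRIED] = "true"


def tracked_runs(runs, aggregation_id):
    """Select the runs the aggregator should consider for one aggregation."""
    return [r for r in runs if annotations(r).get(TRACKED) == aggregation_id]
```

After `supersede`, the original no longer carries the tracking annotation, so only the retry is returned by `tracked_runs`, and `can_retry` refuses to retry it a second time.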

is related to

SHIPSTRAT-3 A successful nightly most nights

Refinement

Assignee: Unassigned
Reporter: Stephen Goeddel
Need Info From: None
Contributors: None
QA Contact: None
Doc Contact: None
Votes: 0
Watchers: 1
Created: 2025/11/21 7:17 PM
Updated: 2026/01/05 8:11 PM
