OCP Technical Release Team / TRT-2431

Aggregate jobs should be retried when install fails within 30 mins

    • Type: Story
    • Resolution: Unresolved
    • Priority: Major

      As a TRT member
      I want individual job runs within aggregated jobs to be retried when they fail within 30 mins of starting
      So that I am more likely to get real signal, and possibly successful aggregation runs

      Implementation Notes

      This functionality could live either in the aggregator itself or in the prpqr_reconciler. Wherever it goes, it will need to watch the job runs and retry any run that fails within 30 mins of starting. The original runs will need their annotations adjusted so that the aggregator no longer tracks them.
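
A minimal sketch of the retry decision, in Python. The `JobRun` shape, the state strings, `should_retry`, `untrack`, and the `TRACKED_ANNOTATION` key are all hypothetical stand-ins, not the real types of the aggregator or the prpqr_reconciler. The once-only rule is taken from the acceptance criteria, and the 30-minute window is measured from start to completion.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone
from typing import Dict, Optional

# Hypothetical annotation key; the key the aggregator actually uses is not named in this issue.
TRACKED_ANNOTATION = "example.openshift.io/aggregation-id"
RETRY_WINDOW = timedelta(minutes=30)


@dataclass
class JobRun:
    """Hypothetical, simplified view of one job run tracked by an aggregated job."""
    name: str
    state: str  # assumed values: "pending", "success", "failure"
    start_time: datetime
    completion_time: Optional[datetime] = None
    annotations: Dict[str, str] = field(default_factory=dict)
    is_retry: bool = False


def should_retry(run: JobRun, window: timedelta = RETRY_WINDOW) -> bool:
    """Return True for a failed original run that finished within `window` of starting."""
    if run.state != "failure" or run.completion_time is None:
        return False  # still running, or did not fail
    if run.is_retry:
        return False  # each run is retried at most once
    return run.completion_time - run.start_time <= window


def untrack(run: JobRun) -> None:
    """Remove the tracking annotation so the aggregator stops counting this run."""
    run.annotations.pop(TRACKED_ANNOTATION, None)


if __name__ == "__main__":
    start = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
    early_failure = JobRun("run-1", "failure", start, start + timedelta(minutes=12),
                           {TRACKED_ANNOTATION: "agg-1"})
    if should_retry(early_failure):
        untrack(early_failure)
    print(early_failure.annotations)  # {}
```

A watcher would call `should_retry` on each run it observes finishing and, when it returns True, untrack the original before submitting the retry. A run that fails at minute 45 is left alone and still counts toward aggregation.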

      The retry itself should be as simple as resubmitting the same prowjob with its status fields (and possibly some UUIDs) reset or removed.
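
A minimal sketch of building the retry object, assuming the prowjob is handled as a plain dict (for example, as read through a Kubernetes client's unstructured interface). The `make_retry` helper is hypothetical, and the list of metadata fields to clear is an assumption based on standard Kubernetes object metadata. Any prowjob-specific labels or annotations that embed the old run's ID are not handled here.

```python
import copy
import uuid
from typing import Any, Dict

# Metadata assigned by the API server to the old object; it must not be carried over to a new one.
SERVER_ASSIGNED_METADATA = ("uid", "resourceVersion", "creationTimestamp", "generation", "managedFields")


def make_retry(prowjob: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `prowjob` that can be submitted again as a fresh run."""
    retry = copy.deepcopy(prowjob)  # never mutate the original object
    retry.pop("status", None)  # the controller populates status for the new run
    metadata = retry.setdefault("metadata", {})
    for key in SERVER_ASSIGNED_METADATA:
        metadata.pop(key, None)
    # Assumption: a freshly generated UUID is an acceptable name for the new prowjob.
    metadata["name"] = str(uuid.uuid4())
    return retry
```

Deep-copying keeps the original prowjob intact, so its annotations can still be adjusted separately to stop the aggregator from tracking it.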

      Acceptance Criteria

      • Job runs tracked by aggregated jobs are retried (once) when they fail within 30 mins of being started
      • The original runs are not considered during aggregation; only the retries are
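
One way to express the second criterion, sketched in Python under the assumption that each retry records the name of the run it replaces. The `Run` type, its `retry_of` field, and both helpers are hypothetical; the issue itself proposes adjusting annotations on the original runs, and this shows only the filtering rule the aggregator must end up applying.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Run:
    name: str
    passed: bool
    retry_of: Optional[str] = None  # hypothetical link from a retry to its original run


def runs_for_aggregation(runs: List[Run]) -> List[Run]:
    """Keep every run except originals that have been replaced by a retry."""
    superseded = {r.retry_of for r in runs if r.retry_of is not None}
    return [r for r in runs if r.name not in superseded]


def pass_count(runs: List[Run]) -> int:
    """Count passing runs among those that should be aggregated."""
    return sum(1 for r in runs_for_aggregation(runs) if r.passed)
```

With runs `a` (failed early), `b` (passed), and `a-retry` (passed, `retry_of="a"`), only `b` and `a-retry` are aggregated, so the early failure of `a` no longer counts against the aggregated result.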

              Assignee: Unassigned
              Reporter: Stephen Goeddel (sgoeddel@redhat.com)
              Votes: 0
              Watchers: 1