Overview

ATS relies on queue processing to perform many of its tasks. The current queue processing implementation is naive and has the following problems:

Events are rescheduled in large batches, which risks a thundering herd and can live-lock the queue if certain events repeatedly fail to be processed
There is no back-off when events fail to be processed, e.g. when an external dependency is down
No jitter is applied to rescheduled events, so retries are not spread evenly over time
Controllers signal that an event should be rescheduled by returning an error
Controllers cannot indicate when an event should next be retried

The purpose of this epic is two-fold:

Incrementally improve queue handling in ATS to address the problems above
Contribute the changes upstream to the rh-trex project so that all future microservices benefit from this work

Acceptance Criteria

Done Criteria

References
Links to Gdocs, GitHub, and any other relevant information about this epic.