- Bug
- Resolution: Unresolved
- Major
- None
- None
- 5
- False
- False
- subs-swatch-thunder
- Important
There is a potential race condition when the hourly tally process runs multiple times in quick succession.
It was identified through intermittent failures in one of our IQE tests. The test triggers the issue because it runs a non-waiting tally sync multiple times in a loop.
Potential Cause (unverified):
During an hourly tally, we read the latest_event_record_date from the tally_state table and use it to query the events that have not been processed since the last tally run. Once the tally has completed, this date is updated with the record_date of the last event we processed. Without an explicit lock on this tally_state record, two overlapping runs could read the same date and process the same event list simultaneously, resulting in duplicate tally snapshot records.
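The suspected interleaving can be illustrated with a small in-memory simulation. This is only a sketch of the unverified cause described above, not the actual tally code: the `TallyStore` class, its method names, and the sample events are all hypothetical stand-ins for the events, tally_state, and tally snapshot tables.

```python
from dataclasses import dataclass, field


@dataclass
class TallyStore:
    """Hypothetical in-memory stand-in for the events, tally_state and snapshot tables."""
    events: list                       # (record_date, event_id) pairs, sorted by record_date
    latest_event_record_date: int = 0  # the marker kept in tally_state
    snapshots: list = field(default_factory=list)

    def read_marker(self):
        return self.latest_event_record_date

    def events_after(self, marker):
        return [e for e in self.events if e[0] > marker]

    def write_snapshots(self, batch):
        # Record one snapshot per event, then advance the marker.
        self.snapshots.extend(event_id for _, event_id in batch)
        if batch:
            self.latest_event_record_date = batch[-1][0]


store = TallyStore(events=[(1, "a"), (2, "b"), (3, "c")])

# Two overlapping runs both read the marker before either one updates it.
marker_run_1 = store.read_marker()
marker_run_2 = store.read_marker()
batch_1 = store.events_after(marker_run_1)
batch_2 = store.events_after(marker_run_2)
store.write_snapshots(batch_1)
store.write_snapshots(batch_2)

print(store.snapshots)  # ['a', 'b', 'c', 'a', 'b', 'c']
```

Because neither run sees the other's marker update, every event is tallied twice, which matches the duplicate snapshot symptom.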
NOTE:
This still needs to be confirmed, but fixing the constraint issue described in SWATCH-3478 should cause the overlapping tally to fail at the DB level with a constraint violation, letting one process complete fully without duplicates. I'd rather NOT rely on this.
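To make the fallback behaviour concrete, here is a rough sketch using SQLite in place of the real database. The table columns, the unique constraint, and the sample values are assumptions made for illustration; the actual constraint is whatever SWATCH-3478 ends up defining.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tally_snapshot ("
    " org_id TEXT NOT NULL,"
    " snapshot_date TEXT NOT NULL,"
    " value INTEGER NOT NULL,"
    " UNIQUE (org_id, snapshot_date))"  # hypothetical constraint, for illustration only
)


def write_snapshot(org_id, snapshot_date, value):
    """Return True if the row was written, False if the constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO tally_snapshot VALUES (?, ?, ?)",
                (org_id, snapshot_date, value),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Two tally runs try to write the same snapshot (made-up sample values).
first = write_snapshot("org1", "2024-01-01T00:00Z", 4)
second = write_snapshot("org1", "2024-01-01T00:00Z", 4)
row_count = conn.execute("SELECT COUNT(*) FROM tally_snapshot").fetchone()[0]

print(first, second, row_count)  # True False 1
```

The duplicate is prevented, but only because the second run fails outright, which is why this works as a safety net rather than a fix.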
Potential Solution:
A potential solution would be to lock the tally_state record until the events are fully processed. I'm not a huge fan of locking, but it might not be too bad, since it would only ever produce a wait state for tally processes targeting the same org, which generally doesn't happen outside of testing.
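The locking idea can be sketched with one lock per org held across the whole read, process, and update sequence. This is an in-memory analogy only: the dictionaries, `lock_for`, and `run_tally` are hypothetical, and in the real service the equivalent would presumably be a row-level lock on the tally_state record (for example a `SELECT ... FOR UPDATE` inside the tally transaction).

```python
import threading
from collections import defaultdict

# Hypothetical in-memory stand-ins for the events, tally_state and tally_snapshot tables.
EVENTS = {"org1": [(1, "a"), (2, "b"), (3, "c")]}
tally_state = {"org1": 0}      # org -> latest_event_record_date
snapshots = defaultdict(list)  # org -> processed event ids

org_locks = defaultdict(threading.Lock)
org_locks_guard = threading.Lock()


def lock_for(org_id):
    # Guard the lookup so two threads never create different locks for one org.
    with org_locks_guard:
        return org_locks[org_id]


def run_tally(org_id):
    # Hold the org's lock across the marker read, event processing and marker update.
    with lock_for(org_id):
        marker = tally_state[org_id]
        batch = [e for e in EVENTS[org_id] if e[0] > marker]
        for _, event_id in batch:
            snapshots[org_id].append(event_id)
        if batch:
            tally_state[org_id] = batch[-1][0]


# Fire several tally runs for the same org without waiting, as the IQE test does.
threads = [threading.Thread(target=run_tally, args=("org1",)) for _ in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(snapshots["org1"])  # ['a', 'b', 'c']
```

Only runs for the same org wait on each other; runs for different orgs take different locks and proceed in parallel, which is the trade-off described above.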
Acceptance Criteria
- The potential cause is validated.
- The 'potential solution' is accepted and verified.
- Fix is implemented based on the solution.
- Applicable tests are written.
- is related to: SWATCH-3478 Correct tally_snapshot table constraints
- Backlog