-
Story
-
Resolution: Done
-
Right now our cloud function queries a 5-day window in GCP BigQuery but applies no ordering to the results. This is a problem when usage data comes back in random order: rows for the same usage date can end up spread across many files.
Two things to consider:
- Ordering the queried data by usage date so the batched files are more consistent.
- Improving our batching logic to batch not only by size but also by usage date!
Example (each file holds a single usage date, split into parts by size):
- day1_p1, day1_p2, day1_p3
- day2_p1, day2_p2
- day3_p1
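A minimal sketch of the two changes together, assuming the queried rows are already in memory as Python objects. `UsageRow`, `batch_by_date_and_size` and `max_rows_per_file` are hypothetical names, and "size" is approximated as a row count rather than bytes. The rows are sorted by usage date, grouped per date, and each date's group is split into fixed-size parts named like the example above. The sort could also be pushed into the BigQuery query itself with an `ORDER BY` on the usage-date column.

```python
from dataclasses import dataclass
from datetime import date
from itertools import groupby


@dataclass
class UsageRow:
    usage_date: date
    payload: str  # hypothetical: the serialized usage record


def batch_by_date_and_size(rows, max_rows_per_file):
    """Sort rows by usage date, then split each date into files of at most
    max_rows_per_file rows (row count stands in for file size here)."""
    if max_rows_per_file < 1:
        raise ValueError("max_rows_per_file must be >= 1")
    batches = {}
    # sorted() is stable, so rows keep their original relative order within a date.
    ordered = sorted(rows, key=lambda r: r.usage_date)
    for day_index, (_, group) in enumerate(
        groupby(ordered, key=lambda r: r.usage_date), start=1
    ):
        day_rows = list(group)
        # Never mix dates in one file; only split a date when it exceeds the size limit.
        for part, start in enumerate(range(0, len(day_rows), max_rows_per_file), start=1):
            batches[f"day{day_index}_p{part}"] = day_rows[start:start + max_rows_per_file]
    return batches


if __name__ == "__main__":
    # Rows arrive in random date order: 6 rows for day 1, 4 for day 2, 2 for day 3.
    rows = [
        UsageRow(date(2024, 1, d), f"r{i}")
        for i, d in enumerate([3, 1, 2, 1, 1, 2, 1, 3, 2, 1, 2, 1])
    ]
    print(list(batch_by_date_and_size(rows, max_rows_per_file=2)))
    # ['day1_p1', 'day1_p2', 'day1_p3', 'day2_p1', 'day2_p2', 'day3_p1']
```

With a limit of 2 rows per file, this reproduces the file layout from the example. In the real function the limit would presumably be expressed in bytes, which would mean accumulating payload sizes instead of slicing by count.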