-
Bug
-
Resolution: Done
Description
When the downloader runs during the current month, it still downloads an empty CSV for the previous month, because the query for the previous month scans a date range in the current month.
In the following example, we query the January invoice month (202101) with a scan range of Feb 7 - 10 and end up with an empty CSV.
[2021-02-10 19:50:46,749] INFO e60e5986-d686-40ef-854b-6044f37bd1d0 Using querying for invoice_month (202101)
[2021-02-10 19:50:47,910] INFO e60e5986-d686-40ef-854b-6044f37bd1d0 {'message': 'Local filename: 202101_389b53cf8c64902596eeb02bfcbe015e_2021-02-07:2021-02-10.csv', 'request_id': '722f61e7-76f1-4910-88ba-6a6438719b49', 'provider_uuid': '40871eab-6324-4f75-87e8-bb055aada66d', 'account': '10001'}
[2021-02-10 19:50:47,912] INFO e60e5986-d686-40ef-854b-6044f37bd1d0 {'message': 'Downloading 202101_389b53cf8c64902596eeb02bfcbe015e_2021-02-07:2021-02-10.csv to /testing/pvc_dir/processing/acct10001/gcp/202101_389b53cf8c64902596eeb02bfcbe015e_2021-02-07:2021-02-10.csv', 'request_id': '722f61e7-76f1-4910-88ba-6a6438719b49', 'provider_uuid': '40871eab-6324-4f75-87e8-bb055aada66d', 'account': '10001'}
[2021-02-10 19:50:48,726] INFO e60e5986-d686-40ef-854b-6044f37bd1d0 {'message': 'Returning full_file_path: /testing/pvc_dir/processing/acct10001/gcp/202101_389b53cf8c64902596eeb02bfcbe015e_2021-02-07:2021-02-10.csv', 'request_id': '722f61e7-76f1-4910-88ba-6a6438719b49', 'provider_uuid': '40871eab-6324-4f75-87e8-bb055aada66d', 'account': '10001'}
Proposed Solution
We can use the export_time column on the BigQuery table to determine whether a previous month has updated data. We can update the etag generation method to query that column and build the etag from its maximum value. Because the query only takes the max of a column, it uses the column statistics and is a very small and fast query.
Example
SELECT max(export_time) FROM `{table}` WHERE DATE(_PARTITIONTIME) >= '2021-01-01' AND DATE(_PARTITIONTIME) < '2021-02-01'
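A rough sketch of how the etag could be derived from that query, to make the proposal concrete. The function names (month_bounds, build_export_time_query, build_etag), the md5 hash, and the "no-data" placeholder are all assumptions for illustration, not the downloader's actual code. Running the query against BigQuery is left out; the sketch only builds the SQL and hashes the result.

```python
import hashlib
from datetime import date, datetime, timezone


def month_bounds(invoice_month):
    """Return (first day of month, first day of next month) for 'YYYYMM'."""
    year, month = int(invoice_month[:4]), int(invoice_month[4:6])
    start = date(year, month, 1)
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return start, end


def build_export_time_query(table, invoice_month):
    """Build the max(export_time) query for one invoice month (hypothetical helper)."""
    start, end = month_bounds(invoice_month)
    return (
        f"SELECT max(export_time) FROM `{table}` "
        f"WHERE DATE(_PARTITIONTIME) >= '{start}' "
        f"AND DATE(_PARTITIONTIME) < '{end}'"
    )


def build_etag(invoice_month, max_export_time):
    """Hash the month and its latest export_time (md5 is an assumption).

    max_export_time is None when the month has no rows, so the etag stays
    stable until new data is exported for that month.
    """
    stamp = max_export_time.isoformat() if max_export_time else "no-data"
    return hashlib.md5(f"{invoice_month}:{stamp}".encode()).hexdigest()


# Example: the etag only changes when export_time advances.
query = build_export_time_query("project.dataset.table", "202101")
first = build_etag("202101", datetime(2021, 2, 1, 3, 0, tzinfo=timezone.utc))
same = build_etag("202101", datetime(2021, 2, 1, 3, 0, tzinfo=timezone.utc))
newer = build_etag("202101", datetime(2021, 2, 5, 3, 0, tzinfo=timezone.utc))
```

With this approach, a previous month whose max(export_time) has not changed yields the same etag as the last run, so the downloader can skip it instead of writing an empty CSV.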
- is related to
-
COST-789 GCP Infrastructure Fit and Finish
- Closed