-
Story
-
Resolution: Done
-
Minor
-
Nise has some tech debt: the GCP upload workflow needs to be fixed following the changes made in this Jira issue: https://issues.redhat.com/browse/COST-1587
In COST-1587 I changed our BigQuery usage to rely on the pseudo column _PARTITIONTIME, which helped us resolve some key issues for a customer. However, Google populates that column on its own, based on when the data is loaded. From the BigQuery documentation: "The _PARTITIONTIME pseudo column contains a date-based timestamp for data that is loaded into the table."
For example, if I log in to the GCP console and query a table that I generated with nise upload:

SELECT DISTINCT _PARTITIONTIME as pt FROM `{project_id}.{dataset}.{table_id}`;

Row  pt
1    2021-09-23 00:00:00 UTC
As you can see, only one day shows up in the result, no matter which dates the generated data covers. This means our downloader is technically only downloading that one day.
Whenever we run a nise upload command, we need to write the data to a specific partition in order to restore this functionality in our testing infrastructure. More information on how to do this can be found here:
https://cloud.google.com/bigquery/docs/managing-partitioned-table-data#write-to-partition
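As a rough illustration of the approach described in that page, the sketch below loads a CSV file into a single daily partition by appending a partition decorator ($YYYYMMDD) to the table ID. It is only a sketch under assumptions, not the Nise implementation: the function names partition_table_id and load_csv_into_partition are hypothetical, the file is assumed to be a CSV with one header row, schema autodetection is assumed to be acceptable, and the table is assumed to use daily ingestion-time partitioning. It uses the google-cloud-bigquery Python client.

```python
from datetime import date


def partition_table_id(project_id: str, dataset: str, table_id: str, day: date) -> str:
    """Build a table ID with a daily partition decorator, e.g. my_table$20210923.

    Hypothetical helper for illustration only.
    """
    return f"{project_id}.{dataset}.{table_id}${day:%Y%m%d}"


def load_csv_into_partition(csv_path: str, destination: str) -> None:
    """Load one CSV file into the partition named by `destination`.

    Hypothetical helper; assumes a CSV with a single header row.
    """
    # Imported here so partition_table_id can be used without the client installed.
    from google.cloud import bigquery

    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,  # assumption: let BigQuery infer the schema
        # Assumption: the table is partitioned by day on ingestion time.
        time_partitioning=bigquery.TimePartitioning(
            type_=bigquery.TimePartitioningType.DAY
        ),
        # Replace the contents of the targeted partition only.
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    with open(csv_path, "rb") as source_file:
        job = client.load_table_from_file(
            source_file, destination, job_config=job_config
        )
    job.result()  # wait for the load job to finish


# Example: target the partition for 2021-09-01 rather than the upload date.
destination = partition_table_id("my-project", "my_dataset", "my_table", date(2021, 9, 1))
```

With one load per day of generated data, the SELECT DISTINCT _PARTITIONTIME query above should return one row per day instead of only the upload date.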