Story
Resolution: Done
Major
None
COST-10 - New data architecture that includes Data Hub as big data pipeline
COST Sprint 49, COST Sprint 50, COST Sprint 51, COST Sprint 52
User Story
As a developer, I want OpenShift on Cloud Infrastructure summarization to run efficiently so that we can better handle production workloads.
Assumptions
Right now, the SQL pulls apart the daily tables by tag, e.g. https://github.com/project-koku/koku/blob/master/koku/masu/database/sql/reporting_ocpawscostlineitem_daily_summary.sql#L9-L76
See https://github.com/project-koku/koku/issues/1468
We already track the tag keys and values for each provider type: https://github.com/project-koku/koku/blob/master/koku/masu/database/sql/reporting_awstags_summary.sql
If we do https://github.com/project-koku/koku/issues/1367 and https://github.com/project-koku/koku/issues/1057, then the tag summary tables should be filterable by bill/report period. We can then do a much faster and simpler tag-matching pare-down by joining the OpenShift and infrastructure provider tag summary tables on matching tag key and value.
With the pared-down list of matched tags, we can then filter our starting data sets to include only the pre-matched tag key/values.
This operation is currently one of our slowest bottlenecks, which makes it a good candidate for optimization.
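The proposed flow can be sketched roughly as follows. This is only an illustration under stated assumptions, not the koku implementation: the table names (`ocp_tag_summary`, `aws_tag_summary`), the columns (`tag_key`, `tag_value`, `report_period`), the helper functions, and the line-item shape are all hypothetical, and an in-memory SQLite database stands in for the real one.

```python
import sqlite3


def build_demo_db():
    """Create hypothetical tag summary tables with a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE ocp_tag_summary (tag_key TEXT, tag_value TEXT, report_period TEXT);
        CREATE TABLE aws_tag_summary (tag_key TEXT, tag_value TEXT, report_period TEXT);
        """
    )
    conn.executemany(
        "INSERT INTO ocp_tag_summary VALUES (?, ?, ?)",
        [("app", "web", "2020-01"), ("env", "prod", "2020-01"), ("app", "db", "2019-12")],
    )
    conn.executemany(
        "INSERT INTO aws_tag_summary VALUES (?, ?, ?)",
        [("app", "web", "2020-01"), ("env", "dev", "2020-01"), ("app", "db", "2020-01")],
    )
    return conn


def matched_tags(conn, report_period):
    """Join the two tag summaries on key AND value, limited to one period."""
    rows = conn.execute(
        """
        SELECT DISTINCT ocp.tag_key, ocp.tag_value
        FROM ocp_tag_summary AS ocp
        JOIN aws_tag_summary AS aws
          ON ocp.tag_key = aws.tag_key
         AND ocp.tag_value = aws.tag_value
        WHERE ocp.report_period = ?
          AND aws.report_period = ?
        ORDER BY ocp.tag_key, ocp.tag_value
        """,
        (report_period, report_period),
    ).fetchall()
    return [(key, value) for key, value in rows]


def filter_line_items(line_items, pairs):
    """Keep only line items carrying at least one pre-matched key/value pair."""
    wanted = set(pairs)
    return [item for item in line_items if wanted & set(item["tags"].items())]


conn = build_demo_db()
pairs = matched_tags(conn, "2020-01")
line_items = [
    {"id": 1, "tags": {"app": "web"}},
    {"id": 2, "tags": {"team": "x"}},
    {"id": 3, "tags": {"app": "web", "env": "dev"}},
]
kept = filter_line_items(line_items, pairs)
```

The point of the sketch is the ordering: the cheap join over the small summary tables happens first, so the expensive work on the daily line items only touches rows that can actually match.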
Blocks: COST-101 Move OpenShift on AWS special tag handling to occur first (Closed)