User Story
As developers, we want cost management data stored in S3 and processed with a big data tool, so that we can retain data for longer periods and process large volumes of data more efficiently.
Prioritization / Business Case
- We have already done spike and PoC work toward this objective; this epic aims for a complete solution in which data is stored in S3 and processed with a big data tool
- Scale the pipeline to process data for more customers
Out Of Scope
- Although this work enables longer-term storage, this epic only puts the infrastructure in place; it does not include surfacing anything beyond 2 months of data in the API/UI
Impacts
- API
- Data Engineering
- Database
Related Stories
- Deploy Presto in OpenShift
- Run Presto within Docker-Compose
- Configure Presto + Hive with our S3
- Dynamically create Presto tables for S3 data (see the sketch after this list)
- Trigger summarization on S3 events
- Unit test generation flow
- Convert AWS Summary SQL to Presto
- Convert Azure Summary SQL to Presto
- Convert OpenShift Summary SQL to Presto
- Convert OpenShift on AWS Summary SQL to Presto
- Convert OpenShift on Azure Summary SQL to Presto
- Calculate cost model cost using Presto
- Load summarized data into Postgres
- Presto monitoring
External Dependencies
- OpenShift Quota
- Presto image from the metering team (requires a regular version sync, as we do for Python or Django)
Documentation Requirements
- We may want to document how we store user data and highlight the steps taken to ensure its security
Backend Requirements
- Are there any prerequisites required before work on this epic begins? No
QE Requirements
- Does QE need specific data or tooling to successfully test this epic? An S3 bucket for ephemeral environments (a shared bucket with a per-environment path structure may suffice; see the sketch below)
Release Criteria
- Can this epic be released as individual issues/tasks are completed? Partially; the infrastructure can be deployed incrementally, but summarization should switch from PostgreSQL to Presto in a single deployment
- Can the backend be released without the frontend? Yes
- Has QE approved this epic? Yes