Cost Management / COST-401

S3 Big Data Pipeline

    • Type: Epic
    • Priority: Major
    • Resolution: Done
    • Fix Version: 2021Q1
    • Epic Name: S3 Big Data Pipeline
    • Epic Status: Done
    • Parent: COST-10 - New data architecture that includes Data Hub as big data pipeline
    • Progress: 0% To Do, 0% In Progress, 100% Done

      User Story

      As developers, we want cost management data stored in S3 and processed using a big data tool, so that we can retain data for longer periods and process large amounts of data more efficiently.
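
      To make the target architecture concrete, below is a minimal sketch of the intended read path, assuming a Presto coordinator service and a hypothetical Hive table backed by Parquet files in S3 (the host, catalog, schema, table, and column names are illustrative, not decisions recorded in this epic). It uses the presto-python-client package.

        import prestodb

        # Connect through the Hive connector that fronts the S3 bucket.
        # Host, user, catalog, and schema are assumed values.
        conn = prestodb.dbapi.connect(
            host="presto-coordinator",
            port=8080,
            user="koku",
            catalog="hive",
            schema="cost",
        )
        cur = conn.cursor()

        # Aggregate directly over the S3-backed table; column names are
        # hypothetical.
        cur.execute(
            "SELECT usage_start, sum(unblended_cost) "
            "FROM aws_line_items "
            "GROUP BY usage_start "
            "ORDER BY usage_start"
        )
        for usage_start, total_cost in cur.fetchall():
            print(usage_start, total_cost)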

      Prioritization / Business Case

      • We have already done spike and PoC work toward this objective; this epic aims for a complete solution in which data is processed and stored using S3 and a big data tool
      • Scale to support and process more customers

      Out Of Scope

      • Although this will enable longer-term storage, this epic only puts the infrastructure in place; it does not include surfacing more than 2 months of data in the API/UI

      Impacts

      • API
      • Data Engineering
      • Database

      Related Stories

      • Deploy Presto in OpenShift
      • Run Presto within Docker-Compose
      • Configure Presto + Hive with our S3
      • Dynamically create Presto tables for S3 data (see the first sketch after this list)
      • Trigger summarization on S3 events (see the second sketch after this list)
      • Unit test generation flow
      • Convert AWS Summary SQL to Presto
      • Convert Azure Summary SQL to Presto
      • Convert OpenShift Summary SQL to Presto
      • Convert OpenShift on AWS Summary SQL to Presto
      • Convert OpenShift on Azure Summary SQL to Presto
      • Calculate cost model cost using Presto
      • Load summarized data into Postgres
      • Presto monitoring
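
      As a rough illustration of the "Dynamically create Presto tables for S3 data" and "Convert AWS Summary SQL to Presto" stories, the first sketch below registers an external Hive table over Parquet files in S3 and runs a daily summary aggregation in Presto SQL. The bucket, path, table, and column names are all assumptions for illustration, not the epic's actual schema.

        import prestodb

        conn = prestodb.dbapi.connect(
            host="presto-coordinator", port=8080, user="koku",
            catalog="hive", schema="cost",
        )
        cur = conn.cursor()

        # External Hive tables can be created dynamically per source;
        # external_location points at the S3 prefix holding the files.
        cur.execute("""
            CREATE TABLE IF NOT EXISTS aws_line_items (
                usage_start timestamp,
                usage_account_id varchar,
                product_code varchar,
                unblended_cost double
            )
            WITH (
                external_location = 's3a://cost-mgmt-data/aws/line_items/',
                format = 'PARQUET'
            )
        """)
        cur.fetchall()  # presto-python-client runs queries lazily; drain to execute

        # A Postgres daily-summary query translated to Presto SQL; for an
        # aggregation this simple the two dialects barely differ.
        cur.execute("""
            SELECT date_trunc('day', usage_start) AS usage_day,
                   usage_account_id,
                   product_code,
                   sum(unblended_cost) AS unblended_cost
            FROM aws_line_items
            GROUP BY 1, 2, 3
        """)
        summary_rows = cur.fetchall()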
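
      Likewise, for "Trigger summarization on S3 events" and "Load summarized data into Postgres", one plausible shape is an SQS-backed listener, assuming S3 event notifications are routed to a queue; the queue name, connection string, and target table below are hypothetical, not decisions recorded here.

        import json

        import boto3
        import psycopg2

        sqs = boto3.client("sqs")
        queue_url = sqs.get_queue_url(QueueName="cost-mgmt-s3-events")["QueueUrl"]

        def load_summary(rows):
            # Write summarized rows (e.g. summary_rows from the sketch
            # above) into a Postgres reporting table.
            conn = psycopg2.connect("dbname=koku user=koku host=db")
            with conn, conn.cursor() as cur:
                cur.executemany(
                    "INSERT INTO reporting_daily_summary "
                    "(usage_day, usage_account_id, product_code, unblended_cost) "
                    "VALUES (%s, %s, %s, %s)",
                    rows,
                )
            conn.close()

        while True:
            resp = sqs.receive_message(
                QueueUrl=queue_url, MaxNumberOfMessages=10, WaitTimeSeconds=20
            )
            for msg in resp.get("Messages", []):
                for record in json.loads(msg["Body"]).get("Records", []):
                    key = record["s3"]["object"]["key"]
                    print(f"new object {key}: run summarization, then load")
                    # run the Presto summary for the affected period, then
                    # call load_summary(summary_rows)
                sqs.delete_message(
                    QueueUrl=queue_url, ReceiptHandle=msg["ReceiptHandle"]
                )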

      External Dependencies

      • OpenShift Quota
      • Presto image from the metering team (the version should be synced regularly, as we already do for Python and Django)

      Documentation Requirements

      • We may want to document how we store user data and highlight the steps taken to ensure its security

      Backend Requirements

      • Are there any prerequisites required before working on this epic? No

      QE Requirements

      • Does QE need specific data or tooling to successfully test this epic? Possibly an S3 bucket for ephemeral environments, or just a per-environment path structure within a shared bucket (sketched below)
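
      A per-environment key prefix within one shared bucket would likely be enough; here is a minimal sketch with boto3 (the bucket name and layout are assumptions):

        import boto3

        s3 = boto3.client("s3")
        BUCKET = "cost-mgmt-qe"  # assumed shared QE bucket

        def env_key(env_name, source, filename):
            # e.g. env_key("ephemeral-42", "aws", "report.parquet")
            return f"{env_name}/{source}/{filename}"

        s3.put_object(
            Bucket=BUCKET,
            Key=env_key("ephemeral-42", "aws", "report.parquet"),
            Body=b"...",
        )

        # Tearing an environment down is then a prefix-scoped cleanup.
        resp = s3.list_objects_v2(Bucket=BUCKET, Prefix="ephemeral-42/")
        for obj in resp.get("Contents", []):
            s3.delete_object(Bucket=BUCKET, Key=obj["Key"])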

      Release Criteria

      • Can this epic be released as individual issues/tasks are completed? Partially; the infrastructure can be deployed incrementally, but summarization should switch from PostgreSQL to Presto in a single deployment
      • Can the backend be released without the frontend? Yes
      • Has QE approved this epic? Yes

    • Assignee: Unassigned
    • Reporter: Chris Hambridge (chambrid)
    • Votes: 0
    • Watchers: 8
