
    • Type: Epic
    • Resolution: Done
    • Priority: Major
    • Epic Name: Data Scaling
    • Status: Done
    • COST-10 - New data architecture that includes Data Hub as big data pipeline

      User Story

      As a user, I want data ingestion, the UI, and APIs to run in a timely manner so that I always have access to my data.

      As dev/ops, I want to ensure that we can scale our application to handle reports from many customers and that data is returned efficiently via APIs.

      Prioritization / Business Case

      • We need to tackle scaling large amounts of data as the number of customers on-boarding to cost management grows.
      • We need to improve our ability to handle summarized data by partitioning it by time period, which will enable us to handle and present more data to users over time.
      • Keeping the raw data and daily data in the database over time will lead to large costs ($$$) to run cost management; moving this data to S3 is more affordable, supports a data export strategy, and potentially aligns with use in big data engines like Presto.

      General Idea

      1. Have our report processors download/stream files directly during processing and switch the worker StatefulSet to a DeploymentConfig (see the streaming sketch below)
      2. Partition tables by time period within the existing schemas (see the partitioning sketch below)
      3. Set up big data processing (S3 bucket with Parquet, plus CSV for export) during ingestion (see the Parquet sketch below)
      4. Spike on running Presto, using what metering has done as a basis (see the Presto sketch below)
      5. After the spike, generate a plan for moving forward with the new architecture
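
      A minimal sketch of item 1: streaming a report file straight from S3 during processing so workers never need local disk, which is what lets the worker StatefulSet become a DeploymentConfig. Bucket and key names are hypothetical; assumes boto3.

        import csv

        import boto3

        s3 = boto3.client("s3")

        def process_report(bucket: str, key: str) -> int:
            """Stream a CSV report line by line and count its rows."""
            obj = s3.get_object(Bucket=bucket, Key=key)
            # iter_lines() yields bytes without buffering the whole file in memory.
            lines = (line.decode("utf-8") for line in obj["Body"].iter_lines())
            return sum(1 for _ in csv.DictReader(lines))

        # Usage with hypothetical names:
        # rows = process_report("cost-mgmt-reports", "org123/2020-06/report.csv")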
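
      A minimal sketch of the time-period partitioning in item 2, assuming PostgreSQL 10+ declarative range partitioning; the table, columns, and connection string are hypothetical placeholders.

        import psycopg2

        DDL = """
        CREATE TABLE daily_summary (
            usage_start date NOT NULL,
            cost        numeric
        ) PARTITION BY RANGE (usage_start);

        -- One partition per month; old months can later be detached and archived.
        CREATE TABLE daily_summary_2020_06 PARTITION OF daily_summary
            FOR VALUES FROM ('2020-06-01') TO ('2020-07-01');
        """

        # Placeholder DSN; real settings would come from the app config.
        with psycopg2.connect("dbname=cost_mgmt") as conn:
            with conn.cursor() as cur:
                cur.execute(DDL)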
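
      A minimal sketch of item 3: landing ingested data in S3 as Parquet (for big data engines) with a CSV copy alongside (for export). Paths are hypothetical; assumes pandas with pyarrow and s3fs installed.

        import pandas as pd

        df = pd.DataFrame({"usage_start": ["2020-06-01"], "cost": [12.34]})

        # Parquet is the columnar format engines like Presto read efficiently.
        df.to_parquet("s3://cost-mgmt-data/parquet/2020-06/data.parquet")

        # CSV copy kept alongside for the data export use case.
        df.to_csv("s3://cost-mgmt-data/export/2020-06/data.csv", index=False)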
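
      A minimal sketch of the Presto spike in item 4, querying the summarized data through the presto-python-client DBAPI. Host, catalog, schema, and table names are hypothetical.

        import prestodb

        conn = prestodb.dbapi.connect(
            host="presto-coordinator.example.svc",  # hypothetical service host
            port=8080,
            user="cost-mgmt",
            catalog="hive",
            schema="default",
        )
        cur = conn.cursor()
        cur.execute(
            "SELECT usage_start, sum(cost) FROM daily_summary GROUP BY usage_start"
        )
        for row in cur.fetchall():
            print(row)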

      Impacts

      • Data Backend
      • Docs (if we enable data export)

      Related Stories

      https://issues.redhat.com/projects/COST/issues/COST-8

      External Dependencies

      • It may be simpler to have the platform own the S3 buckets for cost purposes (we can create these with app-interface)
      • We may need more quota (CPU/memory per pod) to run Presto successfully in OpenShift

      UX Requirements

      • Is completion of a design/mock a prerequisite to working this epic, or can portions be done concurrently?
        If we enable data export, application-level settings will need to be updated (this is likely just a switch).

      UI Requirements

      • Does the UI require an API contract with the backend so that the UI could be developed prior to completing the API work? None

      Documentation Requirements

      • What documentation is required to complete this epic?
        We will need documentation only if we deliver data export.

      Backend Requirements

      • Are there any prerequisites required before working this epic? No

      QE Requirements

      • Does QE need specific data or tooling to successfully test this epic? No

      Release Criteria

      • Can this epic be released as individual issues/tasks are completed? Yes
      • Can the backend be released without the frontend? Yes
      • Has QE approved this epic?

            Assignee: Andrew Berglund (aberglun@redhat.com)
            Reporter: Andrew Berglund (aberglun@redhat.com)