FlightPath / FLPATH-2900

[Spike]: Explore a PoC using Python to replace the Trino + Hive datastore

      Explore, through a PoC, using Python as an alternative to the current SQL + Postgres approach for fully replacing Trino and Hive, serving as the single solution both on-prem and in SaaS. The PoC should follow this flow:
      csv -> parquet -> python aggregation -> postgres DB inserts

      This solution should have 1:1 parity with Trino's current aggregation functionality for OCP and OCP on AWS.

      The resulting PoC should include a benchmark report detailing memory usage and aggregation time for both OCP and OCP on AWS, measured against nise-generated payloads of 1k, 10k, 100k, 500k, and 1M rows.
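
      One possible shape for the benchmark harness, using only the standard library, is sketched below. The row generator and the aggregation are hypothetical stand-ins (the real inputs would come from nise and the real aggregation from the PoC); only the measurement pattern, `time.perf_counter` for wall time and `tracemalloc` for peak Python allocations, is the point.

```python
import time
import tracemalloc
from collections import defaultdict

# Payload sizes named in the ticket.
ROW_COUNTS = [1_000, 10_000, 100_000, 500_000, 1_000_000]


def make_rows(n):
    """Stand-in for nise-generated data: n (namespace, cpu_usage) rows."""
    return [(f"ns-{i % 10}", 1.0) for i in range(n)]


def aggregate(rows):
    """Placeholder aggregation: total cpu_usage per namespace."""
    totals = defaultdict(float)
    for namespace, cpu_usage in rows:
        totals[namespace] += cpu_usage
    return dict(totals)


def benchmark(n):
    """Return (elapsed_seconds, peak_bytes) for aggregating n rows."""
    rows = make_rows(n)  # built before tracing, so input size is excluded
    tracemalloc.start()
    start = time.perf_counter()
    aggregate(rows)
    elapsed = time.perf_counter() - start
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return elapsed, peak


def report(row_counts):
    """Print one line of timing and peak memory per payload size."""
    for n in row_counts:
        elapsed, peak = benchmark(n)
        print(f"{n:>9} rows  {elapsed:8.3f} s  peak {peak / 1e6:8.2f} MB")
```

      Calling `report(ROW_COUNTS)` runs all five sizes. Note that `tracemalloc` only sees allocations made through Python's allocator and slows execution while active, so timing and memory may be better measured in separate runs, and native memory used by libraries such as Arrow would need a process-level measure instead.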

      Deliverables include:

      • GitHub repository containing the PoC.
      • Technical documentation covering the benefits and potential risks of adopting this solution.
      • Benchmark results as described above.

       

              jgil@redhat.com Jordi Gil
              Chad Crum