-
Story
-
Resolution: Unresolved
Build a PoC that explores using Python as an alternative to the current SQL + Postgres approach, with the goal of fully replacing Trino and Hive as the single solution for both on-prem and SaaS. The PoC should follow this flow:
CSV -> Parquet -> Python aggregation -> Postgres DB inserts
This solution should have 1:1 parity with Trino's current aggregation functionality for OCP and OCP on AWS.
The resulting PoC should include a benchmark report detailing the results for different payload sizes (1k, 10k, 100k, 500k, and 1M rows) generated with nise, to evaluate memory usage and aggregation time for both OCP and OCP on AWS.
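One possible shape for the benchmark harness, using only the standard library, is sketched below. The `benchmark` function and its `aggregate` and `make_rows` parameters are hypothetical names: in the PoC, `make_rows` would load a nise-generated payload of the given size and `aggregate` would be the Python aggregation under test. Note that `tracemalloc` only sees memory allocated through Python's allocators, so memory used by native libraries may be under-reported and a process-level measurement might be needed alongside it.

```python
import time
import tracemalloc

# Payload sizes named in this story.
ROW_COUNTS = [1_000, 10_000, 100_000, 500_000, 1_000_000]


def benchmark(aggregate, make_rows, row_counts=ROW_COUNTS):
    """Measure wall-clock time and peak traced memory of `aggregate`.

    `make_rows(n)` builds (or loads) an n-row payload; it runs before
    tracing starts, so only the aggregation itself is measured.
    """
    results = []
    for n in row_counts:
        rows = make_rows(n)
        tracemalloc.start()
        start = time.perf_counter()
        aggregate(rows)
        elapsed = time.perf_counter() - start
        _, peak = tracemalloc.get_traced_memory()
        tracemalloc.stop()
        results.append({"rows": n, "seconds": elapsed, "peak_bytes": peak})
    return results
```

Running the same harness once for OCP payloads and once for OCP on AWS payloads would give directly comparable rows for the report.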
Deliverables include:
- GitHub repository with the PoC.
- Technical documentation, including the benefits and potential risks of adopting this solution.
- Benchmark results, as described above.