-
Story
-
Resolution: Done
-
Normal
-
None
-
None
-
Open Data Hub Sprint 10, Open Data Hub Sprint 11, Open Data Hub Sprint 12, Open Data Hub Sprint 13, Open Data Hub Sprint 14, Open Data Hub Sprint 15
-
3
Xskipper is an extensible data skipping framework. It provides a library for creating, managing and deploying data skipping indexes with Apache Spark, boosting performance and reducing cost by skipping over irrelevant data. It supports multiple data formats: Parquet, CSV, JSON, ORC and Avro.
Hive tables are supported.
Out-of-the-box indexes include MinMax, ValueList and BloomFilter, as well as data skipping for user-defined functions (UDFs).
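To illustrate the idea behind a MinMax index, here is a minimal conceptual sketch in plain Python. It does not use the Xskipper API: the `ObjectStats` class, the `objects_to_scan` helper, the `temp` column and the file names are all hypothetical, chosen only to show how per-object minimum and maximum values let a query skip objects that cannot contain matching rows.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ObjectStats:
    """Hypothetical per-object metadata: min and max of a 'temp' column."""
    path: str
    min_temp: float
    max_temp: float


def objects_to_scan(stats: List[ObjectStats], threshold: float) -> List[str]:
    """Return the objects that may hold rows satisfying temp > threshold.

    An object whose maximum is at or below the threshold cannot contain
    a matching row, so it is skipped without being read.
    """
    return [s.path for s in stats if s.max_temp > threshold]


# Illustrative metadata for three data objects (made-up values).
stats = [
    ObjectStats("part-0.parquet", min_temp=10.0, max_temp=20.0),
    ObjectStats("part-1.parquet", min_temp=18.0, max_temp=35.0),
    ObjectStats("part-2.parquet", min_temp=-5.0, max_temp=5.0),
]

# Query: SELECT ... WHERE temp > 30
selected = objects_to_scan(stats, threshold=30.0)
print(selected)
```

In this sketch only one of the three objects has to be read for the query; the other two are ruled out from metadata alone.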
Adding the Xskipper library (https://xskipper.io) to Spark-based Jupyter notebooks, by including its Maven dependency in the PySpark packages, gives ODH users native data skipping support in Spark notebooks.
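One possible way to include the Maven dependency from a notebook is to set `PYSPARK_SUBMIT_ARGS` before the Spark session is created. This is a sketch under assumptions: the Maven coordinate below (in particular the Scala suffix and the version) is not taken from this story and should be checked against the Xskipper documentation and the Spark build used in the ODH notebook image. The `pyspark_submit_args` helper is hypothetical.

```python
import os

# Assumed Maven coordinate; verify the Scala version suffix and the
# release version against the Spark build in the notebook image.
XSKIPPER_PACKAGE = "io.xskipper:xskipper-core_2.12:1.3.0"


def pyspark_submit_args(packages):
    """Build a PYSPARK_SUBMIT_ARGS value that pulls in Maven packages.

    The trailing 'pyspark-shell' token is required when PySpark is
    started from a plain Python process such as a Jupyter kernel.
    """
    return "--packages {} pyspark-shell".format(",".join(packages))


# Must run before the first SparkSession is created in the kernel,
# because the packages are resolved when the JVM is launched.
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args([XSKIPPER_PACKAGE])
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

An alternative with the same effect is to pass the coordinate through the `spark.jars.packages` configuration property when building the session. Either way the setting has to be in place before the session starts.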