Open Data Hub / ODH-447

Data Skipping in Spark JupyterHub notebooks

    • Type: Story
    • Resolution: Done
    • Priority: Normal
    • odh-manifest

      Xskipper is an extensible data skipping framework. It provides a library for creating, managing, and deploying data skipping indexes with Apache Spark, boosting performance and reducing cost by skipping over irrelevant data. It supports multiple data formats: Parquet, CSV, JSON, ORC, and Avro.
      Hive tables are also supported.
      Out-of-the-box indexes include MinMax, ValueList, and BloomFilter, as well as data skipping for user-defined functions.
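
To make the idea of a MinMax index concrete, here is a minimal pure-Python sketch of the underlying technique. It is not Xskipper's API or implementation: `FileStats`, `files_to_scan`, and the file names are hypothetical names introduced only for illustration. The sketch assumes each data file has a recorded minimum and maximum for one column, and that the query filters on `value > lower_bound`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileStats:
    """Hypothetical per-file index entry: min and max of one column."""
    path: str
    min_value: float
    max_value: float


def files_to_scan(index, lower_bound):
    """Return paths of files that may hold rows with value > lower_bound.

    A file whose max_value is <= lower_bound cannot contain a matching
    row, so it is skipped without being read.
    """
    return [stats.path for stats in index if stats.max_value > lower_bound]


# Illustrative metadata for three files (made-up values).
index = [
    FileStats("part-0.parquet", 1.0, 9.5),
    FileStats("part-1.parquet", 10.0, 24.0),
    FileStats("part-2.parquet", 25.0, 41.0),
]

# Query: value > 30.0. Only the last file can contain matches.
print(files_to_scan(index, 30.0))  # prints ['part-2.parquet']
```

The saving comes from consulting small metadata records instead of opening every file; ValueList and BloomFilter indexes apply the same principle with different per-file summaries.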

      Adding the Xskipper (https://xskipper.io) library to Spark-based Jupyter notebooks, by including its Maven dependency in the PySpark packages, gives ODH users native data skipping support in Spark notebooks.
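
One way a notebook could pull in a Maven dependency is through the `PYSPARK_SUBMIT_ARGS` environment variable, set before the first Spark session is created. The sketch below is an assumption about how this might be wired up, not the change made for this issue. The helper `pyspark_submit_args` is hypothetical, and the Maven coordinate (in particular its Scala suffix and version) is an assumed value that would need to match the Spark build in the notebook image.

```python
import os

# Assumed coordinate: check the Xskipper documentation for the artifact
# that matches the Spark and Scala versions of your notebook image.
XSKIPPER_PACKAGE = "io.xskipper:xskipper-core_2.12:1.3.0"


def pyspark_submit_args(packages):
    """Build a PYSPARK_SUBMIT_ARGS value that loads the given Maven packages.

    PySpark expects the value to end with the literal token 'pyspark-shell'.
    """
    return "--packages " + ",".join(packages) + " pyspark-shell"


# Must run before the first SparkSession is created in the notebook.
os.environ["PYSPARK_SUBMIT_ARGS"] = pyspark_submit_args([XSKIPPER_PACKAGE])
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

Setting the `spark.jars.packages` configuration property on the session builder is an alternative that avoids the environment variable.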

              rhn-support-jnakfour Juana Nakfour (Inactive)
              oshritf Oshrit Feder (Inactive)