Open Data Hub / ODH-447

Data Skipping in Spark JupyterHub notebooks

Details

    • Story
    • Resolution: Done
    • Normal
    • None
    • None
    • odh-manifest

    Description

      Xskipper is an extensible data skipping framework. It provides a library for creating, managing and deploying data skipping indexes with Apache Spark, which boosts performance and reduces cost by skipping over irrelevant data. It supports multiple data formats: Parquet, CSV, JSON, ORC and Avro.
      Hive tables are also supported.
      Out-of-the-box indexes include MinMax, ValueList and BloomFilter, as well as data skipping for user-defined functions (UDFs).
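
The idea behind a MinMax index can be illustrated with a small, self-contained sketch. This is a conceptual toy in plain Python, not Xskipper's API: the `FileStats` class, the `files_to_scan` helper, the file names and the temperature values are all hypothetical. It only shows how per-file minimum and maximum metadata lets a query skip files that cannot contain matching rows.

```python
from dataclasses import dataclass


@dataclass
class FileStats:
    """Hypothetical per-file metadata, as a MinMax index might record it."""
    path: str
    min_temp: float
    max_temp: float


def files_to_scan(stats, threshold):
    """Return the files that may contain rows with temp > threshold.

    A file whose maximum is at or below the threshold cannot match,
    so it is skipped without being read.
    """
    return [s.path for s in stats if s.max_temp > threshold]


# Made-up metadata for three data files.
stats = [
    FileStats("part-0.parquet", 1.0, 18.5),
    FileStats("part-1.parquet", 12.0, 35.0),
    FileStats("part-2.parquet", 31.0, 44.2),
]

selected = files_to_scan(stats, 30.0)
print(selected)  # ['part-1.parquet', 'part-2.parquet']
```

Here `part-0.parquet` is skipped for the predicate `temp > 30` because its recorded maximum is 18.5. ValueList and BloomFilter indexes follow the same pattern with different per-file metadata.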

      Adding the Xskipper (https://xskipper.io) library to Spark-based Jupyter notebooks, by including its Maven dependency in the PySpark packages, gives ODH users native data skipping support in Spark notebooks.
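
One way a notebook could pull in the Maven dependency is sketched below. This is a minimal illustration, not the change made for this story: the coordinate `io.xskipper:xskipper-core_2.12:1.3.0` (Scala 2.12, version 1.3.0) is an assumption and should be checked against the Spark version in the notebook image, and the `add_package` and `submit_args` helpers are hypothetical names. The sketch relies on the standard `PYSPARK_SUBMIT_ARGS` environment variable, which must be set before the Spark session is created.

```python
import os

# Assumed Maven coordinate; verify the Scala suffix and version
# against the Spark build used by the notebook image.
XSKIPPER_PACKAGE = "io.xskipper:xskipper-core_2.12:1.3.0"


def add_package(existing: str, package: str) -> str:
    """Append a Maven coordinate to a comma-separated package list,
    leaving the list unchanged if the coordinate is already present."""
    packages = [p.strip() for p in existing.split(",") if p.strip()]
    if package not in packages:
        packages.append(package)
    return ",".join(packages)


def submit_args(packages: str) -> str:
    """Build a PYSPARK_SUBMIT_ARGS value; it has to end with 'pyspark-shell'."""
    return f"--packages {packages} pyspark-shell"


packages = add_package("", XSKIPPER_PACKAGE)

# Set this before the first SparkSession or SparkContext is created.
os.environ["PYSPARK_SUBMIT_ARGS"] = submit_args(packages)
print(os.environ["PYSPARK_SUBMIT_ARGS"])
```

The same package list can instead be passed through the `spark.jars.packages` Spark configuration property when building the session. Either way, Spark resolves the coordinate at session start, so the notebook environment needs access to a Maven repository.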

          People

            rhn-support-jnakfour Juana Nakfour (Inactive)
            oshritf Oshrit Feder (Inactive)