Extract Cubes to Parquet Files

1. What is your idea?

Board has multiple steps to extract data from cubes:

  • Extract Cube
  • Export Data View to file
  • Export Dataset
  • Extract all cubes

All of these steps save data as text or CSV files. Using Parquet files instead could save disk space and improve performance.

Head-to-head comparison: CSV vs. Parquet (figures are taken from the article linked under "More information")

  • Storage layout
      CSV: Row-based storage format.
      Parquet: A hybrid of row-based and column-based storage. Rows are split into groups, and each group is stored column by column.

  • Space consumption
      CSV: Consumes a lot of space because there is no built-in compression. For example, a 1 TB file will occupy the same 1 TB when stored on Amazon S3 or any other cloud.
      Parquet: Compresses data while storing it, so it consumes less space. The same 1 TB of data takes up only about 130 GB in Parquet format.

  • Query run time
      CSV: Slow, because storage is row-based. To read a single column, every full row has to be retrieved.
      Parquet: About 34 times faster, thanks to the column-based storage and the metadata in the file.

  • Data scanned per query
      CSV: More data has to be scanned per query.
      Parquet: About 99% less data is scanned per query, which improves performance.

  • Storage cost
      CSV: Most storage services charge by the space used, so the CSV format means higher storage cost.
      Parquet: Lower storage cost, because data is stored in a compressed, encoded format.

  • Schema
      CSV: The file schema has to be either inferred (which leads to errors) or supplied (which is tedious).
      Parquet: The file schema is stored in the file's metadata.

  • Data types
      CSV: Suitable for simple data types.
      Parquet: Also suitable for complex types such as nested schemas, arrays, and dictionaries.
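To make the row-based versus column-based difference more concrete, here is a minimal Python sketch that uses only the standard library. It does not write real Parquet files and does not reflect Parquet's actual on-disk format. The sample data and all names (rows, schema, columns, and both functions) are made up for illustration.

```python
import csv
import io

# Hypothetical sample data standing in for a small cube extract.
rows = [
    {"Month": "2023-01", "Product": "A", "Sales": "100.5"},
    {"Month": "2023-01", "Product": "B", "Sales": "80.0"},
    {"Month": "2023-02", "Product": "A", "Sales": "120.25"},
]

# Row-based layout: write the data as CSV text, one line per row.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["Month", "Product", "Sales"])
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()


def total_sales_from_csv(text):
    """Sum the Sales column of CSV text.

    Every full row is parsed even though only one column is needed,
    and each value arrives as a string that the reader must convert.
    """
    reader = csv.DictReader(io.StringIO(text))
    return sum(float(row["Sales"]) for row in reader)


# Column-based layout: one typed list per column, with the schema stored
# next to the data (a toy stand-in for the metadata a Parquet file carries).
schema = {"Month": str, "Product": str, "Sales": float}
columns = {
    name: [column_type(row[name]) for row in rows]
    for name, column_type in schema.items()
}


def total_sales_from_columns(cols):
    """Sum the Sales column; the other columns are never touched."""
    return sum(cols["Sales"])


print(total_sales_from_csv(csv_text))
print(total_sales_from_columns(columns))
```

Both functions return the same total, but the CSV version has to read and parse the Month and Product values of every row, while the column version only touches the Sales list and already knows its type.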

Please add an option to extract data as a Parquet file.

More information:

https://geekflare.com/parquet-csv-data-storage/

2. What specific problem are you trying to find a solution to, or what new scenario would this idea respond to?

Having a new, more compact file format for saving data extracted from Board.

3. What workaround have you found and used so far (if any)?

No workaround available.

10 votes

Accepted

Comments

  • Product Management Team (Employee, Group Leader)

    Hi @Nico Gerbrand, thank you for sharing another idea on the Parquet file format! We appreciate the time and effort you put into crafting your suggestion, and we understand the need for this solution. We have accepted this idea into our backlog and are evaluating the feasibility of adding it to the development roadmap for 2024.
     
    We are committed to delivering the best possible experience to our customers, and we must prioritize the features and enhancements that will have the most significant impact on your daily use of our software.