Extract Cubes in Parquet-File

Nico Gerbrand · 2023-03-03T15:23:06+00:00

1. What is your idea?

Board has multiple steps to extract data from cubes:

Extract Cube
Export Data View to file
Export Dataset
Extract all cubes

All these steps save data as text or CSV files. We could save disk space and increase performance by using Parquet files.

Head-to-head comparison

CSV	Parquet
Row-based storage format.	A hybrid of row-based and column-based storage formats.
Consumes a lot of space because no compression is applied by default. For example, a 1 TB file occupies the full 1 TB when stored on Amazon S3 or any other cloud storage.	Compresses data while storing, thus consuming less space. A 1 TB file stored in Parquet format can take up only about 130 GB.
Query run time is slow because of the row-based search. For each column, every row of data has to be retrieved.	Query time is about 34 times faster because of the column-based storage and the presence of metadata.
More data has to be scanned per query.	About 99% less data is scanned to execute the query, which improves performance.
Most storage services charge by the space used, so the CSV format means high storage cost.	Lower storage cost because data is stored in a compressed, encoded format.
File schema has to be either inferred (leading to errors) or supplied (tedious).	File schema is stored in the metadata.
Suitable for simple data types.	Suitable even for complex types such as nested schemas, arrays, and dictionaries.
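
To illustrate the schema row of the comparison, here is a minimal Python sketch using only the standard library. The column names, the sample values, and the `naive_infer` helper are hypothetical and are not part of Board; they just stand in for an extracted cube and a simple loader. The sketch shows that a CSV round trip returns every value as text, and that guessing the types back can silently change codes with leading zeros.

```python
import csv
import io


def naive_infer(text):
    """Guess a type from CSV text the way a simple loader might (hypothetical helper)."""
    for cast in (int, float):
        try:
            return cast(text)
        except ValueError:
            pass
    return text


# Hypothetical extract: customer code, month, amount.
original = [("007", "202301", 1250.5), ("042", "202302", 980.0)]

# Write the rows to an in-memory CSV file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["customer", "month", "amount"])
writer.writerows(original)

# Read them back: CSV carries no schema, so every value is a string.
buffer.seek(0)
raw = list(csv.DictReader(buffer))

# Inferring types turns the customer code "007" into the number 7.
inferred = [{key: naive_infer(value) for key, value in row.items()} for row in raw]

print(raw[0])
print(inferred[0])
```

A format that stores the schema in its metadata, as Parquet does, avoids this guessing step because the reader is told the type of each column.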

Please add an option to extract data as a Parquet file.

More information:

https://geekflare.com/parquet-csv-data-storage/

2. What specific problem are you trying to find a solution to, or what new scenario would this idea respond to?

Having a new, more compact and faster format for saving data extracted from Board.

3. What workaround have you found and used so far (if any)?

No workaround available.

Status: Accepted

Comments

Product Management Team

Hi @Nico Gerbrand, thank you for sharing another idea on the Parquet file format! We appreciate the time and effort you put into crafting your suggestion, and we understand the need for this solution. We have accepted this idea into our backlog and are evaluating the feasibility of adding it to the development roadmap for 2024.

We are committed to delivering the best possible experience to our customers, and we must prioritize the features and enhancements that will have the most significant impact on your daily use of our software.