Probably The Best Alternative to CSV Storage: Parquet Data

1. What is your idea?

Board supports a lot of data sources. Probably the most used are CSV files. However, with ever-increasing amounts of data, working with CSV files becomes a pain. Parquet files can be an alternative.

Head-to-head comparison

CSV: Row-based storage format.
Parquet: Column-oriented storage organized in row groups, so effectively a hybrid of row-based and column-based formats.

CSV: Consumes a lot of space because the format has no built-in compression. For example, a 1 TB file occupies the same 1 TB when stored on Amazon S3 or any other cloud storage.
Parquet: Compresses data when storing it and so consumes less space. In the comparison linked below, a 1 TB file stored in Parquet format takes up only about 130 GB.

CSV: Queries are slow because of the row-based layout. To read a single column, every row of data has to be retrieved.
Parquet: Queries are about 34 times faster in the linked comparison, thanks to the column-based storage and the metadata kept in the file.

CSV: More data has to be scanned per query.
Parquet: About 99% less data is scanned per query in the linked comparison, which improves performance.

CSV: Most cloud storage providers charge by storage space, so the CSV format means higher storage cost.
Parquet: Lower storage cost, as data is stored in a compressed, encoded format.

CSV: The file schema has to be either inferred (error-prone) or supplied (tedious).
Parquet: The file schema is stored in the file's metadata.

CSV: Suitable for simple data types.
Parquet: Suitable even for complex types such as nested schemas, arrays, and dictionaries.
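
To make the comparison above more concrete, here is a rough sketch in Python using only the standard library. It is not Parquet itself and not anything Board provides: the sample columns (order_id, country, amount), the helper names, and the toy file layout (zlib-compressed column chunks followed by a JSON footer) are all assumptions made up for illustration. It only mimics the ideas that make Parquet attractive: values of one column are stored together and compressed, and a footer records each column's type and position, so a reader can fetch a single column without scanning the rest.

```python
import csv
import io
import json
import struct
import zlib

# Hypothetical sample data; the column names are made up for illustration.
ROWS = [
    {"order_id": i, "country": ["DE", "IT", "US"][i % 3], "amount": round(i * 1.5, 2)}
    for i in range(10_000)
]


def to_csv_bytes(rows):
    """Row-based layout: every line holds all columns of one record."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue().encode("utf-8")


def to_columnar_bytes(rows):
    """Toy column-based layout: compressed column chunks plus a footer.

    The footer records each column's type, offset and length, so a reader
    can locate one column without touching the others.
    """
    body = b""
    footer = {}
    for name in rows[0]:
        values = [row[name] for row in rows]
        chunk = zlib.compress(json.dumps(values).encode("utf-8"))
        footer[name] = {
            "type": type(values[0]).__name__,
            "offset": len(body),
            "length": len(chunk),
        }
        body += chunk
    footer_bytes = json.dumps(footer).encode("utf-8")
    # The last 4 bytes store the footer size so it can be found from the end.
    return body + footer_bytes + struct.pack("<I", len(footer_bytes))


def read_column(blob, name):
    """Return (values, declared_type, bytes_read) for a single column."""
    (footer_len,) = struct.unpack("<I", blob[-4:])
    footer = json.loads(blob[-4 - footer_len:-4])
    meta = footer[name]
    chunk = blob[meta["offset"]:meta["offset"] + meta["length"]]
    values = json.loads(zlib.decompress(chunk))
    return values, meta["type"], 4 + footer_len + len(chunk)


csv_blob = to_csv_bytes(ROWS)
col_blob = to_columnar_bytes(ROWS)
countries, declared_type, bytes_read = read_column(col_blob, "country")

print("CSV size:", len(csv_blob))
print("columnar size:", len(col_blob))
print("bytes read for one column:", bytes_read)
```

Reading the country column touches only the footer and that one chunk, and the column's type comes from the footer instead of being guessed from text. Real Parquet files use their own binary encodings and row groups, so the sizes printed here say nothing about the figures quoted in the comparison.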

Please add a new DataReader for Parquet files.

More information:

https://geekflare.com/parquet-csv-data-storage/


2. What specific problem are you trying to find a solution to, or what new scenario would this idea respond to?

Having a new DataSource for Board.

3. What workaround have you found and used so far (if any)?

No workaround available.

13 votes

Accepted

Comments

  • Like this - sounds interesting!

  • Product Management Team (Employee, Group Leader)

    Hi @Nico Gerbrand, thank you for sharing your idea! We appreciate the time and effort you put into crafting your suggestion, and we understand the need for this solution. We have accepted this idea into our backlog and are evaluating the feasibility of adding it to the development roadmap for 2024.
     
    We are committed to delivering the best possible experience to our customers, and we must prioritize the features and enhancements that will have the most significant impact on your daily use of our software. 

  • Hello @Product Management Team,

    the idea "Delta Lake for read/write parquet files" could be helpful for your development of this idea.