Data Export Functionalities

Options
Center of Excellence
edited March 2023 in Best Practices

1.   Abstract

Describe all the alternatives available when you export a dataset from Board to integrate it with an external third-party environment or another Board data model. What are the pros and cons of the different solutions from file export to APIs?

2.   Context

Follow these best practices every time you have to integrate data contained within your Board data model with a third-party system or another Board data model. This article is intended to complete full guidance on exporting functionalities and their usage (with a detailed explanation of which and when to use them).

3.   Content

3.1 Extract to Flat File

Extract to a flat file is the “traditional” way to export data from Board. It gives flexibility with multiple options related to extraction functionalities like Extract Cube or Dataset and extraction paths.

These export options are executable as procedures and are limited by any active select applied.

3.1.1   Extract Cube

The “Extract Cube” exports the content of a Cube to a .CSV file. Board exports a file named after the physical name of the Cube and the extracted version to the specified extraction path (if the path is not present, the file will be exported in the standard path).

This functionality is used mainly when you need to:

  • Extract and reload a Cube due to some structural changes in the database (e.g., change of a max item number of an Entity) or for synchronization purposes between different databases
  • Check which data are stored in the Cube, and it’s easier to do it in an Excel file
  • Export of Cube’s content at the most granular level available

3.1.2   Export Data View to File and Export Dataset

The Export Data View to File feature provides a complete set of data using Board’s Layout capabilities. It allows data extraction to a .CSV file from layout executions, with multiple data blocks and dimensions set by row and columns to achieve your desired granularity. This functionality also overcomes the limitations of the Extract Cube step since it supports data extraction from multiple Cubes at once.

It should be used when you need to do more than simply export a Cube content, but instead when a specific report needs to be saved and analyzed in another format or when an aggregated dataset is enough, and the most granular level is not required.

The Export Data View to File feature can be slow with large datasets and when there are many Entities in the axes. Since it relies on the execution of a layout, it should be compared with the execution of it in a Screen; if Board isn’t able to execute the layout, the export functionalities won’t be either.

The Export Dataset is a procedural step that allows you to export data to a .CSV file much faster than the Extract Data View to File (formerly named "Export Layout to File") step. It overcomes the limitations of the Extract Cube step since it supports data extraction from multiple Cubes at once, even if it allows less flexibility than Export Data View to File. This functionality does not return totals or support Entities by column, Rules, sorting, filtering, or ranking functions. down

We recommend the Export dataset if you export massive amounts of data with a simple Layout configuration with multiple Entities set By Row.

Under the "Culture" menu, the desired regional settings can be applied to data in the output file (Choose "none" to apply the server default culture; the "profile default" setting applies the culture of the user running the procedure).

3.1.3   Extract Entity and Tree

A Cube’s content can be exported from Board’s data model in addition to primary data (Entities) and hierarchies (Trees).

  • By selecting the desired Entity, it’s possible to extract all the members of that Entity into a .CSV file.
  • By selecting the desired Tree, it’s possible to extract all the members and the relationships of a specific Tree into a .CSV file.

The .CSV file is exported in the specified download location (if the path is not present, the file will be exported in the standard path).

When extracting an Entity for reloading purposes, you should remove any “Sort by” customization and instead select sort it by “none.” This way, if the Entity is reloaded, the IDx codes related to the elements remain the same; otherwise, they could be mixed up.

3.1.4   Data Export Path

Exporting and reloading data is a common activity within a Board environment itself or between different environments.

If no file name and no path are entered, Board will automatically export a file named Export.txt to the default extraction path (C:\Board\Dataset\).

Every Board Cloud Platform has multiple local or cloud storage (C:\, D:\, R:\, Z:\...), and each of them has its own characteristics such as size, expiration, read/write speed, and accessibility, which is set to be ideal for specific purposes.

-      C:\ is the standard path for any data extraction to a flat file. Remember, C:\ drive is also where your data models are stored, and it’s suggested for small-size export

-      D:\ is a local disk similar to C:\ but is temporary and cleaned at every redeployment. It makes it perfect for large data extraction, Extract All Cubes, for instance, which will be automatically cleaned after the execution of the intended process

-      R:\ is an additional local disk similar to C:\, which can be provided with a request to Board Cloud Ops Team

-      Z:\ is a shared Azure storage provided for any Cloud customer. Remember, Z:\ is a shared folder between all your Board environments, which makes it perfect for exchange data (Cube content or Entity members) between environments. But it impacts performance since read/write  actions are slower than a local drive

3.1.5   Azure Storage

Only the Cloud-hosted environments can export data to their shared storage area (z:\) by entering the desired path or using the %bss% environment variable. If the entered path does not exist, Board will automatically create non-existent directories on the shared storage area.

  • The storage will be accessible from any federated Board Application, like a standard legacy hard drive
  • To access it from an on-premises network, you can use the AZCOPY tool (a command line tool) provided by Microsoft and a shared access signature (SAS) provided by Board Cloud Ops Team. You can copy data from one object to another within your storage account or between storage accounts
  • The SAS is a URI that has parameters to encompass all of the information necessary for authenticated access to a storage resource
  • To access storage resources with the SAS, you only need to pass the SAS to the command-line tool
  • All data are transferred using the HTTPS protocol.

 3.2 Export Dataset in the Board API Query

The high-performance Export Dataset action can also be executed through a Board Query REST API, replacing data extraction through files when other systems need to retrieve data from Board. 

Board REST API is a web service that uses HTTP to send data in JSON format using the GET call of the REST architecture. It allows you to configure and customize how the Board ecosystem exposes data and gets integrated into other third-party software environments. It is also possible for another Board’s Data Model to read from an API through our data pipeline and upload using the data reader protocol.

Once the initial configuration has been created, the external environment can access, read, and manipulate data from Board. 

For example, through HTTP requests, it’s possible to obtain lists of Entities, Cubes, Relationships, or Entity members in JSON format. It is possible to access Cube’s content by executing a Layout, searching for a Capsule, executing a Database procedure, and performing a full-text search.

As described for the Export Data View to File action, the use of Board API may be slow with large datasets and many Entities set in the Axes. Since it relies on the execution of a layout, it should be compared with the execution of it in a Screen; if Board cannot execute the layout, the export functionalities won’t execute either.

The Dataset option doesn't return down totals. It doesn’t support entities by column, sorting, filtering, and ranking functions, but it's significantly faster and recommended for large datasets with many Entities by row.

The Layout option returns a complete set of data using Board's Layout Object capabilities but may be slower with huge datasets and many Entities in the Axes.

 3.2.1   Types of APIs That Can Be Created Using Board

Using Board, we can create:

  • Machine-to-machine APIs: this can be used for automated execution and requires a license for Client API
  • User-to-machine APIs: this can be done by individual users from the presentation layers and relies on standard user login. The API add-in is only created for Power BI by Board. The Power BI Connector takes all the credentials from Board and the GET query. This is given in JSON format from Board to Power BI. You can do the same for other tools, such as Python and Postman--they follow a similar setup. You can make it work for any tool that can process API queries

 3.3 Bulk Insert to SQL Table

The Bulk Insert to SQL Table function allows a much faster data extraction from layout executions, even with many data blocks and many Entities set by row. It allows you to export that data to an external SQL database’s table.

This allows you to export massive amounts of records from different Cubes and quickly insert them into a SQL database. This export process relies on a layout definition and a SQL Command to insert records in the target table. The SQL database could be a target environment or could be seen as a staging layer where it integrates Board data with others.

Such functionality requires the definition of a new data source and a Data connection to a SQL source through the Board Data Pipeline. The user set up in the data source must have the rights to insert data on target tables.

This functionality can be configured in two different modes:

  • Default mode is compatible with Microsoft SQL Server DB only. It requires the user to provide an INSERT INTO syntax
  • Advanced mode gives more control over the insert statement structure, thus also supporting databases other than Microsoft SQL Server

Through Chunk Size, we can define the number of fields (columns) processed with a single server roundtrip for each insert statement. The higher the Chunk Size, the smaller the number of server roundtrips that the bulk will require overall. However, the payload size of each roundtrip will be heavy.

On the contrary, a low Chunk Size would process a small number of rows by roundtrip, resulting in a small payload size for each iteration but lower performance in terms of time.

This number is set at 1000 by default. If you change it, target database limits must be set accordingly.

 3.4 Push vs. Pull

Extract to Flat File and Bulk Insert actions are both Board’s procedure steps intended for a push from Board type of action, which can be either scheduled through a batch job or run on demand by a user. On the other side, Board APIs are a pull action triggered from outside the Board Platform to access its data by an external application or tool.

However, combining both functionalities allows a hybrid approach; for instance, Board APIs can run a procedure through a POST call, which executes an extract action from Board.

 4.   Conclusion

Below is a brief recap of the different data export functionalities: