Data Transformation

Options
Center of Excellence
edited April 2023 in Best Practices

1.   Abstract

The data transformation process can be done in three different ways:

1.    with SQL transformations directly in the data source or in the Data Reader Object

2.    with the out-of-the-box Board ETL function

3.    after the data has been imported into Board by using the dataflow engine.

The type and number of data sources and the data transformation process can make one of the techniques more suitable than others.

2.   Content

The data transformation process includes all actions related to manipulating, adjusting, cleansing, and accommodating data to be used in reports, or planning and forecasting processes. We can identify three different data transformation layers where such a process can be carried out.

2.1 Relational Data Source

The first data transformation layer is the data source itself. Relational data sources are designed to store and organize the data in a structured way. Most of the time, they offer out-of-the-box functionalities to transform, cleanse, and prepare the data to be, imported and consumed by other systems. Such operations are performed with functions and queries specific to each data source technology.

Board is a CPM/BI tool that sits on top of data sources; you should aim to read data that requires as little data transformation as possible and expect all data transformation processes to occur in the data sources. This approach is recommended when Board reads data from relational data sources. Relational data sources allow the execution of complex and elaborate queries. They usually exploit the widely used SQL language.

2.2 ETL Function

Refine and adjust inbound data during the Board data reading process using the native Board ETL function. This data transformation layer is executed right after the data reading stage.

This approach is strongly suggested for the following cases:

-      When it’s not possible to perform data transformation in the data source

-      When there isn't a relational source available. The source data is stored in .CSV files.

-      Entities codes/relationships transformations

-      Quick and straightforward data transformations

When the data comes from a relational data source, it’s not recommended to mix SQL and Board ETL transformations. In this case, the data transformations should be done entirely in the data source with SQL queries.

2.3 Dataflow

Use the Board dataflow engine to perform data transformation and manipulation directly in Board after the data reading stage.

The dataflow engine allows you to perform very complex data transformations, giving you the flexibility to transform measures/figures. Nevertheless, it demands computational effort from Board, increasing the database’s complexity. The first two approaches should be conserved before following this approach. This approach is mandatory when part of the data/primary data is managed inside Board and does not come from a data source.


Comments