Entity Definition - Concatenation or not
1. Abstract
In complex applications, we often face a situation where we have too many dimensions structuring the cubes. In such cases, it is necessary to evaluate the impacts on the usability of the application, and to measure the risks on the overall performance. In case of performance issues, one of the possible optimizations is to reduce the number of dimensions by concatenating entities.
2. Context
In cases where it is necessary to dimension cubes by many different entities, one solution that might be considered is creating an artificial key to combine two entities.
Entity concatenation is not as simple as it may appear, it introduces to the application a high level of complexity, therefore we need to carefully study the feasibility and analyze all its drawbacks.
3. Content
3.1 When to use entity concatenation
The entity concatenation can be used as a strategy to reduce the number of dimensions in a cube to positively impact performance.
The good practice is to limit cubes to max 7 dimensions. This doesn’t mean that every time we have a cube with more than 7 dimensions we need to leverage the entity concatenation to reduce the number. This technique needs to be evaluated based on the context and based on the issue we are trying to solve.
For example:
One example of applicability could be in the scenario where the customer asks for a flat report with many details by row. In this case, the first approach should always suggest leveraging native Board features, like drill-down and filtering options, and avoiding similar reports. However, when such reports are strongly required, entity concatenation can be used as a strategy to reduce the number of entities by row to get the most granular level of information.
Flat report to be reproduced:
Without entity concatenation, the cubes have all the dimensions needed. The dimensions will be managed by row in the layout to obtain the flat report.
The main constraint is the impact on performance:
The dimensions of the report can be huge with a high risk of layout failure when having three and more dimensions by row.
If not failing, very bad performance when loading the screen. Depending on the volume it could take several minutes to execute the layout.
With the entity concatenation and leveraging the relationship approachit’s possible to reducethe number of dimensions in the cubes and the number of entities required by row to obtain the flat report.
By creating technical keys, we have fewer dimensions in the cubes with a positive impact on performance.
With the technical key we decreased the risk of falling into 128-bit sparsity. This is because the max item number is smaller than the product of the max item number of the entities that constitute the tech key.
Below an example of tech keys created to concatenate three entities:
- Tech key 1 is the concatenation of SKU, Customer and Activity.
- Tech key 2 is the concatenation of Company, Account and CC.
With that, it is possible to build the report as follows:
- Only the two technical dimensions are used by row, and all the other entities are used as blocks in the layout leveraging the parent-child relationship with the technical keys.
- With fewer entities by row, performances are improved even if the number of lines is the same as before.
3.2 How to concatenate entities
Below an example of technical key creation:
- Product tree exists containing many products.
- Sales Team tree also containing a significant number of teams.
In most of the cases, only certain teams are responsible for selling certain subsets of the total product catalogue. Combining the products sold by each team with the relevant teams might result in a fraction of the total possible combinations of the two.
This concatenation is following the business logic that not all the teams are responsible for all the products.
When creating a technical key, we must ensure that the number of elements in the tech key is lower than all the potential combinations of the tree elements.
Great care should be taken to ensure that the artificial key will always be unique, for example, if the two keys being combined are similar in their format (just numeric sequences for instance) a separator should be introduced between the two, for example where Team 123 sell product 45 and team 12 sell product 345 we should get two distinct keys 123-45 and 12-345, we should not store both under a single key 12345 as this will obviously merge the two data sets and give us incorrect aggregations. It’s good practice to introduce a separator logic, the suggestion is to use the “-“or ”_” and to avoid the use of “TAB” and “space”.
Comments
-
Thanks @Abdelhadi Babaali for the insights on this article!
5