Normalization in High Concurrency

1. Abstract

In Board’s multi-dimensional database, relationships are a core component of a data model configuration. Properly designing and maintaining relationships is essential for a well-functioning solution. Missing relationships, both direct or indirect, can lead to inaccurate or incorrect conclusions or behaviors. 

The Normalize button is a Board’s native feature that creates indirect relationships using direct relationships as a bridge. Using a datareader to read a whole tree Board will automatically normalize the hierarchy by defining both direct and indirect relationships and we can achieve the same result, a normalized tree. 

2. Context

Even if the Normalize feature in most cases is the preferred one, there are some cases where is necessary to find technical alternatives that allow the normalization of relationships to be managed without using the Normalize function, for instance using datareaders in order to normalize a tree. 

A typical situation could be having many end users that create new entity members frequently that must fit within e relationships with the consequent update of indirect relationships via Normalize. 

Since Normalize is a non-concurrent action, in case the frequency required by this action is very high, it could lead to queues in task manager and procedure execution interruptions. Such a situation happens especially when the hierarchy to be normalized is so large in terms of number of entities involved as well as number of levels and branches that the normalize process takes several seconds to be completed. 

3. Content

When we talk about normalization of relationships, we are talking about having at least 2 levels of aggregation of the leaf entity (case 1). Should one find oneself in the condition of modifying the relationship between a leaf element and its direct parent (case 2), without having any additional level of aggregation, then even in the On-Demand case the Normalize function will not be necessary. 

Within case 1 in the above image, should it be necessary to change on-demand one of the direct relationships (Leaf Entity vs Entity1, Entity1 vs Lev2, Lev2 vs Lev3), it is always necessary to apply a normalization of the indirect relationships. 

3.1 Create cubes 

Create a text cube in which the codes of the aggregating entity are contained per each parent entity. Cubes must be structured by the entity of the previous level: 

1. Code LEV1 (TXT) - Text cube

2. Code LEV2 (TXT) - Text cube

3 . Code LEV3 (TXT) - Text cube

3.2 Cubes feeding 

Text cubes should be populated from existing relationships and be updated whenever a master data get edited or uploaded from an external source occurs. 

This can be achieved by a procedure with an Extract Tree and multiple reloads through ASCII Data Readers: 

Each Data Reader loads the specific cube of the indicated level: 

The final result plotted on a single dataview will be as follows: 

As soon a user edits the relationship, for instance, between entity "Lev2" and entity "Lev3", he can do so through a dedicated dataview in which entity “Lev2” is in the row and the text cube “Code LEV3 (TXT)” is enabled for the data entry: 

We suggest foreseeing master data access segregation through a security selection and allocating a different subset of elements to different users in order to avoid concurrent modification on the same elements on the relationship. 

3.3 Tree normalization 

Finally, it is necessary to update the relationships using a procedure in which an Export Dataview to file is performed to recreate the tree on a flat file. Then, a subsequent ASCII Data Reader will directly update the relationship and normalize it at the same time. 

Below is the layout of the data view to be extracted: 

And the configuration of the ASCII Data Reader, with all entities in “Discard new item” mode, to update the relationships: 

4. CONCLUSION 

The final result of the relationships will be as follows: 

Attention points:  

  • To handle a high concurrency situation, it is important that where there is extraction or reading from files, this should always be done using the @User function on the file name. This will allow each user to work on personal files without risking modifying other users' files. 
  • For easy management of On-Demand relationships, you can enable the Suggested Value for a new Stand Alone similar to the main entities (e.g. Lev3 SA).
  • This solution is compatible with a scenario where just relationships between two hierarchy levels can be edited. In case of a scenario where multiple modification through the tree are required should foresee an iterative approach in order to reload and update the full relationships one level per each iteration.

Tagged:

Comments

  • Thanks, @Daniele Santandrea for the insights on this article!

  • Fethi ZERARA
    Fethi ZERARA Active Partner
    Community Captain First Anniversary Level 200: Leveraging Board for Business Insights Level 100: Foundations of Building in Board

    Very Insightful , Thank you @Daniele Santandrea !

  • Hello,

    Since Normalize is a non-concurrent action, in case the frequency required by this action is very high, it could lead to queues in task manager and procedure execution interruptions

    Thanks for the info, we are launching a lot of auto scheduled all the day "normalize" procedure in our DB because we have regularly new accounts or product in our tree.

    Normalize last about 1-2 second however we also have a lot of user launching several dataflows at the same time (complex budget process app with a lot of dataflows)

    We understand that it is a non concurrent action however it's not clear if there is impact on user dataflows when both are launched at the same time. Could you be more precise about the current behavior ?

    • If normalize is launched before dataflow then dataflow will wait for the end of normalize process then launch ?
    • Is there any stituation were the user procedure (with dataflows) could be stop in the middle of the procedure or the dataflow could fail?

    If I undertsand your recommandation, doing relations through SQL datareader is different from launching a normalize because datareader are high concurrency compatible while normalize is not ?

    Regards

    Julien

  • Daniele Santandrea
    edited November 2023

    Hi Julien,

    your assumption is correct.

    Respect to your questions, it may happen that a dataflow that is executed at the same time as the normalisation goes wrong and thus the procedure may not be completely executed.

    I confirm that with the use of the Data Reader (SQL or ASCII), as a concurrent action, the same behaviour would not occur.

    Regards,
    Daniele