Managing high-volume entities


1. Abstract

In complex environments, it is sometimes necessary to manage entities with a very high number of members. In such cases, it is crucial to identify the possible cascading effects on the rest of the data model, understand what measures need to be taken when loading entities with millions of elements, and evaluate in advance the possible impact on cubes, procedures and application usage.

2. Context

The customer requirement is to run a detailed analysis that includes a high-volume entity. Before accommodating the request, it is important to evaluate the possible alternatives to loading it and to be aware of all the impacts.

3. Content

Before considering the potential impact on the data model when huge volumes are loaded, it is essential to understand whether this action is really needed to meet the customer requirements. It is always preferable to run the analysis and calculations at an aggregate level, or to filter the inbound information to limit the volume of data.

Clearly, this approach is not always possible, but it makes sense to start the conversation in those terms to try to limit the complexity.
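
As a minimal illustration of this principle, the sketch below (plain Python with pandas, run outside of Board as part of the data preparation; file and column names are assumptions) shows how an inbound transactional file could be pre-aggregated to a higher level, or filtered to the relevant scope, before it is loaded into the data model.

```python
import pandas as pd

# Hypothetical inbound file: one row per transaction line.
# File and column names are assumptions for illustration only.
tx = pd.read_csv("inbound_transactions.csv")

# Option 1: aggregate to a coarser level before loading, e.g. drop the
# transaction-id detail and keep only store / sku / month totals.
aggregated = tx.groupby(["store", "sku", "month"], as_index=False)["amount"].sum()

# Option 2: filter the inbound scope to what the analysis really needs,
# e.g. only the current year and only active stores.
active_stores = pd.read_csv("active_stores.csv")["store"]
filtered = tx[(tx["month"] >= "2023-01") & tx["store"].isin(active_stores)]

print(f"raw: {len(tx):,} | aggregated: {len(aggregated):,} | filtered: {len(filtered):,}")
```

Either route reduces the number of members that end up in the high-volume entity and, consequently, the number of cells the data model has to manage.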

Below is an example of an interaction with the customer to evaluate whether loading a high-volume entity is really the only option.

Once it has been established that loading the high-volume entity is the only option, consider the following:

  • Cube creation & master data management:
    • Avoid combining more than one high-volume entity in the same cube structure.
    • The high-volume entity should be included in the sparse structure.
    • Avoid setting the max item number to Auto; estimate a realistic number instead. If estimating a realistic max item number is not feasible, leave it as Auto, but make it the only entity with Auto assigned: in this way, Board will allocate all the available space to the growth of this entity.
    • Load only transactional data: feed the entity only with occurrences that actually have data assigned, not with the entire data set coming from the master data table, even if properly filtered (see the first sketch after this list).
    • Set up a consistent process to clean the sparsity periodically, keeping the volume under control and free of unused or out-of-date combinations (see the second sketch after this list).

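A minimal sketch of the "load only transactional data" point above, assuming the extraction happens outside Board with pandas (file and column names are hypothetical): the entity is fed only with the codes that actually occur in the transactional feed, instead of the full master data table.

```python
import pandas as pd

# Hypothetical extracts; names are assumptions for illustration only.
master = pd.read_csv("product_master.csv")          # full master data table
tx = pd.read_csv("transactions_current_year.csv")   # transactional feed

# Keep only the master data rows whose code actually appears in the data.
used_codes = tx["product_code"].unique()
master_used = master[master["product_code"].isin(used_codes)]

master_used.to_csv("product_master_used.csv", index=False)
print(f"master rows: {len(master):,} -> rows actually loaded: {len(master_used):,}")
```
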
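Along the same lines, the periodic sparsity cleanup boils down to identifying combinations that no longer carry recent data. The sketch below is only a conceptual illustration with hypothetical file and column names; inside Board the cleanup itself would typically be implemented as a scheduled procedure.

```python
import pandas as pd

# Hypothetical snapshot of the combinations currently present in the sparse
# structure, with the last period in which each combination carried data.
combos = pd.read_csv("sparse_combinations.csv")   # columns: store, sku, last_period

CUTOFF = "2022-01"  # assumed retention horizon: keep combinations active since this period

stale = combos[combos["last_period"] < CUTOFF]
stale.to_csv("combinations_to_remove.csv", index=False)

print(f"total combinations: {len(combos):,} | stale: {len(stale):,}")
```
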
  • The impact is different depending on whether the high-volume entity is needed for reporting purposes or whether calculations, simple or complex, need to be executed at that level. In both cases, consider the following:
    • Identify and/or create a parent entity, i.e. an aggregation, to be used for executing the analysis or running the calculations (a conceptual roll-up sketch follows this list).
    • If the high-volume entity is needed for reporting purposes:
      • Create reports that open at an aggregated level in the landing state and reach the details with a drill-down / drill-to-screen approach; this guarantees that the detailed information is always queried with predefined filters.
      • Make sure to always set the appropriate filters, directly on the screen, through procedures or by leveraging the security settings, so that only a subset of the detailed entity needs to be rendered on the screen.
        Additional note: if end users have the right to modify the layout or to introduce the high-volume entity on an axis through the quick layout functionality, the memory consumption optimization driven by the LayoutResponseMaxCells parameter can be used to stop oversized layout executions before they go unnoticed. This runaway-query protection prevents the server from being saturated and avoids wider effects on the rest of the system, which could otherwise suffer from a lack of available RAM and degraded performance (a rough cell-count estimate is sketched after this list).
    • If the high-volume entity is needed in procedures for calculation purposes:
      • Prefer a fixed selection on parent entities and avoid the use of "select based on cube", because the select based on cube has a limit on the number of members that can be selected (roughly 20k). This limit is not fixed but depends on the entity configuration, such as the code length.
      • Simplify the calculation by splitting it into different steps, keeping no more than 3 blocks in each dataflow.
      • Align the cube dimensions before executing the calculation; better performance is expected when using cubes that share the same sparse structure.

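For the reporting case, the risk described above can be quantified in advance: the number of cells a layout tries to render is roughly the product of the member counts placed on its axes and the number of data blocks. The sketch below is a back-of-the-envelope estimate in plain Python; the member counts and the threshold value are assumptions, the actual limit being whatever is configured in LayoutResponseMaxCells.

```python
# Rough estimate of the cells a layout would render if an end user drags
# the high-volume entity onto an axis without filters.
# All figures below are illustrative assumptions.
members_by_axis = {
    "transaction_id": 5_000_000,  # the high-volume entity
    "month": 12,
}
data_blocks = 3  # number of data blocks in the layout

cells = data_blocks
for count in members_by_axis.values():
    cells *= count

LAYOUT_RESPONSE_MAX_CELLS = 500_000  # assumed configured threshold

print(f"estimated cells: {cells:,}")
if cells > LAYOUT_RESPONSE_MAX_CELLS:
    print("the layout would be stopped by the runaway-query protection: add filters or aggregate")
```

Thinking in these terms helps decide where the predefined filters or the drill-down entry points should sit.
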
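For the calculation case, working on a parent entity can be pictured as a simple mapping-and-aggregation step. The sketch below (pandas, outside Board; file and column names are hypothetical) rolls the detailed data up through the child-to-parent relationship before the heavy calculation runs, which is conceptually what happens when the dataflow operates on cubes dimensioned by the parent entity.

```python
import pandas as pd

# Hypothetical inputs: detailed amounts by SKU, and the SKU -> family mapping
# that represents the parent entity in the hierarchy.
detail = pd.read_csv("amounts_by_sku.csv")     # columns: sku, month, amount
hierarchy = pd.read_csv("sku_to_family.csv")   # columns: sku, family

# Roll the detail up to the parent entity (family) before calculating.
by_family = (
    detail.merge(hierarchy, on="sku")
          .groupby(["family", "month"], as_index=False)["amount"].sum()
)

# The expensive calculation then runs on thousands of families
# instead of millions of SKUs (the 5% uplift is purely illustrative).
by_family["target"] = by_family["amount"] * 1.05
```

Keeping each step this small also mirrors the advice above of splitting the calculation into simple dataflows with few blocks.
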
  • In a complex situation where it is not possible to work at an aggregated level, evaluate the possibility of creating a multi-data-model environment where the high-volume entity is segregated. For example, when millions of transactions need to be loaded, we can evaluate creating a data model for each retailer, grouping the stores and dividing them into different data models so that each one manages only a fraction of the volume, as sketched below. This is really an exceptional case and not the common scenario.
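
A sketch of what this segregation can look like at the data-preparation level (plain Python with pandas; the retailer grouping and file names are assumptions): the single transactional feed is split into one extract per data model, so that each data model only ever loads its own share of the volume.

```python
import pandas as pd

tx = pd.read_csv("all_transactions.csv")  # hypothetical feed with a 'retailer' column

# Hypothetical grouping of retailers into separate data models.
datamodel_by_retailer = {
    "retailer_A": "dm_retail_A",
    "retailer_B": "dm_retail_B",
    "retailer_C": "dm_retail_C",
}

tx["data_model"] = tx["retailer"].map(datamodel_by_retailer)

# One extract per data model, each loaded by its own data reader.
for dm, chunk in tx.groupby("data_model"):
    chunk.drop(columns="data_model").to_csv(f"{dm}_transactions.csv", index=False)
```

Each data model then manages only a fraction of the original volume, at the cost of maintaining multiple models.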

All these considerations are meant to help the consultant weigh the possible implications of managing entities with large volumes of data. The objective is not to provide a process to be followed step by step regardless of the context.