Clear Cube Insight

Center of Excellence
edited April 2023 in Best Practices

1.   Abstract

Clearing data in a Board application that has a large dataset and high concurrency needs to be done with care to avoid unintended consequences. The approach you use to clear data affects data retention and can affect performance. This article reviews the options available to you.

2.   Context

These best practices are always applicable.

3.   Content

You can clear a Cube in Board with a Procedure step called Clear Cube.

We suggest you always use the “Clear Cube” step with the "use current selection" option active. This applies the clearing action to the selection in effect in your current procedure. Without this option enabled, the entire contents of the Cube are cleared, regardless of selections.

Be aware that clearing a portion of a Cube this way doesn’t erase the tuples* in the selection; it sets the related key figures to 0. This means that:

  • The sparsity structure keeps those tuples as a valid sparse combination
  • The Cube size is not reduced by the clearing action
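
The effect of a selective clear can be sketched with a toy model. This is only an illustration, assuming a Cube can be pictured as a dictionary that maps a combination of entity members to a value; it is not Board's storage format or API, and the function name `clear_with_selection` is hypothetical.

```python
# Toy model only: a Cube as a dict mapping a tuple of entity members to a value.
# This is NOT Board's storage format or API; every name here is hypothetical.

def clear_with_selection(cube, selection):
    """Set every cell inside the selection to 0 while keeping its key."""
    for key in cube:
        # selection maps a position in the key to the set of selected members
        if all(key[pos] in members for pos, members in selection.items()):
            cube[key] = 0.0
    return cube

cube = {
    ("2023-01", "Italy"): 120.0,
    ("2023-01", "France"): 80.0,
    ("2023-02", "Italy"): 95.0,
}

# Selection: position 0 (Month) restricted to 2023-01.
clear_with_selection(cube, {0: {"2023-01"}})

print(len(cube))  # 3: the cleared combinations are still stored
print(cube[("2023-01", "Italy")])  # 0.0
```

In this model the number of stored combinations is the same before and after the clear, which mirrors the two points above.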

It’s best practice to use the Current Selection option when performing the “Clear Cube” step, because without it you are deleting content files from the Cube. In a high concurrency environment, problems can arise when those files are in use by other users or processes. Clearing without the Current Selection option should therefore be reserved as an admin task, executed during periodic maintenance windows.

In situations with very large Cubes, where a large portion is cleared, we suggest extracting the Cube, clearing it without the Current Selection option, and then reloading the extracted content. This way, all records with the value 0 are discarded during the reload.

When your processes clear large portions of Cubes, or clear portions of Cubes frequently, a Cube extract-clear-reload helps to optimize Cube size and sparsity.
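
The extract-clear-reload sequence can be sketched in the same toy style. This is an illustration under assumptions, not Board's API: the Cube is again pictured as a dictionary, and `extract`, `clear_all` and `reload` are hypothetical helpers standing in for the corresponding procedure steps.

```python
# Toy model only; hypothetical names, not Board's API or file format.

def extract(cube):
    """Export every stored cell as a (members, value) record."""
    return [(key, value) for key, value in cube.items()]

def clear_all(cube):
    """Clear without a selection: drop every stored combination."""
    cube.clear()

def reload(cube, records):
    """Load records back, skipping zero values as described in the article."""
    for key, value in records:
        if value != 0:
            cube[key] = value

cube = {
    ("2023-01", "Italy"): 0.0,   # zeroed by an earlier selective clear
    ("2023-01", "France"): 0.0,  # zeroed by an earlier selective clear
    ("2023-02", "Italy"): 95.0,
}

records = extract(cube)
clear_all(cube)
reload(cube, records)

print(cube)  # {('2023-02', 'Italy'): 95.0}
```

Only the non-zero combination survives the round trip, so the model Cube shrinks from three stored combinations to one.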

  

*A tuple is a sequence of objects in parentheses, separated by commas. A tuple can’t be changed, as it holds a fixed collection of items. For example, (2.7, 3.8, 5) is a tuple that contains three numeric objects.


Comments

  • Julien CARDON

    Hi,

    I have some consultants who always add a clear cube (use current selection) step before a dataflow step (because in that case the dataflow should be faster?).

    Others just run the dataflow without a clear cube, because they say the dataflow already implicitly launches a clear cube.

    Who is right?

    Note: I'm talking about a basic dataflow b=a with similar entities, not ones with calculation domain options.

    regards

    julien

  • Anastasia Pesyakova
    edited December 2023

    Hi Julien @Julien CARDON

    The clear cube b (use current selection) step is added before the dataflow step to ensure that, after the dataflow executes, cube b contains only records from cube a and no data from previous calculations.

    It is a good idea to add a clear cube step when the cubes have a lot of dimensions and the data is spread across different members.

    Also, please note that the dataflow's calculation domain settings affect both the calculation output and performance. You can find more info on this in the manual: https://www.boardmanual.com/2023/summer/data-modeling/data-model-design-sections/procedures/procedure-actions/calculation-action-group/the-dataflow-process.htm?rhsearch=dataflow&rhhlterm=dataflow%20dataflows

    Thanks,

    Anastasia

  • Julien CARDON

    Hi @Anastasia Pesyakova

    Thanks for your feedback. However, when I read the doc about dataflow, it clearly explains that the dataflow already does a clear cube implicitly.

    So if I add a clear cube before it, that seems redundant?

    https://www.boardmanual.com/2023/summer/data-modeling/data-model-design-sections/procedures/procedure-actions/calculation-action-group/the-dataflow-process.htm?rhsearch=dataflow&rhhlterm=dataflow%20dataflows

    When the Dataflow is triggered in a Procedure, the following operations are performed:

    1. The target Cube is cleared based on the currently active Select, unless the target Cube is also referenced in the formula (i.e. it is a "factor Cube") 
    2. The values in the factor Cubes are processed for all Entities in the structure of the target Cube, at the highest level of granularity. 
      The formula is executed on each tuple in the calculation domain and the results are written in the target Cube.

  • Andrea Scazzosi (Employee)
    edited December 2023

    Hi all, just to clarify.

    When the dataflow is executed without selections, the clear cube action is not necessary.

    If there are some selections before the dataflow and we need to clear the full content of the cube (I mean outside the selection domain), then it is necessary to add a clear cube.

    Conclusion: the dataflow, by nature, overwrites any existing data in the domain/selection, so it is not strictly necessary to add a clear cube action.

    Hope this helps.
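
The behaviour discussed in the comments above can be sketched as a toy model of a basic b=a dataflow. This is only an illustration that assumes the two operations quoted from the manual (clear the target inside the active selection, then write the results). The names `in_selection` and `dataflow_copy` are hypothetical and the dictionary model is not Board's API.

```python
# Toy model only; hypothetical names, not Board's API.

def in_selection(key, selection):
    """True when every selected position of the key holds a selected member."""
    return all(key[pos] in members for pos, members in selection.items())

def dataflow_copy(source, target, selection):
    """Model of a b=a dataflow restricted to the active selection."""
    # Step 1: clear the target inside the selection (values set to 0).
    for key in target:
        if in_selection(key, selection):
            target[key] = 0.0
    # Step 2: write the source values for the selected combinations.
    for key, value in source.items():
        if in_selection(key, selection):
            target[key] = value

a = {("2023-01", "Italy"): 10.0}
b = {
    ("2023-01", "France"): 7.0,  # stale value inside the selection
    ("2023-02", "Italy"): 5.0,   # stale value outside the selection
}

# Selection: position 0 (Month) restricted to 2023-01.
dataflow_copy(a, b, {0: {"2023-01"}})

print(b[("2023-01", "France")])  # 0.0: cleared by the dataflow itself
print(b[("2023-02", "Italy")])   # 5.0: untouched, outside the selection
print(b[("2023-01", "Italy")])   # 10.0: copied from a
```

In this model a separate clear with the same selection changes nothing, while the stale value outside the selection is only removed by an additional clear that covers it.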