Clear Cube Insight
1. Abstract
Clearing data in a Board application with a large dataset and high concurrency must be done with care to avoid unintended consequences. The approach you use to clear data affects data retention and can affect performance. This article reviews the options available to you.
2. Context
These best practices are always applicable.
3. Content
You can clear a Cube in Board with the “Clear Cube” procedure step.
We suggest always using the “Clear Cube” step with the “use current selection” option active. This applies the clearing action to the selection in effect in your current procedure. Without this option enabled, the entire contents of the Cube are cleared, regardless of selections.
Be aware that clearing a portion of a Cube this way doesn’t erase the tuples* in the selection; it sets the related key figures to 0. This means that:
- The sparsity structure keeps those tuples as a valid sparse combination
- The Cube size is not reduced by the clearing action
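The effect described above can be sketched with a toy model. This is only an illustration, assuming a sparse Cube can be pictured as a dictionary from tuples to values; it is not Board's API or storage format, and `clear_with_selection` is a hypothetical name.

```python
# Toy model only: a sparse Cube as a dict from tuples to values.
# Not Board's API or storage format; all names are hypothetical.

def clear_with_selection(cube, selection):
    """Set the value to 0 for every tuple that matches the selection.

    `selection` maps a dimension index to the set of selected members.
    Matching tuples stay in the dict, so the structure does not shrink.
    """
    for key in cube:
        if all(key[dim] in members for dim, members in selection.items()):
            cube[key] = 0
    return cube


cube = {
    ("2023", "Italy", "Bikes"): 120.0,
    ("2023", "France", "Bikes"): 80.0,
    ("2024", "Italy", "Bikes"): 95.0,
}

# Clear only the tuples whose first dimension (year) is 2023.
clear_with_selection(cube, {0: {"2023"}})

print(len(cube))                         # number of stored tuples is unchanged
print(cube[("2023", "Italy", "Bikes")])  # value is now 0
```

In this model the two 2023 tuples remain as stored combinations with value 0, which is why the Cube does not get smaller.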
Using the Current Selection option with the “Clear Cube” step is also best practice because, without it, the step deletes the Cube’s content files. In a high-concurrency environment, problems can arise when those files are being accessed at the same time. Clearing without the Current Selection option should therefore be reserved as an admin task, executed during periodic maintenance windows.
For very large Cubes where a large portion is cleared, we suggest extracting the Cube, clearing it without the Current Selection option, and then reloading the source content. During the reload, all records with the value 0 are discarded.
When your processes clear large portions of Cubes, or clear portions frequently, a Cube extract-clear-reload helps to optimize Cube size and sparsity.
*A tuple is a sequence of objects in parentheses, separated by commas. A tuple can’t be changed, as it represents a fixed collection of items. For example, (2.7, 3.8, 5) is a tuple that contains three numeric objects.
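The extract-clear-reload sequence can be sketched in the same spirit. This is a minimal illustration, assuming a Cube is pictured as a dictionary from tuples to values; the functions `extract`, `clear_all`, and `reload` are hypothetical stand-ins for the corresponding Board steps, not real API calls.

```python
# Toy model only: a sparse Cube as a dict from tuples to values.
# The three functions are hypothetical stand-ins for Board steps.

def extract(cube):
    """Export every stored tuple, including the zero-valued ones."""
    return list(cube.items())


def clear_all(cube):
    """Full clear (no selection): drop every stored tuple."""
    cube.clear()


def reload(cube, records):
    """Load records back, discarding those whose value is 0."""
    for key, value in records:
        if value != 0:
            cube[key] = value


# A Cube where two tuples were previously cleared with a selection.
cube = {
    ("2023", "Italy"): 0,
    ("2023", "France"): 0,
    ("2024", "Italy"): 95.0,
}

records = extract(cube)
clear_all(cube)
reload(cube, records)

print(len(cube))  # only the non-zero tuple is stored again
```

After the reload, the zero-valued tuples are no longer part of the structure, which is the size and sparsity gain the article describes.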
Comments
-
Hi,
I have some consultants who always add a clear cube (use current selection) step before a dataflow step (because in that case the dataflow should be faster?)
Some others just run the dataflow without a clear cube, because they say the dataflow already implicitly launches a clear cube.
Who is right?
Note: I'm talking about a basic dataflow b=a with similar entities, not ones with calculation domain options.
regards
julien
0 -
Hi Julien @Julien CARDON
The clear cube b (use current selection) step is added before the dataflow step to ensure that, after the dataflow execution, cube b contains only records from cube a and no data from previous calculations.
It is a good idea to add a clear cube step when you have a lot of dimensions in cubes and data is spread across different members.
Also, please note that the dataflow's calculation domain settings result in different calculation output and performance. Please find more info on this in the manual:
Thanks,
Anastasia
1 -
Thanks for your feedback. However, when I read the doc about dataflow, it clearly explains that the dataflow already implicitly does a clear cube?
So if I add a clear cube before it, that seems to be redundant?
When the Dataflow is triggered in a Procedure, the following operations are performed:
- The target Cube is cleared based on the currently active Select, unless the target Cube is also referenced in the formula (i.e. it is a "factor Cube")
- The values in the factor Cubes are processed for all Entities in the structure of the target Cube, at the highest level of granularity.
The formula is executed on each tuple in the calculation domain and the results are written in the target Cube.
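The quoted steps can be pictured with a toy model of a plain b = a dataflow. This is only a sketch under stated assumptions: Cubes are modeled as dictionaries from tuples to values, the implicit clear is modeled as setting values to 0, and `dataflow_copy` is a hypothetical name, not Board's API.

```python
# Toy model only: Cubes as dicts from tuples to values.
# Assumption: the implicit clear is modeled as setting values to 0.

def matches(key, selection):
    """True when the tuple falls inside the active selection."""
    return all(key[dim] in members for dim, members in selection.items())


def dataflow_copy(source, target, selection):
    """Toy b = a: clear the target within the selection, then write results."""
    for key in target:
        if matches(key, selection):
            target[key] = 0
    for key, value in source.items():
        if matches(key, selection):
            target[key] = value


a = {("2023", "Italy"): 10.0}
b = {
    ("2023", "Italy"): 1.0,
    ("2023", "France"): 7.0,   # stale value inside the selection
    ("2024", "Italy"): 5.0,    # outside the selection
}

dataflow_copy(a, b, {0: {"2023"}})

print(b[("2023", "Italy")])   # overwritten from a
print(b[("2023", "France")])  # cleared by the dataflow itself
print(b[("2024", "Italy")])   # untouched, outside the selection
```

In this model the stale value inside the selection is cleared without a separate clear cube step, while data outside the selection is left alone.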
0 -
Hi all, just to clarify.
When the dataflow is executed without selections, the clear cube action is not necessary.
If there are selections before the dataflow and we need to clear the full content of the cube (that is, also outside the selection domain), then it is necessary to add a clear cube.
Conclusion: the dataflow, by nature, overwrites any existing data in the domain/selection, so it is not strictly necessary to add a clear cube action.
Hope this helps.
3