1. Abstract
During the creation of a new cube, the question usually arises of whether dimensions should be dense or sparse. A cube can store a value for any combination of its dimensions' members: for example, a three-dimensional Cube (e.g. month, product and customer) has a cell for every triplet. If every customer purchased every product every month, the data would be 100% dense.
However, in a real situation, only a small fraction of the total Cube cells actually contains data. The ratio between the number of cells that contain data and the total number of cells of the Cube (obtained by multiplying the number of members of each dimension) is the Cube density.
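The density ratio described above can be sketched as a small calculation. This is an illustrative Python snippet, not part of Board itself; the dimension sizes and cell count are made-up example figures:

```python
from math import prod

def cube_density(populated_cells: int, dimension_sizes: list[int]) -> float:
    """Density = cells containing data / total potential cells,
    where the total is the product of the dimension sizes."""
    total_cells = prod(dimension_sizes)
    return populated_cells / total_cells

# 12 months x 50 products x 200 customers = 120,000 potential cells;
# suppose only 9,000 of them actually hold data.
print(f"{cube_density(9_000, [12, 50, 200]):.2%}")  # → 7.50%
```

A density of 7.5% means over 90% of the potential cells would be wasted space in a fully dense layout, which is why sparse storage matters.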
2. Context
Sparsity plays a central role in cube sizing and in the execution speed of dataflow layouts, so it is quite relevant for wide Data Models involving many dimensions and high data volumes.
3. Content
A sparse structure is a combination of entities (not hierarchically related) for which the number of distinct combinations of values is small compared to the total number of potential combinations. When a sparse structure is defined, disk space is allocated only for the sparse combinations created while loading, inputting, or calculating, so disk space overhead is minimal.
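The idea of allocating space only for existing combinations can be illustrated with a simple mapping keyed by member tuples. This is a conceptual sketch in Python, not how Board stores data internally, and the sales figures are invented:

```python
# A sparse store keeps values only for combinations that exist,
# keyed by member tuples, instead of allocating the full dense array.
sales: dict[tuple[str, str, str], float] = {}  # (month, product, customer) -> value

sales[("2024-01", "Widget", "ACME")] = 1200.0
sales[("2024-02", "Widget", "ACME")] = 950.0

# Only 2 entries consume storage, however many potential
# month x product x customer combinations exist.
print(len(sales))  # → 2
```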
A sparsity is conceptually very similar to an index: a pointer to data in a cube that improves the speed of data retrieval operations on a Board cube, at the cost of additional writes and storage space to maintain the index data structure. It is important to evaluate the trade-off between index size and data retrieval optimization.
3.1 Sparsity Definition
When creating a cube (or a cube version), Board suggests a dense/sparse structure: it puts as many entities as possible in sparse mode, trying to keep the sparse structure pointer 64-bit based on the automatic or manual max item number set for each entity. In other words, Board puts all entities in sparse mode as long as the product of their max item numbers stays below the 64-bit limit (2^64, ~1.8*10^19). Entities with an automatic max item number are treated as having a max item number greater than their current one (while keeping the product below the 64-bit limit), so always limit the use of "auto" as max item number in order to optimize the sparsity sizing.
The automatic definition of the cube structures (sparse and dense dimensions) makes it easy to create the optimal cube structure, reducing the effort of data model definition. The automatic configuration always tries to fit the sparse structure into a 64-bit array. If 64 bits are not enough to address all the selected entities in the sparse structure, the system starts switching the smallest entities to dense. The automatic configuration never creates a 128-bit sparsity, because a 64-bit one is generally better: with 128-bit sparsity, the trade-off between index size and data retrieval optimization is no longer balanced.
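The behavior described above can be sketched as a small algorithm: start with every entity sparse and, while the product of max item numbers exceeds the 64-bit limit, switch the smallest entity to dense. This is an illustrative reconstruction under stated assumptions, not Board's actual implementation, and the entity names and sizes are hypothetical:

```python
from math import prod

BIT64_LIMIT = 2 ** 64  # ~1.8e19 addressable combinations

def suggest_structure(max_items: dict[str, int]) -> tuple[list[str], list[str]]:
    """Return (sparse_entities, dense_entities): keep entities sparse while
    the product of their max item numbers fits in a 64-bit pointer,
    switching the smallest entities to dense otherwise."""
    sparse = dict(max_items)
    dense: list[str] = []
    while sparse and prod(sparse.values()) >= BIT64_LIMIT:
        smallest = min(sparse, key=sparse.get)
        dense.append(smallest)
        del sparse[smallest]
    return list(sparse), dense

entities = {"Customer": 50_000_000, "Product": 2_000_000,
            "Month": 480, "Scenario": 10, "Currency": 50}
# The full product (2.4e19) exceeds 2^64, so the smallest entity
# (Scenario) is switched to dense and the rest stay sparse.
print(suggest_structure(entities))
```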
The automatic optimal structure is defined based on the cube itself. An experienced Board developer should be able to deepen the analysis by considering the structures of other cubes involved in dataflows or dataviews, and manually review the sparsity to align it with similar cubes that are part of the same module and/or calculations. Overall performance is better when calculations and reports are executed with cubes that share similar sparsity.
When you create a Cube version, clicking the "EDIT SPARSITY" button lets you manually set entities as sparse/dense. Use the following guidelines to optimize a sparse structure:
- Dimensions that are large and sporadically combined with one another should be kept sparse. Define a dimension as sparse whenever it has a considerable number of members compared to the other entities in the cube. Note that it is better to define a sparse structure when it is not needed than the opposite; therefore, when in doubt, set the entities as sparse.
- Dimensions that are small and always populated (Version, Scenario or Currency are typical examples) should be set as dense. These dimensions are usually filtered on screen, and procedures select a single element at a time to show or work on.
When the size (number of members) of entities does not truly represent the real case (for example, you have loaded only a partial subset of data), we suggest using a manual definition to instruct the system.
The actual size of a sparse structure is the number of distinct combinations of its entities. This number is visible from the cubes impact analysis. Keep this number monitored and be ready to review the sparse structure if it reaches many millions of combinations.
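The size of a sparse structure, as defined above, can be estimated from loaded records by counting the distinct combinations of the sparse entities. A minimal sketch, assuming a cube where Product and Customer are sparse and Month is dense (the records are invented):

```python
# Records as (month, product, customer) tuples.
records = [
    ("2024-01", "Widget", "ACME"),
    ("2024-01", "Gadget", "ACME"),
    ("2024-02", "Widget", "ACME"),
]

# Distinct (product, customer) combinations = size of the sparse structure.
combos = {(product, customer) for _, product, customer in records}
print(len(combos))  # → 2
```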
3.2 Attention Points
Since a sparsity is a shared object, you should keep your sparsities as similar as possible, reconciling all cubes to the same sparsity structure: this optimizes dataflow/dataview execution by involving fewer sparsities and reducing the creation of new ones.
Density is a measure of how effective the sparsity is: a low value indicates that, thanks to the sparsity, the number of cells is much lower than the total number of cells a non-sparse structure would require. A density close to 100% indicates that the sparsity is useless and may even degrade database performance.
The sparse structure does not have any impact on the calculation domain of dataflows: dataflows work only on the existing combinations of the calculation domain and do not consider the existing combinations in the sparsity.
The definition of a sparse structure mainly influences performance (general execution time); it no longer has an impact on the result of dataflow execution.