What is Open Sparsity?

I've seen a few cases now where checking the Open Sparsity option on a dataflow does the trick to ensure data makes it to the destination cube correctly. I was hoping someone could help articulate when and why this should be used, or avoided?

As I understand it, the opensparsity option removes any assumptions about where valid data intersections exist in the target. This forces execution of the dataflow for each intersection of the target cube. If a non-zero value is found, it is saved to the target cube and the intersection is populated. Without this option, a dataflow could have a non-zero number to store in a target cube, but since the target intersection is not valid, the result cannot be saved to the target cube. The behaviour looks like the dataflow works fine, but some numbers just don't populate in the target. Without this option checked, the dataflow will only calculate results for each existing intersection of the target cube. With this option checked, a dataflow takes a bit longer because there are more intersections to calculate. Is that correct? Please feel free to correct me where I've misunderstood the option.


Björn Reuber

Sparsity is a basic concept in BOARD, and there seems to be a small misunderstanding of how sparsity works. You can find some details in Introducing sparse structures, but I'll try to explain it with an example.

Assume you have two entities: Customer (10 members) and Product (5 members). I'll only use a two-dimensional example because it's easier to explain.

A dense cube would create a 5x10 matrix to store all data. Since normally not all cells contain data, the cube would still need 50 cells even if only a few of them are filled. (In the tables below, customers without any data are left out.)

CubeDense1  C1  C2  C3  C4  C8  C10
P1          2   -   -   -   -   18
P2          -   -   45  -   -   -
P3          -   -   -   -   -   -
P4          -   -   -   -   12  -
P5          -   33  -   10  -   -

CubeDense2  C2  C4  C5   C8
P1          42  -   -    -
P2          -   -   -    -
P3          -   -   118  -
P4          -   61  -    -
P5          -   20  -    53

If you create a sparsity on Customer and Product, BOARD saves the data differently: one "table" (addressing matrix) for the sparsity and one for the elements. (The IDx column is normally not stored, because the sparsity is a list, so it is simply the first element, second element, etc.) Furthermore, a sparsity is shared between all cubes that have the same sparsity.

SparsityOne
ElementIDx  Customer  Product
1           C1        P1
2           C2        P1
3           C2        P5
4           C3        P2
5           C4        P4
6           C4        P5
7           C5        P3
8           C8        P4
9           C8        P5
10          C10       P1

SparseIDx  CubeSparse1
1          2
2          -
3          33
4          45
5          -
6          10
7          -
8          12
9          -
10         18

SparseIDx  CubeSparse2
1          -
2          42
3          -
4          -
5          61
6          20
7          118
8          -
9          53
10         -

So instead of two cubes with 50 cells each (100 cells of data), BOARD stores a sparsity mapping table (10 combinations x 2 entities = 20 cells) and two cubes with 10 cells each, so 40 cells in total instead of 100.
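To make the cell counting concrete, here is a small Python sketch that rebuilds the shared sparsity from the two example cubes and counts cells both ways. It is only a toy model of the bookkeeping described above (dictionaries and lists standing in for cubes) and assumes nothing about how BOARD actually lays data out internally; all variable names are illustrative.

```python
# Toy model of the example above: dictionaries and lists stand in for cubes.
# This illustrates the cell counting only, not BOARD's real storage format.

CUSTOMERS = [f"C{i}" for i in range(1, 11)]  # 10 members
PRODUCTS = [f"P{i}" for i in range(1, 6)]    # 5 members

# Filled cells of the two dense cubes, keyed by (customer, product).
cube_dense1 = {("C1", "P1"): 2, ("C10", "P1"): 18, ("C3", "P2"): 45,
               ("C8", "P4"): 12, ("C2", "P5"): 33, ("C4", "P5"): 10}
cube_dense2 = {("C2", "P1"): 42, ("C5", "P3"): 118, ("C4", "P4"): 61,
               ("C4", "P5"): 20, ("C8", "P5"): 53}


def combo_key(combo):
    """Sort by customer number, then product number (so C2 comes before C10)."""
    customer, product = combo
    return int(customer[1:]), int(product[1:])


# Shared sparsity: every combination that holds data in at least one cube.
sparsity = sorted(set(cube_dense1) | set(cube_dense2), key=combo_key)

# Each sparse cube is a list aligned with the sparsity (None = empty cell).
cube_sparse1 = [cube_dense1.get(combo) for combo in sparsity]
cube_sparse2 = [cube_dense2.get(combo) for combo in sparsity]

# Dense: 2 cubes x 50 cells.
dense_cells = 2 * len(CUSTOMERS) * len(PRODUCTS)
# Sparse: 2 entity cells per combination, plus one value slot per combination per cube.
sparse_cells = 2 * len(sparsity) + len(cube_sparse1) + len(cube_sparse2)

print(dense_cells, sparse_cells)  # 100 40
```

The sorted `sparsity` list reproduces the SparsityOne table row by row, and the two aligned lists reproduce CubeSparse1 and CubeSparse2.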

So, after this short introduction to sparsity, back to your question.

A DataFlow can only work on the existing sparsity: it won't create a new entry in the sparsity table.

For example, if there were a third, dense cube containing data only for the cell C10 P5 and you wanted to write this value into a sparse cube, the value couldn't be written, because our sparsity doesn't know the combination C10 P5.

If you use a DataFlow with Open Sparsity, it creates a new entry in the sparsity mapping table (element 11), so a sparse cube can also hold data for this combination.
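The difference between a normal DataFlow and one with Open Sparsity can be sketched with the same kind of toy model. The function `run_dataflow` below is hypothetical, not a BOARD API, and the value 7 for the C10 P5 cell of the third cube is made up purely for illustration.

```python
# Hypothetical toy model of a b=a DataFlow writing into a sparse cube.
# The sparsity is a list of (customer, product) combinations; the target
# cube only holds values for combinations that exist in that list.

SPARSITY = [("C1", "P1"), ("C2", "P1"), ("C2", "P5"), ("C3", "P2"), ("C4", "P4"),
            ("C4", "P5"), ("C5", "P3"), ("C8", "P4"), ("C8", "P5"), ("C10", "P1")]


def run_dataflow(source, sparsity, target, open_sparsity=False):
    """Copy every source cell into target (b=a); return the cells that were lost."""
    lost = []
    for combo, value in source.items():
        if combo not in sparsity:
            if not open_sparsity:
                lost.append(combo)   # no slot for this combination: value is dropped
                continue
            sparsity.append(combo)   # Open Sparsity: add a new element to the table
        target[combo] = value
    return lost


cube_dense3 = {("C10", "P5"): 7}  # made-up value for the C10 P5 cell

closed_table, closed_target = list(SPARSITY), {}
closed_lost = run_dataflow(cube_dense3, closed_table, closed_target)

open_table, open_target = list(SPARSITY), {}
open_lost = run_dataflow(cube_dense3, open_table, open_target, open_sparsity=True)

print(closed_lost, closed_target)    # [('C10', 'P5')] {}
print(len(open_table), open_target)  # 11 {('C10', 'P5'): 7}
```

Without Open Sparsity the run completes without any error, but the C10 P5 value never arrives in the target, which matches the symptom described in the question.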

Data readers and data entry can directly create new entries in the sparsity.

I hope this answers your question

Kind regards

Björn

unknown

Thank you Björn Reuber. That was very helpful!

Peter van Bennekom

Thank you Björn Reuber. Excellent explanation.

The Open Sparsity toggle is not always available. I'm not sure why; maybe you can give some pointers.

I have three cubes with the same four dimensions: month (dense), item (sparse), area (sparse), uom (dense), and the same sparsity defined for all of them. According to your explanation, they should all share the same addressing matrix.

When I create a DataFlow with two of the cubes and a simple b=a, the Open Sparsity toggle is available and I can switch it on. So far so good.

When I create a DataFlow with all three cubes and a simple c=a-b, however, the toggle is not available.

How can I model my cubes/DataFlow so that I can use Open Sparsity in more complex scenarios?

Or, since the cubes share the same addressing matrix and therefore have the same combinations available, is Open Sparsity simply not needed? (But then why would the simple case enable the toggle?)

Cheers,

Peter

Björn Reuber

Hi Peter van Bennekom

The answer is quite simple: Open Sparsity is only available in plain b=a DataFlows (see The DataFlow process):

"Open Sparsity" Advanced Option

The Dataflow is able to open new sparse combinations, if the following conditions are met:

- the Dataflow Layout has only two cubes, one in block (a) which is the source cube and one in block (b) which is the target cube,

- the algorithm must be b=a

- the source cube must not use any reference function such as Referto or TotalBy.

- the source cube must have a sparse structure, and the target cube must have a sparse structure made of the same entities plus one or two additional entities (also sparse).
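The four conditions read like a checklist, so here is a hypothetical Python transcription of them. `DataflowLayout` and `can_open_new_combinations` are invented names, not BOARD APIs, and the check covers only the conditions as quoted; it does not model everything that decides whether the toggle appears in the BOARD UI.

```python
from dataclasses import dataclass


@dataclass
class DataflowLayout:
    """Hypothetical description of a DataFlow; not a BOARD API."""
    blocks: dict                          # block letter -> sparse entities of that block's cube
    algorithm: str                        # e.g. "b=a" or "c=a-b"
    source_uses_reference: bool = False   # a Referto / TotalBy style function on block a


def can_open_new_combinations(df: DataflowLayout) -> bool:
    # 1. Exactly two blocks: a (source) and b (target).
    if set(df.blocks) != {"a", "b"}:
        return False
    # 2. The algorithm must be a plain b=a.
    if df.algorithm.replace(" ", "") != "b=a":
        return False
    # 3. No reference function on the source cube.
    if df.source_uses_reference:
        return False
    source, target = set(df.blocks["a"]), set(df.blocks["b"])
    # 4. Both cubes sparse; target sparsity = source entities plus one or two more.
    if not source or not target or not source <= target:
        return False
    return 1 <= len(target - source) <= 2


three_cubes = DataflowLayout({"a": {"item", "area"}, "b": {"item", "area"},
                              "c": {"item", "area"}}, "c=a-b")
widening = DataflowLayout({"a": {"E1", "E2"}, "b": {"E1", "E2", "E3"}}, "b=a")

print(can_open_new_combinations(three_cubes), can_open_new_combinations(widening))  # False True
```

A three-block c=a-b layout fails the very first condition, whatever the sparsities look like.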

Keep in mind: if cubes a and b have the same structure/sparsity as c, Open Sparsity is not needed, because no new sparse combinations can be created; they already exist in cube a or cube b.
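To see why, consider c=a-b in the same toy model: the result can only be non-zero where a or b already holds data, and every such combination is already in the shared sparsity. The item/area member names below are hypothetical.

```python
# Cubes a, b and c share one (hypothetical) sparsity of (item, area) combinations.
shared_sparsity = {("item1", "area1"), ("item1", "area2"), ("item2", "area1")}
a = {("item1", "area1"): 10, ("item2", "area1"): 4}
b = {("item1", "area1"): 3, ("item1", "area2"): 5}

# Sparse cubes can only hold data on combinations of their sparsity.
assert set(a) <= shared_sparsity and set(b) <= shared_sparsity

# c = a - b: wherever both a and b are empty, the result is 0, so only
# combinations already populated in a or b can produce a value.
c = {}
for combo in set(a) | set(b):
    value = a.get(combo, 0) - b.get(combo, 0)
    if value != 0:
        c[combo] = value

new_combinations = set(c) - shared_sparsity
print(new_combinations)  # set()
```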

If they have different structures, you would need two steps in your case: first c=a+b (without creating new sparse combinations, so c is partly dense, following the structure of cubes a and b), and then a b=a DataFlow using a new sparse cube and the partly dense cube from step 1.

regards

Björn

Previous Member

Hi,

One of the conditions is:

- the source cube must have a sparse structure and the target cube must have a sparse structure which is made of the same entities plus one or two entities more (also in sparse).

I have the following example where the open sparse does not work.

The source cube is structured by Year (D), E1 (X), E2 (X), E3 (X). The value at one crossing is +10.
The target cube is structured by Month (D), E1 (X), E2 (X), E3 (X), E4 (X). The value 10 is not written into the cube.

Could the time dimension be an issue for Open Sparsity? I have Year in the source and Month in the target.
Should I disable the "performance mode" of my DataFlow to make Open Sparsity work?
A more generic question: if the Max Item Member of E4 is 100 and the current real number of members of E4 is 5, will the Open Sparsity option write the values of the source cube into 5 crossings of the target cube (i.e. replicate the source values 5 times), assuming no previous selections on E4?

Thanks and regards,