Dataflow processing time - what is happening under the hood?

Hi, apologies for this being a bit long but I really want to know what is happening 'under the hood' in this instance to make two almost identical dataflows take very different lengths of time to process.

I discovered the following when looking to speed up a procedure (now that a client is finally on B10). A dataflow was taking 2+ minutes, so I was going to break it down to allow the use of 'high performance mode'. However, another dataflow in the same procedure uses exactly the same source and target cubes, with an almost identical 'calculation', yet it only takes about 20 seconds... and if anything, the faster calculation is the slightly more complex one. So why the difference?

Each dataflow is a rather long algorithm to calculate either Chargeable or Non-Chargeable hours.

The target cube is the same and the source(s) are the same. The selection on the 'Chargeable Flag' entity differs between the two dataflows, and this entity is a dimension of the TARGET cube only (not of any source cube).

I've been trialling quite a few things to work out why there is a difference, and I have determined that it comes from the selection on Chargeable Flag.

'YES' will select 84,663 of the 85,303 members of the child entity in this tree (entity: Project/Phase), whereas 'NO' will only select 640 of the 85,303.

My question is: why does this make a difference? NONE of the source cubes are dimensioned by this tree at all, and the TARGET cube is ONLY dimensioned by CHARGEABLE FLAG, not by the more detailed Project/Phase entity. If it were dimensioned by a more detailed entity I could understand. But as changing this selection makes NO difference to the volume of 'expected' calculations, why does it take 5-6x longer to process the dataflow?

Why is this entity related to Project/Phase if this calculation doesn't require that level? This is due to actuals: actuals are recorded and reported at Project/Phase level, and we want actuals and budget to use the same 'Chargeable Flag' entity for reporting, variance and projection calculations.

------

The layout is as below, with 'i' (not shown) being the target cube.

[Screenshot: Layout dialog]

The only difference between the two dataflows is the treatment of the 'Billable %', block 'c'.

The non-chargeable dataflow calculates as follows, where (1 - Billable%) = non-billable%:

h*g*d*(1-c/100)*e/100

and the chargeable dataflow as:

h*g*d*(c/100)*e/100

The more complex 'non-chargeable' dataflow, which uses '1-c', takes about 20 seconds, whereas the simpler 'chargeable' dataflow takes 2+ minutes.
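To make the comparison concrete, here is a minimal Python sketch of the two formulas evaluated for a single cell. The block letters (h, g, d, c, e) come from the layout above; the numeric values are made up purely for illustration, and this is plain arithmetic, not how Board evaluates a dataflow.

```python
def non_chargeable_hours(h, g, d, c, e):
    """Non-chargeable formula from the post: h*g*d*(1-c/100)*e/100."""
    return h * g * d * (1 - c / 100) * e / 100


def chargeable_hours(h, g, d, c, e):
    """Chargeable formula from the post: h*g*d*(c/100)*e/100."""
    return h * g * d * (c / 100) * e / 100


# Hypothetical cell values (not taken from the real cubes).
h, g, d = 8.0, 5.0, 4.0   # the three multiplied blocks
c = 75.0                  # Billable %, block 'c'
e = 50.0                  # second percentage block

chargeable = chargeable_hours(h, g, d, c, e)
non_chargeable = non_chargeable_hours(h, g, d, c, e)
total = h * g * d * e / 100

print(f"chargeable:     {chargeable}")
print(f"non-chargeable: {non_chargeable}")
print(f"total:          {total}")
```

The two results always add up to h*g*d*e/100, so per cell the two dataflows do essentially the same amount of arithmetic; the extra subtraction in the non-chargeable version is negligible.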

Log record of the 2 dataflows.

Helmut Heimann

Hi Brendan Broughton,

you might want to have a look at this document: The DataFlow process, as it describes the "golden rules" of DataFlows.

The answer to your question is that the source cubes will be aligned to the structure of the target. In your case this means aligning 84,663 values versus aligning 640 values. Guess which process will be faster.

So the difference in runtime results from a different number of values to be aligned to the target before the calculation takes place.
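As a rough sanity check on the numbers in this thread, the short Python sketch below compares the ratio of selected members with the ratio of reported runtimes. The member counts are the ones quoted in the question; the runtimes are approximations ("2+ minutes" is taken as 120 seconds, which is an assumption). It says nothing about how Board actually performs the alignment.

```python
TOTAL_MEMBERS = 85_303

# Project/Phase members selected by each Chargeable Flag value (from the question).
SELECTED = {"YES": 84_663, "NO": 640}

# Approximate runtimes in seconds; 120 stands in for "2+ minutes" (assumption).
RUNTIME_SECONDS = {"YES": 120, "NO": 20}


def ratio(values, slow="YES", fast="NO"):
    """Return how many times larger the 'slow' value is than the 'fast' one."""
    return values[slow] / values[fast]


member_ratio = ratio(SELECTED)
runtime_ratio = ratio(RUNTIME_SECONDS)

for flag, count in SELECTED.items():
    print(f"{flag}: {count} members ({count / TOTAL_MEMBERS:.1%} of the entity)")

print(f"member ratio:  {member_ratio:.1f}x")
print(f"runtime ratio: {runtime_ratio:.1f}x")
```

The selections differ by roughly 132x in member count while the runtimes differ by roughly 6x, so if the alignment cost does scale with the number of selected members, it cannot be the only component of the runtime.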

Since this is only an explanation, it won't give you a direct hint on how to make the process faster. I'd assume it would take a little bit of re-modelling of your data model.

Kind Regards,

Helmut

Brendan Broughton

Thanks Helmut,

But it seems there is no explanation. I continued testing different things and it seems it may just have been misbehaving (again). Without my changing anything in my structures or dataflows, the 'slow' action is now only taking 20 seconds.

Also, I'm not sure if you understood the structure of the target cube. It is not structured by the entity containing 85,303 members; it is structured by an indirectly related entity which consists of only 2 members (called Chargeable Flag). This is the entity on which I am selecting, and it is the only level of alignment that should be needed. If the dataflow were aligning to the more detailed, non-required level, then any DB with day and year would have issues with processing time.