
Dataflow processing time - what is happening under the hood?

Question asked by bbroughton on Apr 30, 2018
Latest reply on Apr 30, 2018 by bbroughton

Hi, apologies for this being a bit long but I really want to know what is happening 'under the hood' in this instance to make two almost identical dataflows take very different lengths of time to process. 

 

I discovered the following when looking to speed up a procedure (now that a client is finally on B10). A dataflow was taking 2+ minutes, so I was going to break it down to allow the use of 'high performance mode'. However, another dataflow in the same procedure uses exactly the same source and target cubes, with an almost identical 'calculation', yet it only takes about 20 seconds... and if anything the faster calculation is the slightly more complex one. So why the difference?

 

Each dataflow is a rather long algorithm to calculate either Chargeable or Non-Chargeable hours.

The target cube is the same and the source(s) are the same. The only thing that changes between the two dataflows is a selection on the 'Chargeable flag' entity, and this is a dimension of the TARGET cube only (not of any source cube).

 

I've been trialling quite a few things to work out why the difference and I have determined it is from the selection on Chargeable Flag.

'YES' will select 84,663/85,303 of the children of this tree (entity: Project/Phase), whereas 'NO' will only select 640/85,303.
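To put those two selections side by side, here is a minimal Python sketch using only the counts quoted above. The variable names are made up for illustration, and it says nothing about how Board actually processes the selection; it only shows how lopsided the two selections are.

```python
# Counts taken from the post; variable names are hypothetical.
total_children = 85_303   # Project/Phase members under the Chargeable Flag tree
selected_yes = 84_663     # members selected by Chargeable Flag = 'YES'
selected_no = 640         # members selected by Chargeable Flag = 'NO'

# The two selections together should cover every child exactly once.
assert selected_yes + selected_no == total_children

share_yes = selected_yes / total_children
share_no = selected_no / total_children
size_ratio = selected_yes / selected_no  # how many times larger 'YES' is than 'NO'

print(f"YES selects {share_yes:.2%} of Project/Phase members")
print(f"NO selects {share_no:.2%} of Project/Phase members")
print(f"YES selection is roughly {size_ratio:.0f} times larger than NO")
```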

 

My question is why this makes a difference?  NONE of the source cubes are dimensioned by this tree at all, and the TARGET cube is ONLY dimensioned by CHARGEABLE FLAG, not by the more detailed Project/Phase entity.  If it was dimensioned by a more detailed entity I could understand. But as changing this selection makes NO difference to the volume of 'expected' calculations why does it take 5-6x longer to process the dataflow?

 

Why is this entity related to Project/Phase if this calculation doesn't require that level? This is due to actuals: they are recorded and reported at Project/Phase level, and we want actual and budget to use the same 'Chargeable flag' entity for reporting, variance and projection calculations.

------

The layout is as below, with 'i' (not shown) being the target cube.

 

Layout dialog

 

The only difference between the two dataflows is the treatment of the 'Billable %', block 'c'

 

The non-chargeable dataflow calculates as follows, where (1 - Billable%) = non-billable%:

h*g*d*(1-c/100)*e/100

and the chargeable as:

h*g*d*(c/100)*e/100
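For anyone who wants to compare the two formulas above outside Board, here is a minimal Python sketch. It assumes each block is a plain number for a single cell. Only 'c' (Billable %) is named in the post; the meanings of h, g, d and e are not given here, and the sample values are invented purely for illustration.

```python
def non_chargeable_hours(h, g, d, c, e):
    """Non-chargeable formula from the post: h*g*d*(1-c/100)*e/100."""
    return h * g * d * (1 - c / 100) * e / 100


def chargeable_hours(h, g, d, c, e):
    """Chargeable formula from the post: h*g*d*(c/100)*e/100."""
    return h * g * d * (c / 100) * e / 100


# Hypothetical sample values for one cell (not taken from the real cubes).
sample = dict(h=10, g=2, d=5, c=80, e=50)

chargeable = chargeable_hours(**sample)
non_chargeable = non_chargeable_hours(**sample)

# The two results differ only in the c term, so together they add up to
# h*g*d*e/100, i.e. the same calculation with the Billable % split removed.
total = sample["h"] * sample["g"] * sample["d"] * sample["e"] / 100

print(chargeable, non_chargeable, total)
```

The two expressions differ by a single subtraction per cell, so the arithmetic alone seems unlikely to account for a 5-6x difference in run time.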

 

The more complex 'non-chargeable' dataflow, which uses '1-c', takes about 20 seconds, whereas the simpler 'chargeable' dataflow takes 2 minutes longer.

 

Log record of the 2 dataflows.
