Performance trade-offs: 128-bit sparse versus dense and 64-bit sparse?

Hi all,

Everything is in the title: very often, when designing cubes or cube versions, I have a choice between these two kinds of structure:

- Entity A (D), Entity B (S), Entity C (S), Entity D (S), Entity E (S) => smaller physical cube size, but a 128-bit sparsity file.

- Entity A (D), Entity B (D), Entity C (S), Entity D (S), Entity E (S) => bigger physical cube size, but a 64-bit sparsity file.

 

What are the theoretical pros and cons of such structures?
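
To make the trade-off concrete, here is a rough back-of-envelope sketch (the entity sizes are made up, and the idea that 64-bit sparsity requires the product of the sparse entity cardinalities to fit within 2^64 is my working assumption, not documented behaviour):

```python
# Hypothetical entity sizes, purely for illustration.
sizes = {"A": 500, "B": 2_000_000, "C": 50_000, "D": 100_000, "E": 12_000}

def sparse_address_space(sparse_entities):
    """Product of the cardinalities of the entities kept sparse."""
    space = 1
    for name in sparse_entities:
        space *= sizes[name]
    return space

# Structure 1: A dense, B..E sparse.
opt1 = sparse_address_space(["B", "C", "D", "E"])
# Structure 2: A and B dense, C..E sparse.
opt2 = sparse_address_space(["C", "D", "E"])

for label, space in [("Structure 1 (B,C,D,E sparse)", opt1),
                     ("Structure 2 (C,D,E sparse)", opt2)]:
    bits = "64-bit" if space < 2**64 else "128-bit"  # assumed 2^64 threshold
    print(f"{label}: {space:.3e} combinations -> {bits} sparsity")
```

If the dense part is stored as a block per populated sparse combination (again only an assumption on my side), then making Entity B dense multiplies every block by the size of B, which would explain the bigger physical cube size I see in the second structure.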

 

Thanks,

Etienne


Answers

  • Unknown (Active Partner)

    Hi Etienne CAUSSE,

    That's a very good question. I'm not sure I completely understand, though. If I understand correctly, we're discussing the trade-offs between having more dense entities represented in a cube and the bit-length storage required at each populated intersection. Is that right?

     

    I'm guessing the context would be important, though. In some cases the first option would perform better, and in others the second would. Along with pros and cons, could someone help me understand the context in which each approach is best?

  • Björn Reuber (Employee, Community Captain)

    Hi,

     

    I would always suggest using 64-bit sparsity (if possible), because (from my point of view) it is faster. I only use 128-bit sparsity if I can't avoid it.

    So my suggestion would be (in descending order of preference):

    1) 64-bit sparsity / 64-bit density (because normally there is at least one dense entity)

    2) 128-bit sparsity / 64-bit density (this is normally faster than 64-bit sparse / 128-bit dense)

    3) 128-bit sparse / 128-bit dense (but I really try to avoid it)

     

    Regards

    Björn

  • Hi Björn Reuber, and thanks for your feedback.
    In your opinion, does the answer depend on the use case? For example, is it possible that a given structure is faster in dataflows while another is faster in a report?

    Or is the performance impact always the same?

     

    Is there any formalized benchmark on this topic?

  • Björn Reuber (Employee, Community Captain)

    Hi,

     

    Because there are so many factors that can affect performance (such as the selection, security, whether the sparsity is shared between different cubes in the same layout/DataFlow, the by-row entity, etc.), this question can't be answered with a simple yes/no; it is more of an academic question.

     

    As a hint: when you do performance optimization (cube versions etc.), always re-test your application with a user account. A common mistake is that the developer optimizes the application for the Admin account, so everything is fast because a cube version can be used; but as soon as the users start to use the application, their performance experience is very bad (because security was neglected when the versions were created, so cube version 1 is still in use).

     

    Regards

    Björn

  • Stephen Kirk (Active Partner)

    Hi Björn Reuber

    I was wondering how you set 64-bit/128-bit for the sparsity or density?

  • Björn Reuber (Employee, Community Captain)
    edited March 2020

    Hi Stephen Kirk

     

    When you create a new cube or cube version, a 64-bit sparsity or density is displayed with a white background, while 128-bit gets a colored background (please don't ask me about the exact color name): 64-bit sparse, 64-bit dense, 128-bit sparse, 128-bit dense, as you can see in the screenshot. So if you add an element to a 64-bit sparsity/density, it might become a 128-bit one, because the structure is "too big" to address with 64 bits.
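
    To make that concrete, here is a tiny sketch of that "too big for 64 bits" idea (the 2^64 threshold and the entity sizes are just illustrative assumptions on my side, not documented internals):

    ```python
    # Assumed rule of thumb: a sparsity/density can stay 64-bit only while the
    # product of the cardinalities of the entities it addresses fits in 2^64.
    def needed_bits(entity_sizes):
        space = 1
        for size in entity_sizes:
            space *= size
        return 64 if space < 2**64 else 128

    sparse_sizes = [2_000_000, 50_000, 100_000, 1_800]  # hypothetical sizes
    print(needed_bits(sparse_sizes))   # -> 64: still addressable with 64 bits

    sparse_sizes[-1] += 200            # the last entity grows by 200 elements
    print(needed_bits(sparse_sizes))   # -> 128: the structure no longer fits
    ```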

     

    Regards

    Björn

  • Hi Björn Reuber

    I am well aware of the real-life issues and of everything that needs to be done to optimize the versioning; however, I think it would indeed be nice to have some "academic" input on this question to help with decision-making.

    In some cases our cubes are so large (> 10 GB) that testing the versions is very heavy, so we can't afford to test every possibility.

     

    Anyway, thanks for your suggestion, which confirms my intuition on the subject.
    Etienne