Performance trade-offs: 128-bit sparsity versus more density with 64-bit sparsity?


Hi all,

Everything is in the title: very often, when designing cubes or cube versioning, I have a choice between these two kinds of structure:

- Entity A (D), Entity B (S), Entity C (S), Entity D (S), Entity E (S) => Smaller physical cube size but a 128-bit sparsity file.

- Entity A (D), Entity B (D), Entity C (S), Entity D (S), Entity E (S) => Bigger physical cube size but a 64-bit sparsity file.

 

What are the theoretical pros and cons of such structures?
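To make the comparison concrete, here is a rough sketch with made-up entity cardinalities (the numbers and the 2^64 threshold are my own assumptions, not Board internals): it shows how making Entity B dense shrinks the sparse address space enough for 64-bit addressing, at the cost of a much larger dense block per populated combination.

```python
from math import prod

# Hypothetical max item counts per entity (made-up numbers, replace with your own).
entities = {"A": 500, "B": 2_000_000, "C": 5_000_000, "D": 1_000_000, "E": 10_000}

layouts = {
    "Option 1: A dense, B-E sparse": (["A"], ["B", "C", "D", "E"]),
    "Option 2: A-B dense, C-E sparse": (["A", "B"], ["C", "D", "E"]),
}

for name, (dense, sparse) in layouts.items():
    dense_block = prod(entities[e] for e in dense)    # cells stored per populated sparse combination
    sparse_space = prod(entities[e] for e in sparse)  # combinations the sparsity index must address
    width = 64 if sparse_space <= 2**64 else 128      # assumed rule, see the answers below
    print(f"{name}: dense block = {dense_block:,} cells, "
          f"sparse address space = {sparse_space:,} -> {width}-bit sparsity")
```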

 

Thanks,

Etienne


Answers

  • Unknown

    Hi Etienne CAUSSE,

    That's a very good question. I'm not sure I completely understand it, though. If I understand correctly, we're discussing the trade-offs between having more dense entities represented in a cube and the bit length required to address each populated intersection. Is that right?

    I'm guessing the context would be important, though: in some cases the first option would perform better, and in others the second would. Along with the pros and cons, could someone help me understand the context in which each approach is best?

  • Björn Reuber

    Hi,

     

    I would always suggest using 64-bit sparsity if possible, because (from my point of view) it's faster. I only use 128-bit sparsity if I can't avoid it.

    So my suggestion would be, in descending order of preference (a rough sizing sketch follows the list):

    1) 64-bit sparsity / 64-bit density (because normally there is at least one dense entity)

    2) 128-bit sparsity / 64-bit density (this is normally faster than 64-bit sparse / 128-bit dense)

    3) 128-bit sparse / 128-bit dense (but I really try to avoid it)
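
    As a very rough way to picture the cost difference (the per-entry byte sizes here are only an illustrative assumption, not Board internals): if every populated sparse combination costs one index entry, 128-bit addressing roughly doubles the sparsity file for the same data.

    ```python
    # Back-of-the-envelope sketch: assume 8 bytes per populated combination with
    # 64-bit sparsity and 16 bytes with 128-bit (illustrative sizes only).
    populated = 30_000_000  # hypothetical number of populated sparse combinations

    for width, entry_bytes in ((64, 8), (128, 16)):
        print(f"{width}-bit sparsity: ~{populated * entry_bytes / 1024**2:,.0f} MB "
              f"for {populated:,} populated combinations")
    ```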

     

    regards

    Björn

  • Etienne CAUSSE

    Hi Björn Reuber, and thanks for your feedback.
    In your opinion, does the best choice depend on the use? For example, is it possible that a given structure is faster in dataflows while another is faster in a report?

    Or is the performance impact always the same?

     

    Is there any formalized benchmark on this topic?

  • Björn Reuber

    Hi,

     

    Because there are so many factors that can affect performance (such as selection, security, whether the sparsity is shared between different cubes in the same layout/DataFlow, the by-row entity, etc.), this question can't be answered with a simple yes/no; it's more of an academic question.

     

    As a hint: if you do performance optimization (cube versions etc.), always re-test your application with a user account. A common mistake is that the developer optimizes the application for the admin account, so everything is fast because a cube version can be used; but as soon as the users start to use the application, their performance experience is very bad, because security was neglected when the version was created and so cube version 1 is still in use.

     

    Regards

    Björn

  • Stephen Kirk

    Hi Björn Reuber

    I was wondering how you set 64-bit or 128-bit for the sparsity or density?

  • Björn Reuber
    edited March 2020

    Hi Stephen Kirk

     

    When you create a new cube or cube version, a 64-bit sparsity or density is displayed with a white background, while a 128-bit one gets a colored background (please don't ask me about the exact color name): 64-bit sparse, 64-bit dense, 128-bit sparse, 128-bit dense, as you can see on the screenshot. So if you add an element to a 64-bit sparsity/density, it might become a 128-bit one, because the structure is "too big" to address with 64 bits.
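
    A tiny sketch of that "too big to address with 64 bits" idea (an assumed rule based on the behaviour described above, not documented internals): the combined address space of the sparse entities' max item numbers has to fit into 64 bits, so adding or growing one entity can tip the structure over to 128-bit.

    ```python
    from math import prod

    def assumed_sparsity_width(max_item_numbers: list[int]) -> int:
        """Assumed rule: 64-bit while the combined address space fits into 64 bits, else 128-bit."""
        return 64 if prod(max_item_numbers) <= 2**64 else 128

    # Hypothetical entity sizes: the extra entity pushes the structure to 128-bit.
    print(assumed_sparsity_width([2_000_000, 5_000_000, 500_000]))          # -> 64
    print(assumed_sparsity_width([2_000_000, 5_000_000, 500_000, 10_000]))  # -> 128
    ```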

     

    regards

    Björn

  • Etienne CAUSSE

    Hi Björn Reuber

    I am well aware of the real-life issues and everything that needs to be done to optimize versioning; however, I think it would indeed be nice to have some "academic" input on this question to help with decision-making.

    In some cases our cubes are so large (> 10 GB) that testing the versions is very heavy, so we can't afford to test every possibility.

     

    Anyway, thanks for your suggestion, which confirms my intuition on the subject.
    Etienne