Relationship

Center of Excellence
edited March 2023 in Best Practices

1.   Abstract

Properly designing and maintaining relationships is essential for a well-functioning solution. In Board’s multi-dimensional database, relationships are a core component of the database configuration. Incorrectly configured relationships can make data partially or wholly inaccurate and lead to incorrect conclusions or behaviors.

2.   Context

Relationships in Board can be modeled in many ways, depending on the nature of the data being modeled and how it’s being used. Different approaches have pros and cons and significantly affect the database’s size and the complexity of managing those relationships.

3.   Content

When defining and maintaining trees and relationships in Board, there are certain practices which it’s advised to follow several important points to consider for your database to function correctly and give accurate outputs.

3.1 Designing trees and relationships

Correctly designing your trees and relationships is one of the first aspects of database design and should be given adequate consideration to avoid redesigning it in the future. Several considerations drive the design of your trees and relationships:

1)    What is the natural relationship of the data to be held (e.g., Town -> County -> Country, GL Account -> Summary Account -> Financial Statement)

2)    Which relationships are always going to be populated and which are only populated under certain circumstances (for example, Contract -> Customer might seem to be a natural relationship, but customers may exist in the system but haven’t yet signed a contract and are there because the company is planning to do business with them.

3)    Which levels will the users be doing the planning?

4)    Can trees be combined using an artificial key to simplify and reduce the size of your database by reducing the Max Item Number combinations in your Cubes?

3.1.1   Following natural relationships

The design of your trees and relationships should always be driven by the natural relationships in the data to be held. Sometimes these will be obvious (Town -> County -> Country), and others less. It’s important to ensure that all parent–child relationships follow a matching many-to-one relationship in the data. If done improperly, where a parent-child relationship is defined as many-to-many, it will result in incorrect aggregations and misleading numbers. If the data is being loaded from a file, it’s a good idea to check these issues by creating a pivot table. If the source is a SQL or other database, this can typically be checked with a suitable query.

3.1.2   How consistently are relationships populated?

Knowing how consistent the data set that you will load is essential. The user may optionally populate specific fields, and certain relationships may be time sensitive (for example, customers or products may be added into the system before any orders with them exist.) Certain fields are optionally populated; you may need to account for this in your ETL. Or you may need to consider an extract and reload step in your load procedure to ensure any missing relationships are filled in using the Not Available default value. Suppose customers exist (and are to be loaded) who haven’t yet done any business (perhaps because your organization wants to set future targets for them before closing a transaction with them). In that case, you may consider splitting the customer into a separate tree. Doing this will reflect that the relationship between the two isn’t as consistent as a single tree would imply.

Incomplete relationships, i.e., parentless children, must always be mapped to a value. If no valid parent exists currently, then the default Not Available should be used. If relationships are left incomplete, then Board won’t be able to correctly aggregate to or beyond the affected levels of the hierarchy. Incorrect results will be seen when reporting or selecting the Entities affected by the incomplete relationship.

3.1.3   Which levels does planning happen at?

Clearly define the granularity at which planning will take place when designing your database. It’s no good to design the database at a product-category level if there might be cases when a user needs to plan at an individual-product level. However, the consideration needs to go a little deeper than this.

Some relationships may be more optional than they first appear, and splitting them into multiple trees might be necessary. As a result, to ensure that planning can take place as required. Take the example of the customer who has yet to do any business with the company and has been loaded as a prospective client. It may be that you’ll need to load targets against these clients before you’ve done any business with them. Alternatively, you may wish to enrich the planning data with industry data as a comparison point, such as how much business a prospective client is believed to have done in the target sector in a prior financial year.

In both cases, the best solution would be to have the customer as a separate tree so that they can have data loaded against them regardless of whether they’ve done any business transactions. The alternative would be to artificially backfill the tree with dummy values at the child level to ensure a complete relationship. However, this results in pollution of the data in these lower levels of the tree and requires careful management. With these complications, it should be avoided unless no alternative exists.

3.2 Normalization

Where trees need to be loaded in more than one part— for example, a contract might be loaded with a parent such as ‘product,’ but the parents of that product (product category, manufacturer, etc.) might be loaded separately, possibly from a different source entirely – it is necessary to ensure that the correct normalization options are selected, and a normalize is run as part of the load process. Normalization will ensure that both direct and indirect relationships are properly defined.

3.2.1   Configuring normalization

Where one or more trees are loaded in multiple parts, as described above, it’s important that you configure the normalization options to confirm that the correct normalization path is selected. This will ensure that the required relationships are correctly formed. When multiple options are presented for the 'through option,’ when defining a normalization, the option selected should be the common factor between the two data loads. For instance, in the previous example, where an order is loaded with a product ID, and the product details are loaded separately, then the product ID is the common factor. The relationships of the ‘order tree,’ up to and including the product ID, have been loaded together, and all the relationships are therefore defined; similarly, the ‘product section’ of the tree has also been loaded at once, forming all of its relationships. We’re looking to combine the two by applying the relationship between the ‘order tree’ and the product ID to the rest of the ‘product tree’ by extending it. In some cases, it may be necessary to load a tree in more than two loads, in which case the same principles apply. Look at the data sets being loaded, look for any common data points between them, and determine from those the correct through points.

3.2.2   When and where to run a normalize

Essentially, a normalize needs to be run whenever the relationships in a tree have changed to propagate the changes through the tree properly. With the normalize step, you can avoid inconsistent and misleading results when running a query because an incorrect or out-of-date relationship is being used. In some cases, you may want to normalize a tree more than once if more than two loads were required to load a particular branch of the tree to ensure that all required relationships are correctly formed.

3.3 Reloading Trees

In some cases, it makes sense to extract and reload a tree to ensure that all fields are populated, even if just with the default ‘not available’ value. Ideally, such things should be addressed within the ETL of your load processes. However, where users are manually mapping relationships, for example, where they can create new products in a planning system and may neglect to map a parent relationship, extracting and reloading can be a helpful safeguard in some cases.

3.3.1   Extracting and Reloading Trees

There are only two significant questions to ask when extracting and reloading trees:

1) Has the tree first been fully normalized to ensure that the relationships are consistent?

2) Does the tree contain any free-text fields previously loaded via a method other than ASCII?

If the tree has been normalized, it should extract to a file without any issues, and the correct relationships should be used.

Suppose the tree contains one or more free-text fields previously loaded by a method other than ASCII. In that case, they may contain tabs, carriage returns, or other problematic blank-space characters, which could cause the file not to load or, in some cases, to load incorrect data, which could pollute your Entities with unwanted values. It’s advisable first to run a one-off extract and study the output file to see whether the exported data looks clean and, as always, to ensure that you have a fresh backup of your database to restore it, if needed, easily.

3.3.2   To Append or Not to Append?

When reloading a tree you’ve just extracted, you may wish to set some of the fields to append in order to populate them with the default ‘not available’ value where they’re currently unmapped. This can be a valuable safeguard for occasions where a field has been mistakenly left blank in the source system.

Setting fields to ‘discard new item,’ on the other hand, is a safeguard to ensure that what’s being loaded is what you’re expecting—for example, if rogue blank characters have thrown things out of sync as described previously. Remember that any fields set to ‘discard new item’ should come first in the data reader before any fields set to append. This is because Board will only reject a row once it reaches a field set to ‘discard new item’ and finds that it doesn’t match existing records. If fields set to append appear before it in the data reader, they will have already appended their contents to the relevant Entity before the row is rejected.

Comments