Relationship - Scanning for Errors

Options
Center of Excellence
edited March 2023 in Best Practices

1.   Abstract

Properly designing and maintaining relationships is essential for a well-functioning solution. In Board’s multi-dimensional database, relationships are a core component of the database configuration. Incorrectly configured relationships can make data partially or wholly inaccurate and lead to incorrect conclusions or behaviors.

2.   Context

Even if you’ve applied best practices to your trees, relationships, data loads, and normalizations, it’s still a good idea to routinely check the health of your relationships using the analyze option.

3.   Content

The analyze option will quickly highlight any possible issues in trees and relationships and allow to assess any impact and possible remedial actions required.

3.1 Scanning for Errors

3.1.1   Children without Parents

Board data should never contain children without a parent. Once a data load has entirely run through, the presence of one or more orphans indicates that either

  • an ETL step is required to default the missing field, whether that be in the form of an ISNULL() in a SQL statement or a check in the ETL section of the data reader itself
  • a step to extract and reload the tree should be added, applying an append on that field to default it to not available where it’s been left null.

Orphan values should never be left unaddressed as they will cause aggregation issues and may cause errors, either when a query executes or when a procedure runs.

3.1.2   Parents without Children

The presence of parent values with no children can mean different things depending on whether a database is new or has been around for a while and whether the existence of childless parents is an intentional design decision.

3.1.2.1      Expected Childless Parents

Sometimes, a value may be created in an Entity that will only be used in certain, possibly limited circumstances. For example, flagging a record as requiring some sort of action either by a process or by direct user input, or where reporting accounts are created which are calculated via a rule and don’t have an associated GL account. Either way, if it’s intended that the value is normally not used as a parent and is only applied when needed, then its existence as a childless parent isn’t generally a cause for concern.

3.1.2.2      Childless Parents in Older Databases

If a database has been around for a long time, then the existence of childless parents can be a natural consequence of a continually moving data set. Suppose products were mapped to two categories, Food, and Beverages, but later the company decided to consolidate these two categories into one category called Food & Beverage. In that case, one or both of the original categories may become defunct and no longer have any Products mapped to them. Generally speaking, unless this action causes issues, it isn’t anything to worry about and should be considered normal behavior.

3.1.2.3      Childless Parents in Newer Databases

If a database is comparatively new but already has a significant number of childless parents, this may indicate that an issue exists with your relationships as they’re currently defined. Suppose a tree is mapped with ‘product’ mapped to ‘product category’ as the parent, but the underlying data actually allows a product to be mapped to multiple product categories. In that case, the result will be that the product ends up mapped to whichever product category the populating data reader pairs it with last. Any categories it was mapped to previously in the data set will be created but may end up with no children mapped to them if that’s the way the data is loaded. If your database is relatively new and you unexpectedly already have childless parents, consider the relationships you’ve defined and compare them to the underlying data to ensure that the implied uniqueness of relationships is reflected in the actual data.