1. Abstract
Sometimes as well as holding the current state of a hierarchy, it might be necessary to also retain a history of not just entity values but relationships as well.
2. Context
While it’s common to retain history of relationships in Board by breaking hierarchies into standalone entities and using matrices to track relationships over time, sometimes we want to structure entities as a hierarchy for easier use/reporting and by default to see the current state, but we also want to retain an archived history of how the same hierarchy has looked at set times in the past. In this guide we’ll look at how we can achieve this.
3. Content
3.1 Storing Multiple Versions of a Hierarchy
3.1.1 Why store multiple versions of a hierarchy?
The most common reason for wanting to store historical versions of a hierarchy is to be able to create an accurate historical picture of data – possibly for regulatory purposes or possibly just to put historical data into proper context.
Imagine we retain a history of past financial forecasts, we might store them at Cost Centre level to retain the same granularity they were planned at. Now this is fine in terms of retaining a granular picture of expected performance at that time, however if we try and report these historical figures against our current Cost Centre tree we might get a misleading picture of things – Cost Centres might have moved from one department to another or be under the control of a different manager for instance, and as such getting a proper picture of what was expected at that time might be challenging.
3.1.2 Using parallel hierarchies
In order to retain archived copies of our hierarchy we’re going to use a parallel copy of the same hierarchy, in which we can store multiple versions at once. Now let’s say our original tree looks like this:
Our archive tree is going to look something like this:
Basically our archive tree is a direct replica of the base tree with the addition of a Version entity, as we’re about to see this will help us track the different parallel versions of the structure we’re going to store to easily identify and retrieve specific versions.
One thing to note is that the code widths on the archive tree will need to be larger than the code widths of the base entities by 3 or 4 characters, depending on how many concurrent versions you’ll be holding.
3.1.3 Creating versions
At this point we want to create our version list as this should typically be a fixed list of values, depending on how many values you intend to store at one time populate your Version entity with zero padded sequential values from 1 to your chosen maximum, here I’m holding less than 100 concurrent versions so I only need a 2 digit version code. As we’ll see shortly it’s important to zero pad the numbers as it will make our lives a lot easier later on. Don’t worry too much about the descriptions as we’ll replace these later.
3.1.4 Populating the archive hierarchy
The initial population of the Archive hierarchy from the main hierarchy is actually going to be very simple, all we’re going to do is to extract the main tree to file and then create a data reader which takes that file and loads it into the Archive tree.
As we want to retain multiple versions of the hierarchy we need to add a version component to our keys, since the version we deploy from the main tree to the archive tree will always be version 1 we can simply use ETL to add “_01” to the end of each of our keys (this supposes you will be retaining a maximum of 99 versions of your hierarchy, if you need to retain more than this you might need to use “001” or similar). The screenshot below shows how we can easily add “_01” to the end of each of our keys and populate a basic description for the archive version which tells us when this version was archived.
Note that we leave the Version code as Discard New Item, since we pre-populated our Version codes we don’t need to append new values here, for all other fields we will want to append as we expect to have new values being created.
Populating the Archive Tree With the Extract of the Main Tree:
Using ETL to version Codes
3.1.5 Incrementing Versions
Now that we’ve set up our structure and populated our initial version, the next step is to set up a method of incrementing our versions so that we retain a history of multiple versions and don’t just keep on overwriting version 1.
The way in which we do this is actually quite straightforward, largely thanks to the fact that we zero padded our version numbers ensuring that they always have a fixed width (2 digits in our example, potentially more if you’ve elected to retain more than 99 concurrent versions).
What we’re going to do, prior to loading a new version 1, is to first extract the contents of our archive tree to file and then load it back in, as you can see from the screen shot below we then use ETL to increment the version numbers, meaning that version 1 becomes version 2 and so on. We use the fact that the version number is fixed width to easily strip the existing version number from the end of each key and replace it with the incremented version number.
Depending on the version limit we’ve set and the number of versions we’ve archived so far, once we reach the point where we need to discard the oldest archived version to make room for the new version 1, the ETL covers this by checking that the new incremented version number doesn’t exceed our set limit, in this example I’m storing a rolling 24 versions (cell D1).
As explained. extracting the archived tree:
And then loading the archived tree back in:
Using ETL to increment the version codes:
3.1.6 Running the Process
We now have all the components we require to increment our existing archive data to the next version and to then replace version 1 with a new archive of the main tree, the procedure should look something like this.
Once this procedure has run through we’ll have moved all of our existing archived versions on by one, discarded the oldest version (if necessary) and taken a fresh archive from the main tree, if we run this on a regular basis we’ll maintain a rolling history of how the tree has changed over time.
The last thing to consider is how often to run this process, really this depends on individual requirements but a good cadence in many cases is monthly, the more frequently you take these snapshots the more you’ll capture change history, but on the flip side the more quickly you’ll cycle these snapshots through and dispose of them, meaning you’ll either need to store a large number of versions (resulting in some very large max item numbers) or accept that your archive will go back over a shorter period of time than it might otherwise. If you maintain a Master Data Management solution or similar then the archiving can be tied into that process so that it’s part of a controlled and managed update of the tree in question.
3.2 Retrieving an Archived Version for Reporting
3.2.1 How to hold the restored archive
The first thing we need to consider when retrieving archived versions of our hierarchy for reporting is where we’ll store the retrieved data, for this we’ll want to make an addition to our main tree to serve this purpose.
Since our archived cost centres should always exist as cost centres in the main tree we can simply create a new branch as a parent of Cost Centre, the reason we include Historic Cost Centre as an entity despite the one to one relationship to Cost Centre, is because while the code will match the description may well have changed over time, and this allows us to report against the correct preserved description from when the snapshot was taken.
Below a picture to show how storing the restores historic tree as a parent of the main tree:
Holding the restored hierarchy in this way restricts us to restoring a single version at a time, but it means that we avoid needing to flow or replicate any of our data, and it also means that the ability to report against a historic version of the tree can be included by default in any existing reporting that uses the main tree, either by drilling by the new Historic entities, providing them as options in Quick Layout or by including them as filtering options.
3.2.2 How to Restore an Archived Version
Now that we have somewhere to store our restored archived version of the tree, we need to build the process to populate it from a selected archive version. Since the branch we’ve created is a direct replica of the archive tree, and since Cost Centre simply maps one to one with Archived Cost Centre, this process is actually very easy.
We’ll assume that the process of restoring an archived version is to be a user driven process and create an easy to use front end for the purpose, if you prefer to keep it as an admin function you can always use the same method but limit access to the screen we’ll create.
At a basic level we’ll want a new screen with a selector on our archive version and a button to run our procedure:
Now we’ll create our procedure to extract and restore the selected archive version, this will just require a few steps and should look something like this:
First we extract the selected archive version of the tree to file, next we clear down the Historic entities (this step is optional but avoids potentially having residual values from prior restores which we might not want), and finally we load the extracted archived version into the Historic entities via a data reader which should look something like this:
Using the ETL to join the historic tree to the main tree:
We use ETL to strip the version from the Cost Centre code, giving us the original Cost Centre code for mapping the Historic tree to the Cost Centre.
3.2.3 Ensuring Our Tree is Fully Mapped
As a general principle in Board we never want any parentless children in any hierarchy, this is discussed at greater length in other articles but suffice to say that it’s something to be avoided.
As such our final step should be to ensure that we map any new Cost Centres which didn’t exist in the archived version of the hierarchy either to the default _ Unknown, or to a specific set value if you prefer. We can do this quite simply by extracting our base tree and loading it back in using a Text File Data Reader, by setting all fields except Cost Centre to Append or Replace we ensure that any unmapped Cost Centres will be mapped to _ Unknown, if you prefer to use a specific custom default value instead this can be achieved with some simple ETL by checking if the codes are null, however it isn’t shown in this example.
For that purpose this is the additional step needed in the Restore Procedure:
Reloading the Historic Portion of the Cost Centre Tree:
3.2.4 Displaying the Outcome
The final step to complete our screen is to give visual confirmation to the user that the archived version of the tree has been restored, to do this we can simply take a suitable cube (in this case I’m just using sales figures) that’s dimensioned by Cost Centre and display it by our newly restored Historic entities.
3.3 Final Points
3.3.1 Reporting Against Multiple Archived Hierarchies Concurrently
The method described in this document restricts us to reporting against a single archived version of our hierarchy at a time, however it allows us to do so very quickly and simply and avoids the need for any large dataflows or duplication of data, in some cases though you might need to report against multiple versions simultaneously.
If you have a requirement to do so, it is possible in Board to achieve this, either using parallel cubes dimensioned by the Archive Cost Centre and a mapping cube between the two trees (this can be populated in the same data readers that populate and increment the archive tree), or by actually storing snapshots of your data against snapshots of your hierarchy at the point of archiving, however both of these are very different approaches to what’s described here and as such are covered by other guides.
3.3.2 Possible Additions to the Model
Some possible additions you might want to consider are a text cube which gets populated with details of the currently restored archive version, alternatively you could also add a Historic Hierarchy Version entity as a parent of Historic Cost Centre and restore the extracted Archived Org Structure Version into it so that it only contains that one value – the cube approach allows you to display it easily in a screen while the entity approach means the user can easily see it by checking the contents of the entity through screen select or a pager. Of course there’s nothing to stop you going with both, it just depends what suits your needs.
Another option you may wish to consider is an audit trail for either or both of the archiving process and the restoration process, these aren’t covered here but details of the methods required can be found in other articles.