About the Solution/Project:
As part of my learning on BOARD, I wanted to experiment with the BEAM predictive analytics engine. I chose the Population Growth dataset from datahub.io because it includes both historical population figures and forward-looking projections, by year and by region. The historical figures cover 1950 to 2015, and the projections run through 2100. Because the dataset spans such a long period, individual outliers should have little influence on the overall trend, which should allow reasonable projections even toward a distant time horizon.

My hypothesis was that BEAM would produce projections comparable to the manually prepared projections included in the dataset. Three series in the dataset were particularly useful for testing this: Low Fertility, Medium Fertility and High Fertility. Each of these contains manually prepared forward-looking population projections. I'm happy to report that BEAM does a great job of the linear extrapolation needed to produce projections comparable to these manual ones.

I've included screenshots of the key screens, as well as the BOARD database and capsule file for anyone to use. The datasets are freely available and are loaded automatically by ASCII datareaders that read CSV files over HTTP. I've also posted a video walkthrough of the solution on YouTube.

Population is also a common denominator when calculating per-capita KPIs, so if you need population data for your KPIs, this may do the trick. And if you have been considering applying BEAM to your own historical data to project it forward linearly, this should serve as an example of how to do so.
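BEAM's internal model selection is not described in this post, so the following is not BEAM's algorithm. It is a minimal sketch of what a plain linear extrapolation looks like: an ordinary least-squares line fitted to historical (year, value) points and extended to future years. The function names and sample figures are hypothetical and purely illustrative, not values from the dataset.

```python
def fit_linear_trend(years, values):
    """Ordinary least-squares fit of: value = slope * year + intercept."""
    n = len(years)
    if n < 2 or n != len(values):
        raise ValueError("need at least two (year, value) pairs of equal length")
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    # Sums of squared deviations / cross-deviations around the means
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate(years, values, future_years):
    """Project the fitted straight line onto the requested future years."""
    slope, intercept = fit_linear_trend(years, values)
    return {year: slope * year + intercept for year in future_years}


# Hypothetical history for one region (illustrative numbers only)
history_years = [2000, 2005, 2010, 2015]
history_values = [100.0, 110.0, 120.0, 130.0]
projection = extrapolate(history_years, history_values, [2020, 2025])
print(projection)
```

The same idea applies per region: fit each region's Estimated Actual history separately, then extend it toward 2100.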
Home - Shows the population trend from 1950 to 2100. Users can select a region and compare the four projection series against each other. The Estimated Actual series is the historical data, which ends in 2015. The Low, Medium and High Fertility series are loaded from the datahub.io Population Growth dataset. Smooth is the series produced by BOARD BEAM. The BEAM prediction falls well within the range between the High and Low Fertility series.
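One way to make "well within the range" checkable is to flag any year where the BEAM series leaves the band between the Low and High Fertility series. This is a hypothetical helper, not a BOARD feature, and it assumes each series is a dict mapping year to population for a single region.

```python
def years_outside_band(prediction, low, high):
    """Return the years where `prediction` falls outside the [low, high] band.

    All three arguments are dicts mapping year -> population for one region.
    min/max are used so the check still works if low and high cross over.
    """
    outside = []
    for year in sorted(prediction):
        lower = min(low[year], high[year])
        upper = max(low[year], high[year])
        if not (lower <= prediction[year] <= upper):
            outside.append(year)
    return outside


# Hypothetical example values (not taken from the dataset)
low = {2020: 10.0, 2030: 11.0}
high = {2020: 12.0, 2030: 15.0}
smooth = {2020: 11.0, 2030: 16.0}
print(years_outside_band(smooth, low, high))
```

An empty list means the prediction stayed inside the Low/High band for every year checked.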
Data.IO Base Dataset - This shows a datagrid of the base dataset from datahub.io. Historical data runs from 1950 to 2015, and projected data runs from 2015 to 2100.
BEAM Estimate - This screen shows the output of the BEAM Predictive Analytics model. Starting from the Estimated Actuals in the first data column, each model's values are populated for future years. The MASE, MAE and MAPE error metrics are also shown.
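For readers unfamiliar with the three error measures, here is a sketch using their common textbook definitions. MAE is the mean absolute error, MAPE expresses the absolute error as a percentage of the actual value, and MASE scales MAE by the in-sample MAE of a one-step naive forecast (each value predicted by the previous one). BEAM may compute these slightly differently (for example, with a seasonal naive baseline for MASE), so treat these formulas as an assumption rather than BEAM's exact implementation.

```python
def mae(actual, forecast):
    """Mean absolute error."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal length")
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def mape(actual, forecast):
    """Mean absolute percentage error, in percent (undefined if any actual is 0)."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and equal length")
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


def mase(actual, forecast, training):
    """Mean absolute scaled error: MAE divided by the in-sample MAE
    of a one-step naive forecast on the training (historical) series."""
    if len(training) < 2:
        raise ValueError("training series needs at least two points")
    naive_errors = [abs(training[i] - training[i - 1]) for i in range(1, len(training))]
    scale = sum(naive_errors) / len(naive_errors)
    return mae(actual, forecast) / scale


# Hypothetical numbers for illustration
print(mae([100, 200], [110, 190]))
print(mape([100, 200], [110, 190]))
print(mase([100, 200], [110, 190], [10, 20, 40]))
```

A MASE below 1 means the model beat the naive "same as last period" forecast on average.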
BEAM Predictive Analytics - This screen shows the configuration of the scenario used to populate the model.
Cubes - Below is a listing of all cubes. Most cubes are dimensioned by Year and Region; only Latitude and Longitude are structured differently. Region is the only non-standard entity, with 273 elements.
Datareaders - Each datareader loads an ASCII (CSV) file over HTTP from the datahub.io server directly into BOARD.
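Conceptually, each datareader fetches a CSV file over HTTP and maps its rows onto a (Year, Region) cube. The sketch below mimics that outside BOARD using only Python's standard library. The column names `Year`, `Region` and `Value`, and any URL you pass in, are hypothetical placeholders, since the actual datahub.io file layout isn't reproduced in this post.

```python
import csv
import io
import urllib.request


def parse_cube(csv_text, year_col="Year", region_col="Region", value_col="Value"):
    """Turn CSV text into a dict keyed by (year, region).

    Column names are hypothetical defaults; pass the real header names.
    Rows with an empty value are skipped.
    """
    cube = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = row[value_col].strip()
        if not raw:
            continue
        cube[(int(row[year_col]), row[region_col])] = float(raw)
    return cube


def load_cube_from_url(url, **columns):
    """Fetch a CSV file over HTTP(S) and parse it into a (year, region) cube."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return parse_cube(text, **columns)


sample = "Region,Year,Value\nWorld,2015,7.3\nWorld,2020,\n"
print(parse_cube(sample))
```

For example, `load_cube_from_url("https://example.com/population.csv")` (a placeholder URL) would return the same kind of dict for a live file; rerunning it is the code-level equivalent of reloading a cube.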
Reload All Process - There is only one process, called Reload All. It runs each datareader in turn to reload all cubes from the datahub.io server.
Takeaways / Hints for Other Community Members:
- BEAM Predictive Scenarios do not need to be rerun when new data is loaded. Once is enough to generate a scenario.
- Accurate predictions can be made from just a single historical data series
- The various outputs from BEAM provide different options from which to choose
- Being able to test models against some kind of standard helps check if the predictions are reasonable or not
- Being able to see the different outputs from BEAM helps understand the variability in predictions given an input dataset
- ASCII datareaders reading files over HTTP are a quick way to pull web data in and keep it up to date
- Latitude and longitude data for countries is available here
- datahub.io has a number of datasets that can be pulled in easily