1. Abstract
Including multiple variables that measure the same underlying behavior can weaken model reliability and make results difficult to interpret. Managing correlation in the Workbench ensures models remain stable, explainable, and trustworthy.
2. Context
Use this best practice when building or refining a Workbench, especially before running Predict. Correlation issues often arise when multiple indicators track similar concepts such as income, wages, demand, or prices.
3. Content
3.1 Why It Matters
Highly correlated variables often reflect the same underlying behavior. Including many similarly shaped indicators in a Workbench can make it difficult to understand which variables are truly informative versus redundant.
When multiple indicators move together:
- It becomes harder to distinguish which variable best represents the underlying driver
- Insights become less intuitive, especially during exploration and explanation
- The Workbench becomes cluttered with overlapping signals rather than complementary ones
In addition, high correlation between variables increases the risk of multicollinearity once modeling begins. While a model may initially show strong fit, multicollinearity can lead to unstable coefficients, inconsistent signs, and results that are harder to trust or explain.
Designing the Workbench to emphasize distinct, complementary drivers helps ensure that each indicator adds meaningful insight and supports clearer analysis throughout the forecasting process.
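To give a sense of scale for the multicollinearity risk noted above, the sketch below computes the variance inflation factor (VIF) for the simplest case of a regression with exactly two explanatory variables, where VIF = 1 / (1 - r^2) and r is the correlation between the two. This is a standard statistical identity and not something the Workbench is stated to report; the function name `vif_two_predictors` is hypothetical and used only for illustration.

```python
def vif_two_predictors(r: float) -> float:
    """Variance inflation factor for either predictor in a regression
    with exactly two explanatory variables whose correlation is r.

    Illustrative helper only (hypothetical name, not a Workbench function).
    """
    if not -1.0 < r < 1.0:
        raise ValueError("r must be strictly between -1 and 1")
    return 1.0 / (1.0 - r * r)


# Coefficient variance is inflated by this factor relative to
# the case where the two predictors are uncorrelated.
for r in (0.5, 0.9, 0.99):
    print(f"r = {r:.2f} -> VIF = {vif_two_predictors(r):.2f}")
```

Under this simplified two-variable assumption, a correlation of 0.9 inflates coefficient variance roughly fivefold, and 0.99 roughly fiftyfold, which is consistent with the unstable coefficients and inconsistent signs described above.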
3.2 How to Apply
- Open your Workbench and navigate to the Diagnostics tab.
- Review the correlation matrix or correlation table for explanatory indicators.
- Identify variable pairs with high absolute correlation (generally |r| above 0.9, that is, r above 0.9 or below -0.9).
- For each highly correlated pair:
  - Decide which indicator has stronger theoretical relevance
  - Prefer the indicator with longer history or better data quality
- Remove or hide the redundant indicator from the Workbench.
- Re-run diagnostics to confirm reduced correlation.
- Document which variables were removed and why.
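For teams that export indicator data and want to reproduce the screening step outside the Workbench, the steps above can be sketched as follows. This is a minimal illustration, not the Workbench's own diagnostics: the function names, the indicator keys, and the sample values are all hypothetical, and the 0.9 threshold is the general guideline from the list above.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def high_correlation_pairs(indicators, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| above threshold,
    sorted from strongest to weakest absolute correlation."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(indicators.items(), 2):
        r = pearson(a, b)
        if abs(r) > threshold:
            flagged.append((name_a, name_b, r))
    return sorted(flagged, key=lambda pair: abs(pair[2]), reverse=True)


# Made-up values for illustration only; not real published data.
indicators = {
    "total_private_ahe": [30.1, 30.4, 30.8, 31.1, 31.5, 31.9],
    "retail_trade_ahe": [20.0, 20.2, 20.5, 20.7, 21.0, 21.3],
    "consumer_sentiment": [98.0, 91.5, 95.2, 89.9, 96.3, 92.1],
}

flagged = high_correlation_pairs(indicators)
for name_a, name_b, r in flagged:
    print(f"{name_a} vs {name_b}: r = {r:.3f}")
```

The script only flags candidate pairs. Choosing which indicator to keep still depends on theoretical relevance, history length, and data quality, as described in the steps above.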
3.3 Example
An analyst finds that “Total Private – Average Hourly Earnings” and “Retail Trade – Average Hourly Earnings” have a correlation of 0.99. Because the company operates in retail, the analyst retains “Retail Trade – Average Hourly Earnings” for its stronger theoretical relevance and removes “Total Private – Average Hourly Earnings”. This improves interpretability in the Workbench and coefficient stability in Predict.
3.4 Common Pitfalls
- Dropping variables solely based on correlation thresholds without business context
- Keeping all similar indicators “just in case”
- Ignoring lagged or lead-time correlations that may reveal different dynamics
- Forgetting to reassess correlation after adding new indicators
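The pitfall about lagged correlations can be illustrated with a small sketch that compares a driver against a target at several lags. It assumes both series are equally spaced, equal in length, and aligned on the same dates; the function names and sample values are hypothetical and chosen so that the target simply repeats the driver two periods later.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / sqrt(sxx * syy)


def lagged_correlations(driver, target, max_lag):
    """Correlate driver[t - lag] with target[t] for lag = 0..max_lag."""
    if len(driver) != len(target):
        raise ValueError("driver and target must have the same length")
    result = {}
    for lag in range(max_lag + 1):
        shifted_driver = driver[: len(driver) - lag]  # drop the last `lag` points
        aligned_target = target[lag:]                 # drop the first `lag` points
        result[lag] = pearson(shifted_driver, aligned_target)
    return result


# Made-up series: the target follows the driver with a two-period delay.
driver = [1, 3, 2, 5, 4, 6, 5, 8]
target = [0, 1] + driver[:-2]

correlations = lagged_correlations(driver, target, max_lag=3)
best_lag = max(correlations, key=lambda lag: abs(correlations[lag]))
for lag, r in correlations.items():
    print(f"lag {lag}: r = {r:.3f}")
print("strongest relationship at lag", best_lag)
```

Two indicators that look redundant at lag 0 may lead the target by different amounts, so checking a few lags before removing one can help avoid discarding a driver that carries distinct timing information.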
3.5 Expected Results
- Clearer attribution of driver impact
- More stable coefficients and residuals
- Easier explanation of model logic to stakeholders
- Improved forecast performance over time