1. Abstract
Including too many explanatory variables in a model can reduce clarity, increase instability, and weaken long-term performance. Thoughtfully limiting variables helps preserve interpretability, strengthen generalization, and improve stakeholder trust.
2. Context
Apply this best practice when configuring Predict, reviewing candidate models, or refining an existing model that includes many explanatory indicators.
3. Content
3.1 Why It Matters
Adding more variables often increases model fit but not always model quality. Excessive explanatory variables can:
- Increase multicollinearity
- Reduce interpretability
- Create unstable or contradictory coefficients
- Increase sensitivity to minor data changes
- Encourage overfitting to historical noise
A model that explains slightly less variance but uses fewer, well-understood drivers is often more robust and easier to defend.
Strong models balance completeness with parsimony: including enough variables to capture meaningful drivers, but not so many that interpretation becomes unclear.
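One way to detect the multicollinearity risk described above is to compute a variance inflation factor (VIF) for each candidate driver. The sketch below is illustrative only (the function name and the 5–10 warning thresholds in the comment are common conventions, not part of this guide) and uses plain NumPy rather than any Predict-specific tooling:

```python
import numpy as np

def variance_inflation_factors(X):
    """Compute the VIF for each column of the design matrix X.

    VIF_j = 1 / (1 - R^2_j), where R^2_j comes from regressing
    column j on all the other columns. As a common convention,
    VIFs above roughly 5-10 signal problematic multicollinearity.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # add an intercept to the auxiliary regression
        A = np.column_stack([np.ones(n), others])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        r2 = 1.0 - (resid @ resid) / ((y - y.mean()) ** 2).sum()
        vifs.append(1.0 / (1.0 - r2))
    return vifs
```

Two nearly duplicate drivers will both show very large VIFs, while an independent driver stays close to 1; pruning one of the duplicates typically restores stable coefficients.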
3.2 How to Apply
When reviewing Predict results:
- Review the number of explanatory variables in top models.
  Compare candidate models with varying driver counts. As a general rule of thumb, aim for at least 10 observations per explanatory variable. With the recommended minimum of 60 months of historical data, the most statistically stable models will typically include six or fewer explanatory variables.
- Evaluate each variable's contribution. Ask:
  - Is this driver statistically significant?
  - Does it have clear business relevance?
  - Does it meaningfully improve model performance?
- Remove marginal contributors.
  If removing a variable does not materially reduce performance metrics, consider excluding it.
- Assess interpretability.
  Can you clearly explain each included driver and its expected effect?
- Re-run diagnostics after simplification.
  Confirm stability of coefficients and residual behavior.
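The observations-per-variable rule and the removal of marginal contributors can be sketched as plain Python. Everything here is a hypothetical illustration: `score_fn` stands in for whatever performance metric you use (such as a Model Score), and the `tolerance` threshold is an assumed value you would tune:

```python
def max_drivers(n_observations, obs_per_variable=10):
    """Rule-of-thumb cap on driver count: at least
    `obs_per_variable` observations per explanatory variable."""
    return n_observations // obs_per_variable

def prune_marginal_drivers(drivers, score_fn, tolerance=0.01):
    """Greedy backward elimination of marginal contributors.

    Repeatedly drop the driver whose removal costs the least,
    as long as the score falls by no more than `tolerance`.
    `score_fn(subset)` returns a performance score (higher is
    better) for a candidate subset of drivers.
    """
    current = list(drivers)
    current_score = score_fn(current)
    while len(current) > 1:
        # score every one-driver-removed candidate subset
        candidates = [[d for d in current if d != dropped]
                      for dropped in current]
        best_subset = max(candidates, key=score_fn)
        best_score = score_fn(best_subset)
        if current_score - best_score > tolerance:
            break  # every removal is too costly; keep current set
        current, current_score = best_subset, best_score
    return current, current_score
```

With the recommended minimum of 60 monthly observations, `max_drivers(60)` gives the six-variable ceiling mentioned above. After each pruning pass you would still re-run diagnostics, since the score alone does not confirm coefficient stability.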
3.3 Example
A demand model includes 12 explanatory variables and achieves a Model Score of 0.89. A simplified version using 5 core drivers achieves a score of 0.87. The team selects the 5-driver model because it is easier to interpret and more stable under scenario testing.
3.4 Common Pitfalls
- Selecting models solely based on highest Model Score
- Adding drivers simply because they correlate
- Retaining redundant indicators that represent the same concept
- Assuming more complexity equals better performance
3.5 Expected Results
- Clearer and more defensible models
- Greater stakeholder confidence
- Reduced multicollinearity risk
- Improved out-of-sample stability
- Faster model maintenance and recalibration