Econometric Sense: To Explain or Predict
source link: http://econometricsense.blogspot.com/2015/03/to-explain-or-predict.html
Thursday, March 26, 2015
To Explain or Predict
Statistical Science, Volume 25, Number 3 (2010), 289-310.
"Statistical modeling is a powerful tool for developing and testing theories by way of causal explanation, prediction, and description. In many disciplines there is near-exclusive use of statistical modeling for causal explanation and the assumption that models with high explanatory power are inherently of high predictive power. Conflation between explanation and prediction is common, yet the distinction must be understood for progressing scientific knowledge. While this distinction has been recognized in the philosophy of science, the statistical literature lacks a thorough discussion of the many differences that arise in the process of modeling for an explanatory versus a predictive goal. The purpose of this article is to clarify the distinction between explanatory and predictive modeling, to discuss its sources, and to reveal the practical implications of the distinction to each step in the modeling process."
This is a nice article which I think complements Leo Breiman's paper on the two cultures of statistical modeling, discussed here before. Rob Hyndman gives a nice synopsis of some of the main points from the paper:
- The AIC is better suited to model selection for prediction as it is asymptotically equivalent to leave-one-out cross-validation in regression, or one-step cross-validation in time series. On the other hand, it might be argued that the BIC is better suited to model selection for explanation, as it is consistent.
- P-values are associated with explanation, not prediction. It makes little sense to use p-values to determine the variables in a model that is being used for prediction. (There are problems in using p-values for variable selection in any context, but that is a different issue.)
- Multicollinearity has a very different impact depending on whether your goal is prediction or estimation. When predicting, multicollinearity is not really a problem, provided the values of your predictors lie within the hyper-region of the predictors used when estimating the model.
- An ARIMA model has no explanatory use, but is great at short-term prediction.
- How to handle missing values in regression is different in a predictive context compared to an explanatory context. For example, when building an explanatory model, we could just use all the data for which we have complete observations (assuming there is no systematic nature to the missingness). But when predicting, you need to be able to predict using whatever data you have. So you might have to build several models, with different numbers of predictors, to allow for different variables being missing.
- Many statistics and econometrics textbooks fail to observe these distinctions. In fact, a lot of statisticians and econometricians are trained only in the explanation paradigm, with prediction an afterthought. That is unfortunate as most applied work these days requires predictive modelling, rather than explanatory modelling.
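To make the first bullet concrete, here is a minimal numpy-only sketch (mine, not from the post) of the AIC/cross-validation connection. It fits two nested linear models and reports, for each, the Gaussian AIC and the leave-one-out MSE, where the LOO errors come from the PRESS shortcut e_i / (1 - h_ii) so no refitting is needed:

```python
import numpy as np

# Simulated data: y depends on x1 only; x2 is an irrelevant candidate predictor.
rng = np.random.default_rng(0)
n = 60
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)

def fit_ols(X, y):
    """OLS fit; returns residuals and the diagonal of the hat matrix."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    H = X @ np.linalg.pinv(X.T @ X) @ X.T
    return y - X @ beta, np.diag(H)

def aic(resid, k, n):
    """Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k."""
    return n * np.log(resid @ resid / n) + 2 * k

def loo_mse(resid, h):
    """Leave-one-out MSE via the PRESS identity: LOO residual = e_i / (1 - h_ii)."""
    return np.mean((resid / (1 - h)) ** 2)

X_small = np.column_stack([np.ones(n), x1])        # true model
X_big = np.column_stack([np.ones(n), x1, x2])      # overfit candidate

for name, X in [("small", X_small), ("big", X_big)]:
    resid, h = fit_ols(X, y)
    print(name, "AIC:", round(aic(resid, X.shape[1], n), 2),
          "LOO-MSE:", round(loo_mse(resid, h), 4))
```

The two criteria will typically rank the candidate models the same way on data like this, which is the practical content of the asymptotic-equivalence claim.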
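The point about missing values in a predictive context can be sketched as follows: train one model per observed-predictor subset, then dispatch at prediction time to whichever model matches the non-missing entries. This is a hypothetical illustration under simulated data; the names (`train`, `predict`, `models`) are mine, not from the post:

```python
import numpy as np
from itertools import combinations

# Simulated data with three predictors.
rng = np.random.default_rng(1)
n = 200
X = rng.normal(size=(n, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.3, size=n)

def train(X, y, cols):
    """Fit OLS (with intercept) on a subset of predictor columns."""
    A = np.column_stack([np.ones(len(X)), X[:, cols]])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

# One model per non-empty subset of predictors, keyed by the observed columns.
models = {cols: train(X, y, cols)
          for r in range(1, 4)
          for cols in combinations(range(3), r)}

def predict(x_partial):
    """Predict using whichever model matches the non-missing entries."""
    cols = tuple(i for i, v in enumerate(x_partial) if not np.isnan(v))
    beta = models[cols]
    return beta[0] + np.array([x_partial[i] for i in cols]) @ beta[1:]

print(predict(np.array([0.2, -0.1, 0.4])))    # all predictors observed
print(predict(np.array([0.2, np.nan, 0.4])))  # x2 missing -> uses the (x1, x3) model
```

In an explanatory analysis one complete-case fit may suffice; here the whole point is that a prediction must be produced from whatever happens to be observed.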
Rob also links to the web page of Galit Shmueli, the author of the article above, who has done extensive research related to these distinctions. Her blog has lots of additional resources in this regard. Galit states:
"My thesis is that statistical modeling, from the early stages of study design and data collection to data usage and reporting, takes a different path and leads to different results, depending on whether the goal is predictive or explanatory."
I touched on these distinctions before, but did not realize the extent of the actual work being done in this area by Galit.
Analytics vs Causal Inference
Big Data: Don't throw the baby out with the bath water
See also: Paul Allison on multicollinearity