Freedman's Paradox

In statistical analysis, Freedman's paradox, named after David Freedman, is a problem in model selection whereby predictor variables with no relationship to the dependent variable can pass tests of significance, both individually via a t-test and jointly via an F-test for the significance of the regression. Freedman demonstrated (through simulation and asymptotic calculation) that this is a common occurrence when the number of variables is similar to the number of data points. Specifically, if the dependent variable and k regressors are independent normal variables, and there are n observations, then as k and n jointly go to infinity with the ratio k/n = ρ held fixed:

1. the R² goes to ρ,
2. the F-statistic for the overall regression goes to 1.0, and
3. the number of spuriously significant regressors goes to αk, where α is the chosen critical probability (the probability of a Type I error for a single regressor).

The third result is intuitive: the expected number of Type I errors equals the probability of a Type I error on an individual coefficient times the number of coefficients whose significance is tested.

More recently, new information-theoretic estimators have been developed in an attempt to reduce this problem, along with the accompanying issue of model selection bias (Burnham, K. P., & Anderson, D. R. (2002). Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach, 2nd ed. Springer-Verlag), whereby the estimated effects of predictor variables that have only a weak relationship with the response variable are biased.
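
The three asymptotic results can be connected by a short derivation. It is a sketch under assumptions not stated in the text: the regression is fitted through the origin (so R² is the uncentered R² and the overall F-statistic has k and n − k degrees of freedom), X has full column rank with k < n, and S is notation introduced here for the number of regressors declared significant. With an intercept the degrees of freedom become k and n − k − 1 and the limits are unchanged.

```latex
% Assumed setup: y ~ N(0, \sigma^2 I_n) independent of the n x k matrix X,
% OLS through the origin, P = X (X^\top X)^{-1} X^\top, k < n.
R^2 = \frac{\lVert P y \rVert^2}{\lVert y \rVert^2}
    \sim \mathrm{Beta}\!\left(\tfrac{k}{2}, \tfrac{n-k}{2}\right),
\qquad
\mathbb{E}[R^2] = \frac{k}{n} = \rho,
\qquad
\operatorname{Var}(R^2) = \frac{2k(n-k)}{n^2(n+2)} \to 0

F = \frac{R^2 / k}{(1 - R^2)/(n - k)}
  = \frac{R^2}{1 - R^2} \cdot \frac{1 - k/n}{k/n}
  \;\longrightarrow\;
  \frac{\rho}{1 - \rho} \cdot \frac{1 - \rho}{\rho} = 1

\mathbb{E}[S] = \sum_{j=1}^{k} \Pr\bigl(|t_j| > t_{n-k,\,1-\alpha/2}\bigr) = k\alpha
```

The first line holds because, given X, the squared norms of Py and (I − P)y are independent scaled chi-squared variables with k and n − k degrees of freedom; since the variance vanishes, R² converges to ρ, and the F limit follows. The last line uses linearity of expectation and the fact that, under these assumptions, each t-statistic has an exact t distribution with n − k degrees of freedom.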

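Freedman's simulation finding can be reproduced in spirit with a small Monte Carlo sketch. This is an illustration, not Freedman's own procedure: the function name simulate_freedman, the sizes n = 1000 and k = 250, and the use of NumPy are assumptions made for the example. It fits ordinary least squares through the origin to pure noise and, as a simplification, uses the standard normal quantile in place of the t quantile with n − k degrees of freedom, which is close when n − k is large.

```python
"""Monte Carlo sketch of Freedman's paradox (illustrative; not Freedman's code)."""
from statistics import NormalDist

import numpy as np


def simulate_freedman(n=1000, k=250, alpha=0.05, seed=0):
    """Regress pure-noise y on k pure-noise regressors, with no intercept.

    Returns (R^2, overall F statistic, number of |t| above the critical value).
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, k))  # regressors, independent of y
    y = rng.standard_normal(n)       # response with no relationship to X

    # Ordinary least squares fit through the origin.
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    rss = resid @ resid
    tss = y @ y                      # uncentered total sum of squares (no intercept)
    r2 = 1.0 - rss / tss

    # Overall F statistic with k and n - k degrees of freedom.
    df_resid = n - k
    f_stat = (r2 / k) / ((1.0 - r2) / df_resid)

    # Coefficient standard errors from sigma^2 * diag((X'X)^{-1}).
    sigma2 = rss / df_resid
    xtx_inv = np.linalg.inv(X.T @ X)
    se = np.sqrt(sigma2 * np.diag(xtx_inv))
    t_stats = beta / se

    # Two-sided critical value; the normal quantile approximates the
    # t quantile with n - k degrees of freedom when n - k is large.
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    n_significant = int(np.sum(np.abs(t_stats) > crit))
    return r2, f_stat, n_significant


if __name__ == "__main__":
    r2, f_stat, n_sig = simulate_freedman()
    print(f"R^2 = {r2:.3f}   (rho = k/n = 0.25)")
    print(f"F   = {f_stat:.3f}   (limit 1.0)")
    print(f"significant regressors = {n_sig}   (alpha * k = 12.5)")
```

With ρ = k/n = 0.25 and α = 0.05, the printed R², F and count should come out near 0.25, 1.0 and αk = 12.5 respectively, even though no regressor has any relationship to y; exact values vary with the seed.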

References
