In statistical analysis, '''Freedman's paradox''', named after David Freedman, is a problem in model selection whereby predictor variables with no relationship to the dependent variable can pass tests of significance – both individually via a t-test, and jointly via an F-test for the significance of the regression. Freedman demonstrated (through simulation and asymptotic calculation) that this is a common occurrence when the number of variables is similar to the number of data points.
Specifically, if the dependent variable and ''k'' regressors are independent normal variables, and there are ''n'' observations, then as ''k'' and ''n'' jointly go to infinity in the ratio ''k''/''n'' = ''ρ'':
# the ''R''<sup>2</sup> goes to ''ρ'',
# the F-statistic for the overall regression goes to 1.0, and
# the number of spuriously significant regressors goes to ''αk'', where ''α'' is the chosen critical probability (the probability of a Type I error for an individual regressor). This third result is intuitive: it says that the expected number of Type I errors equals the per-parameter probability of a Type I error times the number of parameters tested for significance (the simulation sketch below illustrates all three limits).
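These limits are straightforward to reproduce numerically. The following is a minimal Python sketch (not Freedman's original simulation; the sample sizes, random seed, and ''α'' = 0.05 are illustrative choices) that regresses a pure-noise response on ''k'' = 500 pure-noise regressors with ''n'' = 1000 observations, so ''ρ'' = 0.5:

<syntaxhighlight lang="python">
# Illustrative simulation of Freedman's paradox: regress pure noise on
# pure-noise predictors, then inspect R^2, the overall F-statistic, and
# the count of individually "significant" regressors.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, k, alpha = 1000, 500, 0.05          # k/n = rho = 0.5
X = rng.standard_normal((n, k))        # regressors, independent of y
y = rng.standard_normal(n)             # response: pure noise

# Ordinary least squares with an intercept.
Xd = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
resid = y - Xd @ beta

# R^2 and the overall F-statistic for the regression.
rss = resid @ resid
tss = ((y - y.mean()) ** 2).sum()
r2 = 1 - rss / tss
df_model, df_resid = k, n - k - 1
F = (r2 / df_model) / ((1 - r2) / df_resid)

# Per-coefficient t-tests (excluding the intercept).
sigma2 = rss / df_resid
se = np.sqrt(sigma2 * np.diag(np.linalg.inv(Xd.T @ Xd)))
p_vals = 2 * stats.t.sf(np.abs(beta / se), df_resid)
n_spurious = (p_vals[1:] < alpha).sum()

print(f"R^2 = {r2:.3f}  (limit: rho = {k/n})")
print(f"F   = {F:.3f}  (limit: 1.0)")
print(f"spuriously significant regressors: {n_spurious} (expected ~ {alpha*k:.0f})")
</syntaxhighlight>

At these sizes the printed values land close to the three limits: ''R''<sup>2</sup> near 0.5, ''F'' near 1.0, and roughly ''αk'' = 25 of the 500 irrelevant regressors passing their individual t-tests.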
More recently, new information-theoretic estimators have been developed in an attempt to reduce this problem, in addition to the accompanying issue of model selection bias (Burnham & Anderson 2002), whereby estimators for predictor variables that have a weak relationship with the response variable are biased.
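As a rough illustration of the information-theoretic approach (a generic Gaussian-AIC sketch, not code from the cited reference), the criterion's penalty on parameter count lets it reject the all-noise-regressor model that the t-tests and F-test above fail to reject:

<syntaxhighlight lang="python">
# AIC comparison on the same kind of pure-noise data as above: the 2p
# penalty outweighs the in-sample fit gain, so the intercept-only model
# is preferred despite the "significant" t-tests.
import numpy as np

rng = np.random.default_rng(1)
n, k = 1000, 500
X = rng.standard_normal((n, k))
y = rng.standard_normal(n)

def gaussian_aic(y, design):
    # AIC for a Gaussian linear model, up to an additive constant:
    # n*log(RSS/n) + 2p, where p counts coefficients plus the error variance.
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = ((y - design @ beta) ** 2).sum()
    p = design.shape[1] + 1
    return len(y) * np.log(rss / len(y)) + 2 * p

null_design = np.ones((n, 1))                   # intercept only
full_design = np.column_stack([np.ones(n), X])  # intercept + all k noise regressors

print("AIC, intercept-only model:", round(gaussian_aic(y, null_design), 1))
print("AIC, all-regressors model:", round(gaussian_aic(y, full_design), 1))
</syntaxhighlight>

The intercept-only model has the markedly lower AIC, so an AIC-based selection rule discards the noise regressors that significance testing would retain.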
==References==
*Burnham, K. P., & Anderson, D. R. (2002). ''Model Selection and Multimodel Inference: A Practical Information-Theoretic Approach'', 2nd ed. Springer-Verlag.
[[Category:Regression variable selection]]
[[Category:Statistical paradoxes]]
{{Statistics-stub}}