DFFIT and DFFITS ("difference in fit(s)") are diagnostics, first proposed in 1980, meant to show how influential a point is in a statistical regression.

DFFIT is the change in the predicted value for a point, obtained when that point is left out of the regression:

\text{DFFIT} = \widehat{y}_i - \widehat{y}_{i(i)}

where \widehat{y}_i and \widehat{y}_{i(i)} are the predictions for point ''i'' with and without point ''i'' included in the regression.

DFFITS is the Studentized DFFIT, where Studentization is achieved by dividing by the estimated standard deviation of the fit at that point:

\text{DFFITS} = \frac{\text{DFFIT}}{s_{(i)} \sqrt{h_{ii}}}

where s_{(i)} is the standard error estimated without the point in question, and h_{ii} is the leverage for the point.

DFFITS also equals the product of the externally Studentized residual (t_{i(i)}) and the leverage factor (\sqrt{h_{ii}/(1 - h_{ii})}):

\text{DFFITS} = t_{i(i)} \sqrt{\frac{h_{ii}}{1 - h_{ii}}}
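The two expressions for DFFITS can be checked against each other numerically. The following is a minimal sketch, assuming a simple linear regression with an intercept (so p = 2) and using only the Python standard library. The function names and the sample data are hypothetical and chosen for illustration. The first routine refits the line with each point deleted, following the definition directly. The second uses the Studentized-residual form, together with the standard least-squares identity that the residual sum of squares without point ''i'' equals the full residual sum of squares minus e_i^2 / (1 - h_{ii}).

```python
import math

P = 2  # number of parameters: intercept and slope (assumption of this sketch)


def fit_line(x, y):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_mean - b * x_mean, b


def leverages(x):
    """Leverage h_ii of each point for a straight-line fit with intercept."""
    n = len(x)
    x_mean = sum(x) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    return [1.0 / n + (xi - x_mean) ** 2 / sxx for xi in x]


def rss(x, y, a, b):
    """Residual sum of squares of the line a + b*x on the data (x, y)."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))


def dffits_by_deletion(x, y):
    """DFFITS from the definition: refit without each point in turn."""
    a, b = fit_line(x, y)
    h = leverages(x)
    values = []
    for i in range(len(x)):
        x_del = x[:i] + x[i + 1:]
        y_del = y[:i] + y[i + 1:]
        a_del, b_del = fit_line(x_del, y_del)
        # Standard error estimated without point i.
        s_del = math.sqrt(rss(x_del, y_del, a_del, b_del) / (len(x_del) - P))
        # DFFIT: prediction at x[i] with the point minus prediction without it.
        dffit = (a + b * x[i]) - (a_del + b_del * x[i])
        values.append(dffit / (s_del * math.sqrt(h[i])))
    return values


def dffits_closed_form(x, y):
    """DFFITS as externally Studentized residual times sqrt(h / (1 - h))."""
    n = len(x)
    a, b = fit_line(x, y)
    h = leverages(x)
    total = rss(x, y, a, b)
    values = []
    for i in range(n):
        e = y[i] - (a + b * x[i])
        # Deleted standard error obtained without refitting.
        s_del = math.sqrt((total - e ** 2 / (1 - h[i])) / (n - P - 1))
        t = e / (s_del * math.sqrt(1 - h[i]))
        values.append(t * math.sqrt(h[i] / (1 - h[i])))
    return values


# Illustrative data: the last point sits far from the others in x.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 7.0]
by_deletion = dffits_by_deletion(x, y)
closed_form = dffits_closed_form(x, y)
for i, (u, v) in enumerate(zip(by_deletion, closed_form)):
    print(f"point {i}: deletion={u:.4f} closed_form={v:.4f}")
```

The two columns should agree up to floating-point error. The closed form avoids refitting the model once per point, which matters for larger datasets.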

Thus, for low-leverage points, DFFITS is expected to be small, whereas as the leverage approaches 1 the distribution of the DFFITS value widens without bound.

For a perfectly balanced experimental design (such as a factorial design or balanced partial factorial design), the leverage for each point is p/n, the number of parameters divided by the number of points. This means that the DFFITS values will be distributed (in the Gaussian case) as \sqrt{p/(n-p)} \approx \sqrt{p/n} times a t variate. Therefore, the authors suggest investigating those points with DFFITS greater in magnitude than 2\sqrt{p/n}.

Although the raw values resulting from the equations are different, Cook's distance and DFFITS are conceptually identical and there is a closed-form formula to convert one value to the other.
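As an illustration of the cutoff and of the conversion to Cook's distance, here is a minimal sketch using only the Python standard library. The function names and the example values are hypothetical. The cutoff is applied to the magnitude of DFFITS, since the statistic can be negative. The conversion assumes ordinary least squares, where Cook's distance can be written as DFFITS^2 s_{(i)}^2 / (p s^2), with s the standard error estimated from the full fit.

```python
import math


def dffits_cutoff(p, n):
    """Suggested cutoff 2*sqrt(p/n) for p parameters and n points."""
    return 2.0 * math.sqrt(p / n)


def flag_points(dffits, p):
    """Indices of points whose |DFFITS| exceeds the suggested cutoff."""
    cutoff = dffits_cutoff(p, len(dffits))
    return [i for i, d in enumerate(dffits) if abs(d) > cutoff]


def cooks_distance_from_dffits(dffits_i, s_deleted, s_full, p):
    """Convert one DFFITS value to Cook's distance (OLS assumed).

    s_deleted is the standard error estimated without the point,
    s_full the standard error estimated from all points.
    """
    return dffits_i ** 2 * s_deleted ** 2 / (p * s_full ** 2)


# Made-up DFFITS values for n = 8 points and p = 2 parameters,
# so the cutoff is 2 * sqrt(2 / 8) = 1.0.
example = [0.1, -1.5, 0.9, 2.0, 0.3, -0.2, 0.05, 0.4]
flagged = flag_points(example, p=2)
print(flagged)  # [1, 3]
```

Points flagged this way are candidates for closer inspection, not automatic exclusions. The cutoff is a rule of thumb derived from the balanced-design argument above.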

Development

Previously, when assessing a dataset before running a linear regression, the possibility of outliers would be assessed using histograms and scatterplots. Both methods were subjective, and there was little way of knowing how much leverage each potential outlier had on the results. This led to a variety of quantitative measures, including DFFIT and DFBETA.
