DFFITS is a diagnostic that shows how influential a point is in a statistical regression. It was proposed in the 1980 book Regression Diagnostics: Identifying Influential Data and Sources of Collinearity by David Belsley, Edwin Kuh, and Roy Welsch.[1] It is defined as the change in the predicted value for a point ("DFFIT") obtained when that point is left out of the regression, "Studentized" by dividing by the estimated standard deviation of the fit at that point:
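$$\mathrm{DFFITS}_i = \frac{\hat{y}_i - \hat{y}_{i(i)}}{s_{(i)} \sqrt{h_{ii}}}$$

where $\hat{y}_i$ and $\hat{y}_{i(i)}$ are the predictions for point i with and without point i included in the regression, $s_{(i)}$ is the standard error estimated without the point in question, and $h_{ii}$ is the leverage for the point.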
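The definition can be checked directly by refitting the regression once per point. The following is a minimal sketch, assuming ordinary least squares with NumPy and a full-rank design matrix; the function name `dffits_leave_one_out` and the sample data are hypothetical and chosen only for illustration.

```python
import numpy as np


def dffits_leave_one_out(X, y):
    """DFFITS for each observation, computed by literally refitting without it.

    X is an (n, p) design matrix (include a column of ones for an intercept);
    y is a length-n response vector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    # Full-data fit and leverages h_ii (diagonal of the hat matrix).
    beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)
    y_hat = X @ beta_full
    h = np.diag(X @ np.linalg.pinv(X))

    out = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        X_i, y_i = X[keep], y[keep]

        # Fit without point i, then predict at point i.
        beta_i, *_ = np.linalg.lstsq(X_i, y_i, rcond=None)
        y_hat_without = X[i] @ beta_i

        # s_(i): residual standard error of the reduced fit (n - 1 - p d.o.f.).
        resid_i = y_i - X_i @ beta_i
        s_i = np.sqrt(resid_i @ resid_i / (n - 1 - p))

        out[i] = (y_hat[i] - y_hat_without) / (s_i * np.sqrt(h[i]))
    return out


# Illustrative data (made up): a straight-line trend with one distant point.
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 10.0])
y = np.array([0.1, 0.9, 2.2, 2.8, 4.1, 6.0])
X = np.column_stack([np.ones_like(x), x])
print(dffits_leave_one_out(X, y))
```

Refitting n times is wasteful for large data sets, but it follows the definition term by term, which makes it a useful reference when validating a faster formula.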
DFFITS is very similar to the externally Studentized residual, and is in fact equal to the latter times $\sqrt{h_{ii}/(1-h_{ii})}$.[2]
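This relationship means DFFITS can be obtained from a single fit. The sketch below assumes ordinary least squares and a full-rank design matrix. It also relies on the standard leave-one-out identity RSS_(i) = RSS - e_i^2 / (1 - h_ii), which the article itself does not state. The function name `dffits_closed_form` is hypothetical.

```python
import numpy as np


def dffits_closed_form(X, y):
    """DFFITS from a single fit, via the externally Studentized residual."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape

    hat = X @ np.linalg.pinv(X)   # hat matrix H = X (X'X)^{-1} X'
    h = np.diag(hat)              # leverages h_ii
    e = y - hat @ y               # ordinary residuals

    # Leave-one-out variance estimate s_(i)^2 without refitting
    # (assumed identity): RSS_(i) = RSS - e_i^2 / (1 - h_ii),
    # on n - p - 1 degrees of freedom.
    s2_loo = (e @ e - e**2 / (1.0 - h)) / (n - p - 1)

    # Externally Studentized residual, then scale by the leverage factor.
    t_ext = e / np.sqrt(s2_loo * (1.0 - h))
    return t_ext * np.sqrt(h / (1.0 - h))
```

Because only one fit is needed, this form is usually preferred in practice; the result should agree with an explicit leave-one-out computation up to floating-point error.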
When the errors are Gaussian, the externally Studentized residual is distributed as Student's t (with a number of degrees of freedom equal to the number of residual degrees of freedom minus one), so DFFITS for a particular point is distributed as this same Student's t variate multiplied by the leverage factor $\sqrt{h_{ii}/(1-h_{ii})}$ for that particular point. Thus, for low-leverage points, DFFITS is expected to be small, whereas as the leverage approaches 1 the distribution of the DFFITS value widens without bound.
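To make the effect of leverage concrete, the multiplier sqrt(h / (1 - h)) can be tabulated for a few leverage values. This is a small illustrative sketch; the helper name `leverage_factor` is hypothetical.

```python
import math


def leverage_factor(h):
    """Multiplier sqrt(h / (1 - h)) linking the Studentized residual to DFFITS."""
    if not 0.0 <= h < 1.0:
        raise ValueError("leverage must lie in [0, 1)")
    return math.sqrt(h / (1.0 - h))


for h in (0.1, 0.5, 0.9, 0.99):
    print(f"h = {h:<4}  factor = {leverage_factor(h):.3f}")
```

The factor is about 0.33 at h = 0.1, exactly 1 at h = 0.5, and about 9.95 at h = 0.99, so the same Studentized residual yields a far larger DFFITS at a high-leverage point.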
For a perfectly balanced experimental design (such as a factorial design or balanced partial factorial design), the leverage for each point is p/n, the number of parameters divided by the number of points. This means that the DFFITS values will be distributed (in the Gaussian case) as $\sqrt{p/(n-p)} \approx \sqrt{p/n}$ times a t variate. Therefore, the authors suggest investigating those points whose DFFITS exceeds $2\sqrt{p/n}$ in absolute value.
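A screening step based on this rule of thumb might look like the sketch below, which takes the cutoff to be 2 * sqrt(p / n) and compares it with the absolute DFFITS value. The function name `flag_influential` and the sample values are hypothetical.

```python
import math


def flag_influential(dffits, p):
    """Indices of points with |DFFITS| > 2 * sqrt(p / n).

    dffits is a sequence of DFFITS values (one per observation);
    p is the number of fitted parameters, including any intercept.
    """
    n = len(dffits)
    cutoff = 2.0 * math.sqrt(p / n)
    return [i for i, d in enumerate(dffits) if abs(d) > cutoff]


# Made-up DFFITS values for a four-point, two-parameter fit.
print(flag_influential([0.1, -1.5, 0.3, 0.9], p=2))
```

The cutoff is a guide for closer inspection rather than a formal test: flagged points merit a look, but they are not automatically errors.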
A similar measure of influence is Cook's distance.
References
1. Belsley, David A.; Kuh, Edwin; Welsch, Roy E. (1980). Regression Diagnostics: Identifying Influential Data and Sources of Collinearity. Wiley Series in Probability and Mathematical Statistics. New York: John Wiley & Sons. ISBN 0-471-05856-4.
2. Montgomery, Douglas C.; Peck, Elizabeth A. (1992). "Appendix C.4". Introduction to Linear Regression Analysis (2nd ed.). New York: John Wiley & Sons. pp. 504–505. ISBN 0-471-53387-4.




