Answers.com

Goodness of fit

 
Sci-Tech Dictionary: goodness of fit
(¦gu̇d·nəs əv ′fit)

(statistics) The degree to which the observed frequencies of occurrence of events in an experiment correspond to the probabilities in a model of the experiment. Also known as best fit.


Accounting Dictionary: Goodness-Of-Fit

Degree to which a model fits the observed data. In a Regression Analysis, the goodness-of-fit is measured by the Coefficient of Determination (r-squared).
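The coefficient of determination mentioned above can be sketched in a few lines, assuming the common definition R² = 1 − SS_res / SS_tot. The function name `r_squared` and the data values are hypothetical, chosen only for illustration.

```python
def r_squared(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    # Residual sum of squares: discrepancy between data and model predictions.
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    # Total sum of squares: spread of the data around its own mean.
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical observations and model predictions (not from a real fit).
y = [1.0, 2.0, 3.0, 4.0]
y_hat = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y, y_hat)
print(round(r2, 4))
```

A value close to 1 means the predictions account for most of the variation in the observations.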

Wikipedia: Goodness of fit

The goodness of fit of a statistical model describes how well it fits a set of observations. Measures of goodness of fit typically summarize the discrepancy between observed values and the values expected under the model in question. Such measures can be used in statistical hypothesis testing, e.g. to test for normality of residuals, to test whether two samples are drawn from identical distributions (see Kolmogorov-Smirnov test), or whether outcome frequencies follow a specified distribution (see Pearson's chi-square test). In the analysis of variance, one of the components into which the variance is partitioned may be a lack-of-fit sum of squares.


Fit of distributions

In assessing whether a given distribution is suited to a data-set, the following tests and their underlying measures of fit can be used:

Regression analysis

In regression analysis, the following topics relate to goodness of fit:

Example

One way to construct a goodness-of-fit statistic, in the case where the variance of the measurement error is known, is to form a weighted sum of squared errors:

 \chi^2 = \sum {\frac{(O - E)^2}{\sigma^2}}

where O is an observed value, E is the value expected under the model, and σ² is the known variance of the observation.[1] This definition is only useful when one has estimates for the error on the measurements, but it leads to a situation where a chi-square distribution can be used to test goodness of fit, provided that the errors can be assumed to have a normal distribution.

The reduced chi-squared statistic is simply the chi-squared divided by the number of degrees of freedom: [1] [2] [3] [4]

 \chi_\mathrm{red}^2 = \frac{\chi^2}{\nu} = \frac{1}{\nu} \sum {\frac{(O - E)^2}{\sigma^2}}

where ν is the number of degrees of freedom, usually given by N − n − 1, where N is the number of observations and n is the number of fitted parameters, assuming that the mean value is an additional fitted parameter. The advantage of the reduced chi-squared is that it already normalizes for the number of data points and model complexity.

As a rule of thumb, a \chi_\mathrm{red}^2 much larger than 1 indicates a poor model fit. A \chi_\mathrm{red}^2 > 1 indicates that the fit has not fully captured the data (or that the error variance has been under-estimated). In principle, a value of \chi_\mathrm{red}^2 = 1 indicates that the extent of the match between observations and estimates is in accord with the error variance. A \chi_\mathrm{red}^2 < 1 indicates that the model is 'over-fitting' the data (either the model is improperly fitting noise, or the error variance has been over-estimated).
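The two statistics above can be illustrated with a minimal sketch. It assumes the degrees-of-freedom convention ν = N − n − 1 described in the text; the function names and the data are hypothetical and are not taken from the cited references.

```python
def chi_squared(observed, expected, sigma):
    """Weighted sum of squared errors: sum((O - E)^2 / sigma^2)."""
    return sum(((o - e) / s) ** 2 for o, e, s in zip(observed, expected, sigma))


def reduced_chi_squared(observed, expected, sigma, n_params):
    """Chi-squared divided by nu = N - n - 1 (convention assumed from the text)."""
    nu = len(observed) - n_params - 1
    if nu <= 0:
        raise ValueError("not enough observations for this many fitted parameters")
    return chi_squared(observed, expected, sigma) / nu


# Hypothetical measurements, model values, and known measurement errors.
observed = [1.1, 1.9, 3.2, 3.9, 5.1, 6.0]
expected = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
sigma = [0.1] * len(observed)

chi2 = chi_squared(observed, expected, sigma)
chi2_red = reduced_chi_squared(observed, expected, sigma, n_params=1)
print(round(chi2, 6), round(chi2_red, 6))
```

With these made-up numbers the reduced statistic comes out near 2, which under the rule of thumb would suggest either a fit that has not fully captured the data or an under-estimated error variance.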

Categorical data

The following are examples that arise in the context of categorical data.

Example 1

Pearson's chi-square test uses a measure of goodness of fit which is the sum of differences between observed and expected outcome frequencies (that is, counts of observations), each squared and divided by the expectation:

 \chi^2 = \sum_{i=1}^n {\frac{(O_i - E_i)^2}{E_i}}

where:

O_i = an observed frequency (i.e. count) for the i-th bin
E_i = an expected (theoretical) frequency for the i-th bin, asserted by the null hypothesis.

The resulting value can be compared to the chi-square distribution to determine the goodness of fit. To determine the degrees of freedom of the chi-squared distribution, one takes the number of bins (that is, the number of observed frequencies) and subtracts one, provided that no model parameters were estimated from the data. For example, if there are eight different frequencies, one would compare to a chi-squared with seven degrees of freedom.
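As a small worked illustration of Pearson's statistic, the sketch below tests hypothetical die-roll counts against a fair-die null hypothesis. The counts are invented for the example, and the constant 11.07 is the standard tabulated 5% critical value of the chi-squared distribution with five degrees of freedom, hard-coded here rather than computed.

```python
def pearson_chi_squared(observed, expected):
    """Sum over bins of (O_i - E_i)^2 / E_i."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of bins")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical counts from 60 rolls of a die; a fair die expects 10 per face.
observed = [5, 8, 9, 8, 10, 20]
expected = [10.0] * 6

stat = pearson_chi_squared(observed, expected)
dof = len(observed) - 1  # six bins, no parameters estimated from the data

# Tabulated 5% critical value for 5 degrees of freedom (assumed from standard tables).
CRITICAL_5PCT_DF5 = 11.07
reject_fair_die = stat > CRITICAL_5PCT_DF5
print(round(stat, 6), dof, reject_fair_die)
```

Here the statistic exceeds the critical value, so at the 5% level these invented counts would be judged inconsistent with a fair die.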

Binomial case

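A binomial experiment is a sequence of independent trials in which each trial can result in one of two outcomes, success or failure. There are n trials, each with probability of success p. More generally, let the n trials fall into k cells, where N_i is the observed count in cell i and p_i is the probability of cell i under the model (the binomial case is k = 2). Provided that np_i ≫ 1 for every i (where i = 1, 2, ..., k), then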

 \chi^2 = \sum_{i=1}^{k} {\frac{(N_i - np_i)^2}{np_i}} = \sum_{\mathrm{all\ cells}}^{} {\frac{(\mathrm{O} - \mathrm{E})^2}{\mathrm{E}}}.

This has approximately a chi-squared distribution with k − 1 df. The fact that df = k − 1 is a consequence of the restriction \sum N_i = n. There are k observed cell counts; however, once any k − 1 of them are known, the remaining one is uniquely determined. In other words, there are only k − 1 freely determined cell counts, thus df = k − 1.
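The df = k − 1 argument can be checked by hand in the two-cell case (k = 2). The notation below is an assumption made for this sketch: N_1 is the number of successes, N_2 = n − N_1 the number of failures, p_1 = p and p_2 = 1 − p. Since N_2 − n(1 − p) = −(N_1 − np), the two terms of the sum collapse into one:

```latex
\chi^2 = \frac{(N_1 - np)^2}{np} + \frac{(N_2 - n(1-p))^2}{n(1-p)}
       = (N_1 - np)^2 \left[ \frac{1}{np} + \frac{1}{n(1-p)} \right]
       = \frac{(N_1 - np)^2}{np(1-p)}
```

The result is the square of the standardized success count (N_1 − np) / \sqrt{np(1-p)}, which is approximately standard normal for large n. The statistic therefore behaves like a chi-squared variable with a single degree of freedom, in agreement with k − 1 = 1.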

Other measures of fit

The likelihood ratio test statistic is a measure of the goodness of fit of a model, judged by whether an expanded form of the model provides a substantially improved fit.


References

  1. Charlie Laub and Tonya L. Kuhl: Chi-Square Data Fitting. University of California, Davis.
  2. John Robert Taylor: An introduction to error analysis, page 268. University Science Books, 1997.
  3. T. W. Kirkman: Chi-Square Curve Fitting.
  4. David M. Glover, William J. Jenkins, and Scott C. Doney: Least Squares and regression techniques, goodness of fit and tests, non-linear least squares techniques. Woods Hole Oceanographic Institution, 2008.

 
 

 

Copyrights:

Sci-Tech Dictionary. McGraw-Hill Dictionary of Scientific and Technical Terms. Copyright © 2003, 1994, 1989, 1984, 1978, 1976, 1974 by McGraw-Hill Companies, Inc. All rights reserved.
Accounting Dictionary. Dictionary of Accounting Terms. Copyright © 2005 by Barron's Educational Series, Inc. All rights reserved.
Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Goodness of fit".