Mean squared error

 
Wikipedia: Mean squared error

In statistics, the mean square error or MSE of an estimator is one of many ways to quantify the difference between an estimator and the true value of the quantity being estimated. MSE is a risk function, corresponding to the expected value of the squared error loss or quadratic loss. MSE measures the average of the square of the "error." The error is the amount by which the estimator differs from the quantity to be estimated. The difference occurs because of randomness or because the estimator doesn't account for information that could produce a more accurate estimate.[1]

The MSE is the second moment (about the origin) of the error, and thus incorporates both the variance of the estimator and its bias. For an unbiased estimator, the MSE is the variance. Like the variance, MSE has the same unit of measurement as the square of the quantity being estimated. In an analogy to standard deviation, taking the square root of MSE yields the root mean squared error or RMSE, which has the same units as the quantity being estimated; for an unbiased estimator, the RMSE is the square root of the variance, known as the standard error.

Definition and basic properties

In statistics, mean squared error is used in two distinct senses: in estimation, and in residuals.

Estimation

The MSE of an estimator \hat{\theta} with respect to the estimated parameter θ is defined as

\operatorname{MSE}(\hat{\theta})=\operatorname{E}\big[(\hat{\theta}-\theta)^2\big].

The MSE is equal to the sum of the variance and the squared bias of the estimator

\operatorname{MSE}(\hat{\theta})=\operatorname{Var}(\hat{\theta})+ \left(\operatorname{Bias}(\hat{\theta},\theta)\right)^2.

The MSE thus assesses the quality of an estimator in terms of its variation and unbiasedness. Note that the MSE is not equivalent to the expected value of the absolute error.
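
The decomposition into variance and squared bias can be checked by adding and subtracting \operatorname{E}[\hat{\theta}] inside the square. The following is a sketch of the algebra, assuming the usual definition \operatorname{Bias}(\hat{\theta},\theta)=\operatorname{E}[\hat{\theta}]-\theta:

\operatorname{E}\big[(\hat{\theta}-\theta)^2\big]=\operatorname{E}\big[(\hat{\theta}-\operatorname{E}[\hat{\theta}])^2\big]+2\,(\operatorname{E}[\hat{\theta}]-\theta)\,\operatorname{E}\big[\hat{\theta}-\operatorname{E}[\hat{\theta}]\big]+(\operatorname{E}[\hat{\theta}]-\theta)^2

The middle term vanishes because \operatorname{E}\big[\hat{\theta}-\operatorname{E}[\hat{\theta}]\big]=0, which leaves

\operatorname{E}\big[(\hat{\theta}-\theta)^2\big]=\operatorname{Var}(\hat{\theta})+\left(\operatorname{E}[\hat{\theta}]-\theta\right)^2.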

Since MSE is an expectation, it is a scalar, and not a random variable. It may be a function of the unknown parameter θ, but it does not depend on any random quantities. However, when MSE is computed for a particular estimator of θ the true value of which is not known, it will be subject to estimation error. In a Bayesian sense, this means that there are cases in which it may be treated as a random variable.
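
As an illustration of the point about estimation error, the MSE of an estimator can be approximated by simulation when the true parameter is known. The sketch below is not part of the original article: the helper name monte_carlo_mse and the values of mu, sigma and n are assumptions chosen for illustration. It approximates the MSE of the sample mean for normal data and compares it with the exact value \sigma^2/n.

```python
import random
import statistics


def monte_carlo_mse(estimator, sampler, theta, reps=20000, seed=0):
    """Approximate E[(estimator(sample) - theta)^2] by averaging over simulated samples.

    `estimator` maps a list of observations to an estimate,
    `sampler` maps a random.Random instance to one simulated sample.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = sampler(rng)
        total += (estimator(sample) - theta) ** 2
    return total / reps


# Illustrative (assumed) population parameters and sample size.
mu, sigma, n = 5.0, 2.0, 10


def draw_sample(rng):
    # One i.i.d. normal sample of size n.
    return [rng.gauss(mu, sigma) for _ in range(n)]


approx = monte_carlo_mse(statistics.fmean, draw_sample, mu)
exact = sigma ** 2 / n   # MSE of the sample mean, which is unbiased
rmse = approx ** 0.5     # root mean squared error, same units as mu

print(f"simulated MSE = {approx:.4f}, exact MSE = {exact:.4f}, RMSE = {rmse:.4f}")
```

The exact MSE is a fixed number, whereas the simulated value changes with the seed and the number of replications, which is the sense in which a computed MSE is itself subject to error.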

Residuals

In a linear model and other regression models, the residuals, or estimated errors, are the differences between the observed data and fitted model, e_i = Y_i - \hat Y_i. The mean squared error is

 \frac{1}{n} \sum_{i=1}^n e_i^2

(the n in the denominator is often modified by a correction for degrees of freedom). In this case the MSE depends on data, and is a random variable. However, under certain conditions, this MSE is an estimate of the above MSE for the quantity θ = E[Y | X]. Thus, although the estimator may be subject to estimation error, it may still represent a scalar.

If the true errors have mean 0 and variance σ², then the MSE is an estimate of σ².
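
A minimal sketch of the residual sense of MSE, assuming a simple linear regression with one predictor fitted by ordinary least squares. The function names fit_line and residual_mse, the ddof parameter, and the four data points are hypothetical and chosen only for illustration; ddof plays the role of the degrees-of-freedom correction mentioned above.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def residual_mse(xs, ys, ddof=0):
    """Mean of squared residuals e_i = Y_i - Y_hat_i, dividing by n - ddof."""
    intercept, slope = fit_line(xs, ys)
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    return sum(e * e for e in residuals) / (len(xs) - ddof)


# Made-up data for illustration only.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 2.0, 4.0]

mse_plain = residual_mse(xs, ys)              # divides by n
mse_corrected = residual_mse(xs, ys, ddof=2)  # divides by n - 2 (two fitted coefficients)
print(mse_plain, mse_corrected)
```

Dividing by n − 2 instead of n accounts for the two coefficients estimated from the data; which divisor is appropriate depends on the purpose of the calculation.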

Examples

Suppose we have a random sample of size n from an identically distributed population, X_1,\dots,X_n.

Some commonly used estimators of the true parameters of the population, μ and σ², are shown in the following table[2] (see the notes below for the distribution requirements of the MSEs of the variance estimators).

True value | Estimator | Mean squared error
θ = μ | \hat{\theta} = the sample mean, an unbiased estimator of μ: \overline{X}=\frac{1}{n}\sum_{i=1}^n X_i | \operatorname{MSE}(\overline{X})=\operatorname{E}((\overline{X}-\mu)^2)=\left(\frac{\sigma}{\sqrt{n}}\right)^2=\frac{\sigma^2}{n}
θ = σ² | \hat{\theta} = the unbiased sample variance: S^2_{n-1} = \frac{1}{n-1}\sum_{i=1}^n\left(X_i-\overline{X}\,\right)^2 | \operatorname{MSE}(S^2_{n-1})=\operatorname{E}((S^2_{n-1}-\sigma^2)^2)=\frac{2}{n - 1}\sigma^4
θ = σ² | \hat{\theta} = a biased sample variance with divisor n: S^2_{n} = \frac{1}{n}\sum_{i=1}^n\left(X_i-\overline{X}\,\right)^2 | \operatorname{MSE}(S^2_{n})=\operatorname{E}((S^2_{n}-\sigma^2)^2)=\frac{2n - 1}{n^2}\sigma^4
θ = σ² | \hat{\theta} = a biased sample variance with divisor n + 1: S^2_{n+1} = \frac{1}{n+1}\sum_{i=1}^n\left(X_i-\overline{X}\,\right)^2 | \operatorname{MSE}(S^2_{n+1})=\operatorname{E}((S^2_{n+1}-\sigma^2)^2)=\frac{2}{n + 1}\sigma^4

Note that:

  1. The MSEs shown for the variance estimators assume X_i \sim \operatorname{N}(\mu,\sigma^2) i.i.d., so that \frac{(n-1)S^2_{n-1}}{\sigma^2}\sim \chi^2_{n-1}. The result for S^2_{n-1} follows from the fact that the variance of the \chi^2_{n-1} distribution is 2n − 2.
  2. The general MSE expression for the unbiased variance estimator, without distribution assumptions, is \operatorname{MSE}(S^2_{n-1})= \frac{1}{n} \left[\mu_4-\frac{n-3}{n-1}\sigma^4\right], where \mu_4 is the fourth central moment.[3]
  3. Unbiased estimators may not produce estimates with the smallest total variation (as measured by MSE): the MSE of S^2_{n-1} is larger than the MSE of S^2_{n+1}.
  4. Estimators with the smallest total variation may produce biased estimates: S^2_{n+1} underestimates σ² on average by \frac{2}{n+1}\sigma^2.
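
The three variance-estimator entries in the table can be reproduced from the variance-plus-squared-bias decomposition. The sketch below assumes i.i.d. normal data, as in note 1, and works in units of σ⁴ with exact fractions; the function name variance_estimator_mse and the choice n = 10 are illustrative assumptions.

```python
from fractions import Fraction


def variance_estimator_mse(n, divisor):
    """MSE, in units of sigma^4, of sum((X_i - X_bar)^2) / divisor for i.i.d. normal data.

    Uses the fact that sum((X_i - X_bar)^2) / sigma^2 follows a chi-squared
    distribution with n - 1 degrees of freedom (mean n - 1, variance 2(n - 1)).
    """
    variance = Fraction(2 * (n - 1), divisor ** 2)  # Var(estimator) / sigma^4
    bias = Fraction(n - 1, divisor) - 1             # Bias(estimator) / sigma^2
    return variance + bias ** 2


n = 10  # illustrative sample size
for divisor in (n - 1, n, n + 1):
    print(divisor, variance_estimator_mse(n, divisor))
```

For each n the divisor n + 1 gives the smallest of the three values, which is the comparison made in note 3.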

Interpretation

An MSE of zero, meaning that the estimator \hat{\theta} predicts observations of the parameter θ with perfect accuracy, is the ideal and forms the basis for the least squares method of regression analysis.

While particular values of MSE other than zero are meaningless in and of themselves, they may be used for comparative purposes. Two or more statistical models may be compared using their MSEs as a measure of how well they explain a given set of observations: The unbiased model with the smallest MSE is generally interpreted as best explaining the variability in the observations.

Both Analysis of Variance and Linear Regression techniques estimate MSE as part of the analysis and use the estimated MSE to determine the statistical significance of the factors or predictors under study. The goal of Design of Experiments is to construct experiments in such a way that when the observations are analyzed, the MSE is close to zero relative to the magnitude of at least one of the estimated treatment effects.

MSE is also used in several stepwise regression techniques as part of the determination as to how many predictors from a candidate set to include in a model for a given set of observations.

Applications

  • Minimizing MSE is a key criterion in selecting estimators. Among unbiased estimators, minimizing the MSE is equivalent to minimizing the variance, and the minimum is attained by the MVUE. However, a biased estimator may have lower MSE; see estimator bias.
  • In statistical modelling, the MSE, the average of the squared differences between the actual observations and the responses predicted by the model, is used to determine whether the model fits the data or whether the model can be simplified by removing terms.

Criticism

Squared error loss is one of the most widely used loss functions in statistics, though its widespread use stems more from mathematical convenience than from considerations of actual loss in applications. Carl Friedrich Gauss, who introduced the use of mean squared error, was aware of its arbitrariness and was in agreement with objections to it on these grounds.[1] The mathematical benefits of mean squared error are particularly evident in its use in analyzing the performance of linear regression, as it allows one to partition the variation in a dataset into variation explained by the model and variation explained by randomness.
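
For a least-squares fit that includes an intercept term, this partition is commonly written as follows, where \overline{Y} denotes the mean of the observations Y_i (a symbol introduced here for illustration):

\sum_{i=1}^n (Y_i-\overline{Y})^2 = \sum_{i=1}^n (\hat Y_i-\overline{Y})^2 + \sum_{i=1}^n (Y_i-\hat Y_i)^2

The first sum on the right is the variation explained by the model, and the second is the sum of squared residuals, which equals n times the residual MSE.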

The use of mean squared error without question has been criticized by the decision theorist J.O. Berger. Mean squared error conflicts with most losses derived from utility functions; mean squared error is convex everywhere, whereas most losses derived from utility theory have concave tails (and may be concave everywhere). There are, however, some scenarios where mean squared error can serve as a good approximation to a loss function occurring naturally in an application.[4]

Like variance, mean squared error has the disadvantage of heavily weighting outliers.[5] This is a result of the squaring of each term, which effectively weights large errors more heavily than small ones. This property, undesirable in many applications, has led researchers to use alternatives such as the mean absolute error, or those based on the median.
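
A small numerical sketch of this sensitivity, using made-up error values chosen only for illustration: replacing a single error of 1 with an error of 20 inflates the MSE far more than the mean absolute error.

```python
def mean_squared_error(errors):
    """Average of the squared errors."""
    return sum(e * e for e in errors) / len(errors)


def mean_absolute_error(errors):
    """Average of the absolute errors."""
    return sum(abs(e) for e in errors) / len(errors)


# Hypothetical errors; the second list differs only in its last entry.
clean = [1.0, -1.0, 2.0, -2.0, 1.0]
with_outlier = [1.0, -1.0, 2.0, -2.0, 20.0]

mse_ratio = mean_squared_error(with_outlier) / mean_squared_error(clean)
mae_ratio = mean_absolute_error(with_outlier) / mean_absolute_error(clean)
print(f"MSE grows by a factor of {mse_ratio:.1f}, MAE by a factor of {mae_ratio:.1f}")
```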

References

  1. Lehmann, E. L.; Casella, George (1998). Theory of Point Estimation (2nd ed.). New York: Springer. MR1639875. ISBN 0-387-98502-6.
  2. DeGroot, Morris H. (1980). Probability and Statistics (2nd ed.). Addison-Wesley.
  3. Mood, A.; Graybill, F.; Boes, D. (1974). Introduction to the Theory of Statistics (3rd ed.). McGraw-Hill. p. 229.
  4. Berger, James O. (1985). "2.4.2 Certain Standard Loss Functions". Statistical Decision Theory and Bayesian Analysis (2nd ed.). New York: Springer-Verlag. p. 60. MR0804611. ISBN 0-387-96098-8.
  5. Bermejo, Sergio; Cabestany, Joan (2001). "Oriented principal component analysis for large margin classifiers". Neural Networks, Vol. 14, No. 10, pp. 1447-1461.

Copyrights:

Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Mean squared error".