Gauss–Markov theorem

In statistics, the Gauss–Markov theorem, named after Carl Friedrich Gauss and Andrey Markov, states that in a linear model in which the errors have expectation zero, are uncorrelated, and have equal variances, the ordinary least squares (OLS) estimator is a best linear unbiased estimator (BLUE) of the coefficients. The errors need not be normally distributed, nor independent (only uncorrelated, which is a weaker condition), nor identically distributed (only zero mean and equal variances are required).

Statement

Suppose we have

 Y_i=\sum_{j=1}^{K}\beta_j X_{ij}+\varepsilon_i

for i = 1, ..., n, where the βj are non-random but unobservable parameters, the Xij are non-random and observable (called the "explanatory variables"), the εi are random, and so the Yi are random. The random variables εi are called the "errors" (not to be confused with "residuals"; see errors and residuals in statistics). Note that to include a constant in the model above, one can set XiK = 1 for all i.

The Gauss–Markov assumptions state that

  • E(\varepsilon_i)=0,
  • V(\varepsilon_i)= \sigma^2 < \infty,

(i.e., all errors have the same variance; that is "homoscedasticity"), and

  • {\rm cov}(\varepsilon_i,\varepsilon_j) = 0

for i ≠ j; that is "uncorrelatedness." A linear estimator of β j is a linear combination

\widehat\beta_j = c_{1j}Y_1+\cdots+c_{nj}Y_n

in which the coefficients cij are not allowed to depend on the underlying coefficients β, since those are not observable, but are allowed to depend on the values Xij, since these data are observable. (The dependence of the coefficients on the Xij is typically nonlinear; the estimator is linear in each Yi and hence in each random εi; that is why this is "linear" regression.) The estimator is unbiased if and only if, whatever the values of the Xij,

E(\widehat\beta_j)=\beta_j.\,
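Unbiasedness can be illustrated with a small simulation. The following is a minimal sketch, assuming NumPy is available; the sample sizes, the seed, the design matrix and the "true" coefficients are all made-up values for illustration, and the errors are drawn uniformly on [-1, 1] so that they have zero mean and equal variance without being normal. Independent draws are used here, which is a special case of uncorrelated errors.

```python
import numpy as np

rng = np.random.default_rng(0)          # fixed seed (arbitrary choice)
n, K, reps = 50, 2, 20000               # illustrative sizes, not from the article

# Fixed (non-random across replications) design; last column is the constant.
X = np.column_stack([rng.normal(size=n), np.ones(n)])
beta = np.array([2.0, -1.0])            # hypothetical "true" coefficients

# Uniform errors on [-1, 1]: mean zero, common variance 1/3, not normal.
eps = rng.uniform(-1.0, 1.0, size=(reps, n))
Y = X @ beta + eps                      # each row is one simulated sample

# Row j of C holds the weights c_{1j}, ..., c_{nj}; they depend on X only.
C = np.linalg.solve(X.T @ X, X.T)       # shape (K, n)
beta_hat = Y @ C.T                      # shape (reps, K), one estimate per row

mean_estimate = beta_hat.mean(axis=0)   # Monte Carlo approximation of E(beta_hat)
print(mean_estimate)
```

Averaged over many replications, the estimates should sit close to the assumed coefficients, even though the errors are far from Gaussian.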

Now, let \sum_{j=1}^K\lambda_j\beta_j be some linear combination of the coefficients. Then the mean squared error of the corresponding estimation is defined as

E \left(\left(\sum_{j=1}^K\lambda_j(\widehat\beta_j-\beta_j)\right)^2\right),

i.e., it is the expectation of the square of the difference between the estimator and the parameter to be estimated. (The mean squared error of an estimator coincides with the estimator's variance if the estimator is unbiased; for biased estimators the mean squared error is the sum of the variance and the square of the bias.) A best linear unbiased estimator of β is the one with the smallest mean squared error for every vector λ of linear combination parameters. This is equivalent to the condition that

V(\tilde\beta)- V(\widehat\beta)

is a positive semi-definite matrix for every other linear unbiased estimator \tilde\beta.
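The equivalence can be sketched in one line. Writing \lambda for the column vector of the \lambda_j (vector notation introduced here only for brevity), the mean squared error of an unbiased estimator of the linear combination is a quadratic form in its covariance matrix:

E\left(\left(\lambda^{T}(\widehat\beta-\beta)\right)^2\right) = V(\lambda^{T}\widehat\beta) = \lambda^{T}V(\widehat\beta)\lambda.

Hence the mean squared error of \tilde\beta is at least that of \widehat\beta for every \lambda exactly when

\lambda^{T}\left(V(\tilde\beta)-V(\widehat\beta)\right)\lambda \ge 0 \quad \text{for all } \lambda,

which is the definition of positive semi-definiteness of the difference.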

The ordinary least squares estimator (OLS) is the function

\widehat\beta=(X^{T}X)^{-1}X^{T}Y

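of Y and X (where X^{T} denotes the transpose of X, and X is assumed to have full column rank so that X^{T}X is invertible) that minimizes the sum of squares of residuals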

\sum_{i=1}^n\left(Y_i-\widehat{Y}_i\right)^2=\sum_{i=1}^n\left(Y_i-\sum_{j=1}^K\widehat\beta_j X_{ij}\right)^2.
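As an illustration, the closed form can be evaluated directly. This is a minimal sketch, assuming NumPy; the helper name `ols`, the simulated data and the coefficient values are hypothetical. It solves the normal equations rather than forming the inverse explicitly, which is a common numerical choice and not something the text above prescribes.

```python
import numpy as np

def ols(X, Y):
    """Return the solution of the normal equations (X^T X) b = X^T Y."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ Y)

rng = np.random.default_rng(1)          # arbitrary seed
n = 30
# Two explanatory variables plus a constant column of ones.
X = np.column_stack([rng.normal(size=(n, 2)), np.ones(n)])
Y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(scale=0.3, size=n)

beta_hat = ols(X, Y)

# Sum of squared residuals at the OLS solution and at a perturbed point.
rss = float(np.sum((Y - X @ beta_hat) ** 2))
rss_perturbed = float(np.sum((Y - X @ (beta_hat + 0.01)) ** 2))
print(beta_hat, rss, rss_perturbed)
```

Moving the coefficients away from `beta_hat` cannot reduce the sum of squared residuals, which is what the comparison with `rss_perturbed` checks on this one example.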

(It is easy to confuse the concept of error introduced earlier in this article with this concept of residual. For an account of the differences and the relationship between them, see errors and residuals in statistics.)

The theorem now states that the OLS estimator is a BLUE. The main idea of the proof is that the least-squares estimator is uncorrelated with every linear unbiased estimator of zero, i.e., with every linear combination a_1Y_1+\cdots+a_nY_n whose coefficients do not depend upon the unobservable β but whose expected value is always zero.
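The uncorrelatedness claim can be checked in a few lines, using matrix notation (Y = X\beta + \varepsilon with V(Y) = \sigma^2 I, and a the column vector of the a_i; this notation is introduced here for the sketch). The combination a^{T}Y has expectation a^{T}X\beta, which vanishes for every \beta only if a^{T}X = 0. Then

\begin{align}
{\rm cov}(\widehat\beta, a^{T}Y) &= (X^{T}X)^{-1}X^{T}\,V(Y)\,a \\
&= \sigma^2 (X^{T}X)^{-1}X^{T}a \\
&= \sigma^2 (X^{T}X)^{-1}(a^{T}X)^{T} = 0.
\end{align}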

Proof

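In matrix form the model is Y = X\beta + \varepsilon, and X' denotes the transpose of X. Let  \tilde\beta = CY be another linear estimator of β and let C be given by (X'X)^{-1}X' + D, where D is a k \times n nonzero matrix (k being the number of coefficients). The goal is to show that, if such an estimator is unbiased, its variance is no smaller than that of  \hat\beta , the OLS estimator, in the sense that the difference of the two covariance matrices is positive semidefinite.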

The expectation of  \tilde\beta is:


\begin{align}
E(CY) &= E(((X'X)^{-1}X' + D)(X\beta + \varepsilon)) \\
&= ((X'X)^{-1}X' + D)X\beta + ((X'X)^{-1}X' + D)\underbrace{E(\varepsilon)}_0 \\
&= (X'X)^{-1}X'X\beta + DX\beta \\
&= (I_k + DX)\beta. \\
\end{align}

Therefore,  \tilde\beta is unbiased if and only if DX = 0.
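The condition DX = 0 is easy to satisfy numerically. The sketch below, assuming NumPy, builds one such D by multiplying an arbitrary matrix by I − X(X'X)^{-1}X', the projection onto the orthogonal complement of the column space of X. This construction, the matrix A and the dimensions are illustrative choices and are not part of the proof above.

```python
import numpy as np

rng = np.random.default_rng(2)          # arbitrary seed
n, k = 8, 3                             # illustrative dimensions
X = rng.normal(size=(n, k))

XtX_inv = np.linalg.inv(X.T @ X)
H = X @ XtX_inv @ X.T                   # projection onto the column space of X
A = rng.normal(size=(k, n))             # arbitrary k x n matrix (hypothetical)

D = A @ (np.eye(n) - H)                 # (I - H) X = 0, hence D X = 0
C = XtX_inv @ X.T + D                   # weights of an alternative estimator

# E(CY) = C X beta, so unbiasedness for every beta means C X = I_k.
CX = C @ X

# Using A itself in place of D generally breaks the condition.
CX_biased = (XtX_inv @ X.T + A) @ X
print(np.round(CX, 10))
```

With this D the product C X is the identity, so CY is unbiased; with the unprojected A it is not.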

The variance of  \tilde\beta is


\begin{align}
V(\tilde\beta) &= V(CY) = CV(Y)C' = \sigma^2 CC' \\
&= \sigma^2((X'X)^{-1}X' + D)(X(X'X)^{-1} + D') \\
&= \sigma^2((X'X)^{-1}X'X(X'X)^{-1} + (X'X)^{-1}X'D' + DX(X'X)^{-1} + DD') \\
&= \sigma^2(X'X)^{-1} + \sigma^2(X'X)^{-1} (\underbrace{DX}_{0})' + \sigma^2 \underbrace{DX}_{0} (X'X)^{-1} + \sigma^2DD' \\
&= \underbrace{\sigma^2(X'X)^{-1}}_{V(\hat\beta)} + \sigma^2DD'.
\end{align}

Since DD' is a positive semidefinite matrix,  V(\tilde\beta) exceeds  V(\hat\beta) by a positive semidefinite matrix.
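The variance identity can be verified numerically on one example. This is a minimal sketch, assuming NumPy and a made-up value of σ²; D is again built so that DX = 0, which is an illustrative construction rather than the only possible one.

```python
import numpy as np

rng = np.random.default_rng(3)          # arbitrary seed
n, k, sigma2 = 10, 3, 2.0               # illustrative sizes and error variance
X = rng.normal(size=(n, k))

XtX_inv = np.linalg.inv(X.T @ X)
M = np.eye(n) - X @ XtX_inv @ X.T       # annihilates the columns of X
D = rng.normal(size=(k, n)) @ M         # ensures D X = 0
C = XtX_inv @ X.T + D

V_ols = sigma2 * XtX_inv                # V(beta_hat) = sigma^2 (X'X)^{-1}
V_tilde = sigma2 * (C @ C.T)            # V(beta_tilde) = sigma^2 C C'
diff = V_tilde - V_ols                  # should equal sigma^2 D D'

# Eigenvalues of the symmetrized difference; all should be >= 0 up to rounding.
eigs = np.linalg.eigvalsh((diff + diff.T) / 2)
print(eigs)
```

The smallest eigenvalue is non-negative up to floating-point error, in line with the positive semidefiniteness of DD'.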

Generalized least squares estimator

The generalized least squares (GLS) or Aitken estimator extends the Gauss–Markov theorem to the case where the error vector has a non-scalar covariance matrix; the Aitken estimator is also a BLUE.[1]
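The article does not write out the estimator; the sketch below uses the standard textbook form (X'Ω^{-1}X)^{-1}X'Ω^{-1}Y, where Ω is the error covariance matrix up to a scalar factor and is assumed known. The function name `gls`, the heteroscedastic Ω and the simulated data are hypothetical choices for illustration, assuming NumPy.

```python
import numpy as np

def gls(X, Y, Omega):
    """Textbook GLS: (X' Omega^{-1} X)^{-1} X' Omega^{-1} Y, with Omega assumed known."""
    Oi_X = np.linalg.solve(Omega, X)    # Omega^{-1} X without an explicit inverse
    Oi_Y = np.linalg.solve(Omega, Y)    # Omega^{-1} Y
    return np.linalg.solve(X.T @ Oi_X, X.T @ Oi_Y)

rng = np.random.default_rng(4)          # arbitrary seed
n = 40
X = np.column_stack([rng.normal(size=n), np.ones(n)])
variances = np.linspace(0.5, 3.0, n)    # unequal error variances (made up)
Omega = np.diag(variances)
Y = X @ np.array([1.5, -0.5]) + rng.normal(size=n) * np.sqrt(variances)

beta_gls = gls(X, Y, Omega)

# Equivalent route: whiten with the Cholesky factor of Omega, then apply OLS.
L = np.linalg.cholesky(Omega)
Xw, Yw = np.linalg.solve(L, X), np.linalg.solve(L, Y)
beta_whitened = np.linalg.solve(Xw.T @ Xw, Xw.T @ Yw)

# With a scalar covariance matrix, GLS reduces to OLS.
beta_scalar = gls(X, Y, 2.0 * np.eye(n))
beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_gls)
```

The whitening route shows the connection to the theorem: the transformed errors have a scalar covariance matrix, so the Gauss–Markov assumptions hold for the transformed model.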

Notes

  1. A. C. Aitken, "On Least Squares and Linear Combinations of Observations", Proceedings of the Royal Society of Edinburgh, 1935, vol. 55, pp. 42–48.

Copyrights:

Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Gauss–Markov theorem".