
Generalized method of moments

 

The generalized method of moments is a very general statistical method for obtaining estimates of parameters of statistical models. It is a generalization, developed by Lars Peter Hansen, of the method of moments.

The term GMM is very popular among econometricians but is hardly used at all outside of economics, where the slightly more general term estimating equations is preferred.


Introduction

A typical econometric problem can be formulated in the following terms. Suppose the available data consist of a large number of i.i.d. observations \{Y_t\}_{t=1}^T, where each observation Y_t is an n-dimensional multivariate random variable. Our knowledge (or lack thereof) of economics dictates a certain econometric model for these data. Such a model is usually defined only up to some parameter, which we will denote by \theta\in\Theta. Our main goal is to find the “true” value of this parameter, θ0, or at least a reasonably close estimate.

In order to apply GMM there should exist a (possibly vector-valued) function g(Y,θ) such that

m(\theta_0) \equiv \operatorname{E}[\,g(Y_t,\theta_0)\,]=0,

where E denotes expectation, and Y_t is a generic observation (all observations are assumed to be i.i.d.). Moreover, the function m(θ) must not be equal to zero for \theta\neq\theta_0; otherwise the parameter θ will not be identified.

The basic idea behind GMM is to replace the theoretical expected value E with its empirical analog, the sample average:

\hat{m}(\theta) = \hat{\operatorname{E}}\big[\,g(Y_t,\theta)\,\big] \equiv \frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)

By the law of large numbers, \hat{m}(\theta) \approx m(\theta) for large values of T, so if we can find a value \hat\theta such that \hat{m}(\hat\theta)\approx 0, then it will be a reasonably good estimate of the parameter θ0. So all we need to do is search the parameter space Θ for the value of θ that minimizes the distance between \hat{m}(\theta) and zero (in other words, the norm of \hat{m}(\theta)). For a vector-valued function \hat{m} the norm can be defined in many different ways, and it turns out that the obvious Euclidean norm is not the best choice. Instead, a positive semi-definite “weighting” matrix \hat{W}_T is used to define the squared norm as a quadratic form \lVert \hat{m} \rVert^2_{\hat{W}_T} = \hat{m}'\hat{W}_T\hat{m}, where ′ denotes matrix transposition. Thus, the GMM estimator can be written as

\hat\theta = \operatorname{arg}\min_{\theta\in\Theta} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)' \hat{W}_T \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)

Under suitable conditions this estimator is consistent, asymptotically normal, and, with the right choice of weighting matrix \hat{W}_T, asymptotically efficient.
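The estimator above can be sketched numerically. The following is a minimal illustration, not a reference implementation: it assumes a hypothetical model in which Y is exponentially distributed with mean θ, so that E[Y − θ] = 0 and E[Y² − 2θ²] = 0 give two moment conditions for one parameter. The function names g and gmm_objective, the simulated data, and the identity weighting matrix are all choices made for this example.

```python
import numpy as np
from scipy.optimize import minimize_scalar


def g(y, theta):
    """Moment function: one row per observation, one column per moment.

    Assumed model (not from the text): Y is exponential with mean theta,
    so E[Y - theta] = 0 and E[Y^2 - 2*theta^2] = 0.
    """
    return np.column_stack([y - theta, y**2 - 2.0 * theta**2])


def gmm_objective(theta, y, W):
    """Quadratic form m_hat' W m_hat built from the sample-average moments."""
    m_hat = g(y, theta).mean(axis=0)
    return m_hat @ W @ m_hat


rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=5000)  # simulated data, true theta_0 = 2

W = np.eye(2)  # identity weighting: a simple choice, generally not efficient
result = minimize_scalar(
    gmm_objective, bounds=(0.1, 10.0), args=(y, W), method="bounded"
)
theta_hat = result.x
print(f"GMM estimate of theta: {theta_hat:.3f}")
```

A bounded one-dimensional search is enough here because the parameter is a scalar; a vector-valued θ would need a multivariate optimizer.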

Consistency

Consistency is the most important property of an estimator. It means that, given a sufficient number of observations, the estimator will get arbitrarily close to the true value of the parameter:

\hat\theta \xrightarrow{p} \theta_0\ \text{as}\ T\to\infty

Sufficient conditions for the GMM estimator to be consistent are as follows:

  1. \hat{W}_T \xrightarrow{p} W, where W is a positive semi-definite matrix,
  2. \,W\operatorname{E}[\,g(Y_t,\theta)\,]=0  only for \,\theta=\theta_0,
  3. \theta_0\in\Theta, which is compact,
  4. \,g(Y,\theta)  is continuous at each θ with probability one,
  5. \operatorname{E}[\,\textstyle\sup_{\theta\in\Theta} \lVert g(Y,\theta)\rVert\,]<\infty.

The second condition here (the so-called global identification condition) is often particularly hard to verify. There exist simpler necessary but not sufficient conditions, which may be used to detect a non-identification problem:

  • Order condition. The dimension of moment function m(θ) should be at least as large as the dimension of parameter vector θ.
  • Local identification. If g(Y,θ) is continuously differentiable in a neighborhood of θ0, then matrix W\operatorname{E}[\nabla_\theta g(Y_t,\theta_0)] must have full column rank.
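Both checks can be approximated on a sample. The sketch below assumes the same kind of hypothetical exponential-mean model (moments Y − θ and Y² − 2θ²), evaluates the sample analog of the Jacobian by central finite differences at a trial value of θ rather than at the unknown θ0, and tests the order and rank conditions. The helper names are invented for this example, and passing these checks does not establish global identification.

```python
import numpy as np


def mean_moments(theta, y):
    """Sample-average moments for an assumed exponential model with mean theta[0]."""
    return np.array([np.mean(y - theta[0]), np.mean(y**2 - 2.0 * theta[0] ** 2)])


def numerical_jacobian(fun, theta, eps=1e-6):
    """Central finite-difference Jacobian of fun at theta (k moments by l parameters)."""
    theta = np.asarray(theta, dtype=float)
    f0 = fun(theta)
    jac = np.empty((f0.size, theta.size))
    for j in range(theta.size):
        step = np.zeros_like(theta)
        step[j] = eps
        jac[:, j] = (fun(theta + step) - fun(theta - step)) / (2.0 * eps)
    return jac


rng = np.random.default_rng(0)
y = rng.exponential(scale=2.0, size=5000)

theta_trial = np.array([2.0])  # trial value; theta_0 is unknown in practice
W = np.eye(2)
G_hat = numerical_jacobian(lambda t: mean_moments(t, y), theta_trial)

n_moments, n_params = G_hat.shape
order_ok = n_moments >= n_params                          # order condition
rank_ok = np.linalg.matrix_rank(W @ G_hat) == n_params    # local rank condition
print(order_ok, rank_ok)
```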

In practice applied econometricians often simply assume that global identification holds, without actually proving it.[1]

Asymptotic normality

Asymptotic normality is a useful property, as it allows us to construct confidence bands for the estimator and conduct different tests. Before we can state the asymptotic distribution of the GMM estimator, we need to define two auxiliary matrices:

G = \operatorname{E}[\,\nabla_{\!\theta}\,g(Y_t,\theta_0)\,], \qquad 
        \Omega = \operatorname{E}[\,g(Y_t,\theta_0)g(Y_t,\theta_0)'\,]

Then under conditions 1–6 listed below, the GMM estimator will be asymptotically normal with limiting distribution

\sqrt{T}\big(\hat\theta - \theta_0\big)\ \xrightarrow{d}\ \mathcal{N}\big[0, (G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}\big]

Conditions:

  1. \hat\theta is consistent (see previous section),
  2. \,\theta_0 lies in the interior of set \,\Theta,
  3. \,g(Y,\theta) is continuously differentiable in some neighborhood N of θ0 with probability one,
  4. \operatorname{E}[\,\lVert g(Y_t,\theta) \rVert^2\,]<\infty,
  5. \operatorname{E}[\,\textstyle\sup_{\theta\in N}\lVert \nabla_\theta g(Y_t,\theta) \rVert\,]<\infty,
  6. matrix G'WG is nonsingular.
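As a numerical illustration of the limiting variance, the sketch below evaluates the sandwich formula for a hypothetical model in which Y is exponential with mean θ0 = 2 and the moments are Y − θ and Y² − 2θ². The matrices G and Ω are population values worked out by hand for that assumed model (using E[Y^k] = k!·θ^k); in applied work they would be replaced by sample analogs evaluated at \hat\theta. The function name sandwich_variance and the sample size are assumptions of this example.

```python
import numpy as np


def sandwich_variance(G, W, Omega):
    """Asymptotic variance (G'WG)^{-1} G'W Omega W G (G'WG)^{-1}."""
    bread = np.linalg.inv(G.T @ W @ G)
    meat = G.T @ W @ Omega @ W @ G
    return bread @ meat @ bread


theta0 = 2.0
# Population matrices for the assumed exponential example, derived by hand;
# they are illustrative inputs, not estimates from data.
G = np.array([[-1.0], [-4.0 * theta0]])
Omega = np.array([[theta0**2, 4.0 * theta0**3],
                  [4.0 * theta0**3, 20.0 * theta0**4]])

V_identity = sandwich_variance(G, np.eye(2), Omega)
V_efficient = sandwich_variance(G, np.linalg.inv(Omega), Omega)

T = 5000  # hypothetical sample size
se_identity = np.sqrt(V_identity[0, 0] / T)    # approximate standard error, W = I
se_efficient = np.sqrt(V_efficient[0, 0] / T)  # approximate standard error, W = Omega^{-1}
print(se_identity, se_efficient)
```

Dividing the asymptotic variance by T and taking the square root gives the usual approximate standard error of \hat\theta.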

Efficiency

So far we have said nothing about the choice of matrix W, except that it must be positive semi-definite. In fact, any such matrix will produce a consistent and asymptotically normal GMM estimator; the only difference will be in the asymptotic variance of that estimator. It can be shown that taking

 W \propto\ \Omega^{-1}

will result in the most efficient estimator in the class of all GMM estimators based on the same moment conditions. Efficiency in this case means that such an estimator will have the smallest possible asymptotic variance (we say that matrix A is smaller than matrix B if B − A is positive semi-definite).

In this case the formula for the asymptotic distribution of the GMM estimator simplifies to

\sqrt{T}\big(\hat\theta - \theta_0\big)\ \xrightarrow{d}\ \mathcal{N}\big[0, (G'\,\Omega^{-1}G)^{-1}\big]
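To see where the simplification comes from, substitute W = \Omega^{-1} into the general sandwich formula for the variance. The middle factor then reduces to G'\Omega^{-1}G, which cancels one of the outer inverses:

```latex
(G'\Omega^{-1}G)^{-1}\,G'\Omega^{-1}\,\Omega\,\Omega^{-1}G\,(G'\Omega^{-1}G)^{-1}
  = (G'\Omega^{-1}G)^{-1}\,(G'\Omega^{-1}G)\,(G'\Omega^{-1}G)^{-1}
  = (G'\Omega^{-1}G)^{-1}
```

The same result holds for W = c\,\Omega^{-1} with any constant c > 0, because the factors of c cancel; this is why proportionality is enough for efficiency.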

The proof that such a choice of weighting matrix is indeed optimal is quite elegant, and is often adapted with slight modifications when establishing efficiency of other estimators. As a rule of thumb, a weighting matrix is optimal whenever it makes the “sandwich formula” for variance collapse into a simpler expression.

Proof. We will consider the difference between the asymptotic variance with arbitrary W and the asymptotic variance with W = \Omega^{-1}. If we can factor this difference into a symmetric product of the form CC' for some matrix C, then this will guarantee that the difference is nonnegative-definite, and thus W = \Omega^{-1} will be optimal by definition.
\,V(W)-V(\Omega^{-1}) \,=(G'WG)^{-1}G'W\Omega WG(G'WG)^{-1} - (G'\Omega^{-1}G)^{-1}
\,=(G'WG)^{-1}\Big(G'W\Omega WG - G'WG(G'\Omega^{-1}G)^{-1}G'WG\Big)(G'WG)^{-1}
\,=(G'WG)^{-1}G'W\Omega^{1/2}\Big(I - \Omega^{-1/2}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1/2}\Big)\Omega^{1/2}WG(G'WG)^{-1}
\,=A(I-B)A',
where we introduced matrices A = (G'WG)^{-1}G'W\Omega^{1/2} and B = \Omega^{-1/2}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1/2} in order to simplify notation; I is the identity matrix. Note that matrix B is symmetric and idempotent: B^2 = B. This means I − B is symmetric and idempotent as well: I − B = (I − B)(I − B)'. Thus we can continue to factor the previous expression as
\,=A(I-B)(I-B)'A' = \Big(A(I-B)\Big)\Big(A(I-B)\Big)' \geq 0
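The algebra in the proof can be checked numerically. The sketch below draws arbitrary matrices G, Ω and W (random choices made only for this illustration, with Ω and W symmetric positive definite), builds A and B as defined in the proof using a symmetric square root of Ω, and confirms that B is idempotent and that the variance difference equals CC' with C = A(I − B).

```python
import numpy as np

rng = np.random.default_rng(1)
k, l = 5, 2  # number of moments and parameters (arbitrary for the check)

G = rng.normal(size=(k, l))
R = rng.normal(size=(k, k))
Omega = R @ R.T + k * np.eye(k)   # symmetric positive definite
S = rng.normal(size=(k, k))
W = S @ S.T + np.eye(k)           # arbitrary symmetric positive definite weight

# Symmetric square roots of Omega via its eigendecomposition.
vals, vecs = np.linalg.eigh(Omega)
Omega_half = vecs @ np.diag(np.sqrt(vals)) @ vecs.T
Omega_neg_half = vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T
Omega_inv = np.linalg.inv(Omega)

GWG_inv = np.linalg.inv(G.T @ W @ G)
V_W = GWG_inv @ G.T @ W @ Omega @ W @ G @ GWG_inv   # variance with weight W
V_opt = np.linalg.inv(G.T @ Omega_inv @ G)          # variance with weight Omega^{-1}

A = GWG_inv @ G.T @ W @ Omega_half
B = Omega_neg_half @ G @ V_opt @ G.T @ Omega_neg_half
C = A @ (np.eye(k) - B)

diff = V_W - V_opt
min_eigenvalue = np.linalg.eigvalsh((diff + diff.T) / 2.0).min()
print(np.allclose(B @ B, B), np.allclose(diff, C @ C.T), min_eigenvalue)
```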

Implementation

One difficulty with implementing the outlined method is that we cannot take W = \Omega^{-1}: by the definition of matrix Ω, we need to know the value of θ0 in order to compute this matrix, and θ0 is precisely the quantity we don't know and are trying to estimate in the first place.

Several approaches exist to deal with this issue, the first one being the most popular:

  • 2-step feasible GMM:
    • Step 1. Take \hat{W}_T=I (the identity matrix), and compute preliminary GMM estimate \hat\theta_{(1)}. This estimator is consistent for θ0, although probably not efficient.
    • Step 2. Take
      \hat{W}_T = \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta_{(1)})g(Y_t,\hat\theta_{(1)})'\bigg)^{-1},
      where we have plugged in our first-step preliminary estimate \hat\theta_{(1)}. This matrix converges in probability to \Omega^{-1}, and therefore if we compute \hat\theta with this weighting matrix, the estimator will be asymptotically efficient.
  • Iterative GMM. Essentially the same procedure as 2-step GMM, only matrix  \hat{W}_T is recalculated several times. That is, estimate obtained in step 2 is used to calculate weighting matrix for step 3, and so on. Asymptotically no improvement can be achieved through such iterations, although certain Monte-Carlo experiments suggest that finite-sample properties of this estimator are slightly better.[citation needed]
  • Continuously Updating GMM (CUE). Estimates \hat\theta simultaneously with estimating weighting matrix W:
    \hat\theta = \operatorname{arg}\min_{\theta\in\Theta} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)' \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)g(Y_t,\theta)'\bigg)^{-1} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)
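The 2-step procedure can be sketched for a case where the minimizer has a closed form. The example below assumes a hypothetical linear instrumental-variables model y = Xθ + u with instruments Z and moment function g = Z'(y − Xθ), for which the GMM estimator with weight W is (X'Z W Z'X)^{-1} X'Z W Z'y. The model, the simulated data, and the function names linear_gmm and two_step_gmm are assumptions of this illustration.

```python
import numpy as np


def linear_gmm(y, X, Z, W):
    """Closed-form GMM estimate for moments g_t = z_t * (y_t - x_t' theta)."""
    ZX = Z.T @ X / len(y)
    Zy = Z.T @ y / len(y)
    return np.linalg.solve(ZX.T @ W @ ZX, ZX.T @ W @ Zy)


def two_step_gmm(y, X, Z):
    T, k = Z.shape
    # Step 1: identity weighting gives a consistent preliminary estimate.
    theta1 = linear_gmm(y, X, Z, np.eye(k))
    # Step 2: weight by the inverse of the estimated second-moment matrix of g.
    resid = y - X @ theta1
    g1 = Z * resid[:, None]
    W2 = np.linalg.inv(g1.T @ g1 / T)
    theta2 = linear_gmm(y, X, Z, W2)
    return theta1, theta2, W2


# Simulated data: one endogenous regressor, three instruments, true theta = 1.5.
rng = np.random.default_rng(0)
T = 2000
Z = rng.normal(size=(T, 3))
v = rng.normal(size=T)
u = 0.5 * v + rng.normal(size=T)          # error correlated with the regressor
x = Z @ np.array([1.0, 0.5, 0.5]) + v
y = 1.5 * x + u
X = x[:, None]

theta1, theta2, W2 = two_step_gmm(y, X, Z)
print(theta1, theta2)
```

Iterative GMM would repeat step 2, recomputing the weighting matrix from the latest estimate each time.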

Another important issue is the implementation of the minimization procedure. The procedure must search through the (possibly high-dimensional) parameter space Θ and find the value of θ that minimizes the objective function. No generic recommendation for such a procedure exists; it is the subject of its own field, numerical optimization.

J-test

When the number of moment conditions is greater than the dimension of the parameter vector θ, the model is said to be over-identified. Over-identification is actually a good thing: it allows us to check whether the model is correct. Indeed, the GMM method has replaced the problem of solving the equation \hat{m}(\theta)=0 by the minimization of a certain quadratic form. Such minimization can always be carried out, even when no θ0 such that m(θ0) = 0 exists. Of course, we can always check whether \hat{m}(\hat\theta) is sufficiently close to zero, and this is exactly what the J-test does (the J-test is also known as the test for over-identifying restrictions).

Formally we consider two hypotheses:

  • H_0:\ m(\theta_0)=0  (model is “valid”), and
  • H_1:\ m(\theta)\neq 0,\ \forall \theta\in\Theta  (model is “invalid”)

Then under the null hypothesis H0, the J-statistic is asymptotically chi-squared with k − ℓ degrees of freedom:

J = T \cdot \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta)\bigg)' \hat{W}_T \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta)\bigg)\ \xrightarrow{d}\ \chi^2_{k-\ell}   under H0,

where \hat\theta is the GMM estimator of the parameter θ0, k is the number of moment conditions (dimension of vector g), and ℓ is the number of estimated parameters (dimension of vector θ). Matrix \hat{W}_T must converge in probability to \Omega^{-1}, the efficient weighting matrix (note that previously we only required that W be proportional to \Omega^{-1} for the estimator to be efficient; however, in order to conduct the J-test, W must be exactly equal to \Omega^{-1}, not simply proportional).

Under alternative hypothesis H1, J-statistic is asymptotically unbounded:

J\ \xrightarrow{p}\ \infty  under H1

Thus in order to conduct the test we compute the value of J and compare it with 0.95 quantile of \chi^2_{k-\ell} distribution:

  • H0 is rejected at 95% confidence level if  J > q_{0.95}^{\chi^2_{k-\ell}}
  • H0 cannot be rejected at 95% confidence level if  J < q_{0.95}^{\chi^2_{k-\ell}}
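The test can be sketched on a toy over-identified model. The example below assumes, purely for illustration, that each observation contains two measurements sharing a common mean μ, so the moments are g = (a − μ, b − μ) with k = 2 and ℓ = 1. It computes a two-step estimate in closed form, forms the J-statistic with the inverse of the estimated Ω as the weighting matrix, and compares it with the chi-squared quantile. The function names and the simulated data are assumptions of this example.

```python
import numpy as np
from scipy.stats import chi2


def efficient_common_mean(data):
    """Two-step GMM for a common mean mu with moments g_t = data_t - mu."""
    T, k = data.shape
    ybar = data.mean(axis=0)
    mu1 = ybar.mean()                        # step 1: identity weighting
    g1 = data - mu1
    W_hat = np.linalg.inv(g1.T @ g1 / T)     # estimate of Omega^{-1}
    ones = np.ones(k)
    mu2 = (ones @ W_hat @ ybar) / (ones @ W_hat @ ones)  # step 2 minimizer
    return mu2, W_hat


def j_test(data, level=0.05):
    T, k = data.shape
    mu_hat, W_hat = efficient_common_mean(data)
    m_hat = data.mean(axis=0) - mu_hat       # sample moments at the estimate
    J = T * (m_hat @ W_hat @ m_hat)
    dof = k - 1                              # k moments, one parameter
    critical = chi2.ppf(1.0 - level, dof)
    p_value = chi2.sf(J, dof)
    return J, critical, p_value, bool(J > critical)


rng = np.random.default_rng(0)
T = 1000
valid = rng.normal(loc=0.0, size=(T, 2))     # both columns share the same mean
invalid = valid + np.array([0.0, 1.0])       # second column's mean is shifted

J_valid, crit, p_valid, reject_valid = j_test(valid)
J_invalid, _, p_invalid, reject_invalid = j_test(invalid)
print(J_valid, J_invalid, crit)
```

When the two columns do not share a mean, no single μ sets both moments to zero, and the J-statistic grows with the sample size.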

Scope

Many other popular estimation techniques can be cast in terms of GMM optimization:


References

  1. Newey, McFadden (1994), p. 2127
  • Lars Peter Hansen (1982): Large Sample Properties of Generalized Method of Moments Estimators, Econometrica 50, 1029-1054.
  • Lars Peter Hansen (2002): Method of Moments in International Encyclopedia of the Social and Behavioral Sciences, N. J. Smelser and P. B. Baltes (editors), Pergamon: Oxford.
  • Kirby Faciane (2006): Statistics for Empirical and Quantitative Finance. H.C. Baird: Philadelphia. ISBN 0-9788208-9-4.
  • Alastair R. Hall (2005). Generalized Method of Moments (Advanced Texts in Econometrics). Oxford University Press. ISBN 0-19-877520-2.
  • Newey W., McFadden D. (1994). Large sample estimation and hypothesis testing, in Handbook of Econometrics, Ch.36. Elsevier Science.


Copyrights:

Wikipedia. This article is licensed under the Creative Commons Attribution/Share-Alike License. It uses material from the Wikipedia article "Generalized method of moments".