The generalized method of moments (GMM) is a very general statistical method for obtaining estimates of parameters of statistical models. Developed by Lars Peter Hansen, it is a generalization of the method of moments.
The term GMM is very popular among econometricians but is hardly used at all outside of economics, where the slightly more general term estimating equations is preferred.
Introduction
A typical econometric problem can be formulated in the following terms. Suppose the available data consist of a large number T of i.i.d. observations $\{Y_t\}_{t=1,\dots,T}$, where each observation $Y_t$ is an n-dimensional multivariate random variable. Our knowledge (or lack thereof) of economics dictates a certain econometric model for these data. Such a model is usually defined only up to some parameter, which we will denote by $\theta \in \Theta$. Our main goal is to find the “true” value of this parameter, $\theta_0$, or at least a reasonably close estimate.
In order to apply GMM, there should exist a (possibly vector-valued) function $g(Y,\theta)$ such that
$$m(\theta_0) \equiv \operatorname{E}[\,g(Y_t,\theta_0)\,] = 0,$$
where E denotes expectation and $Y_t$ is a generic observation (all observations are assumed to be i.i.d.). Moreover, the function $m(\theta)$ must not be equal to zero for $\theta \neq \theta_0$, otherwise the parameter θ will not be identified.
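The basic idea behind GMM is to replace the theoretical expected value E with its empirical analog, the sample average:
$$\hat m(\theta) = \hat{\operatorname{E}}\big[\,g(Y_t,\theta)\,\big] \equiv \frac{1}{T}\sum_{t=1}^T g(Y_t,\theta).$$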
By the law of large numbers, $\hat m(\theta) \approx \operatorname{E}[\,g(Y_t,\theta)\,] = m(\theta)$ for large values of T, so if we can find a value $\hat\theta$ such that $\hat m(\hat\theta) = 0$, then $\hat\theta$ will be a reasonably good estimate of the parameter $\theta_0$. So all we need to do is search the parameter space Θ for the value of θ that minimizes the distance between $\hat m(\theta)$ and zero, in other words the norm of $\hat m(\theta)$.
For a vector-valued function $\hat m(\theta)$ the norm can be defined in many different ways, and it turns out that the obvious Euclidean norm is generally not the best choice. Instead, a positive semi-definite “weighting” matrix $\hat W$ is used to define the norm as a quadratic form $\lVert \hat m(\theta) \rVert^2_{\hat W} = \hat m(\theta)'\,\hat W\,\hat m(\theta)$, where ′ denotes matrix transposition. Thus the GMM estimator can be written as
$$\hat\theta = \operatorname{arg}\min_{\theta\in\Theta} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)' \hat W \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg).$$
Under suitable conditions this estimator is consistent and asymptotically normal, and with the right choice of weighting matrix $\hat W$ it is also asymptotically efficient.
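The estimator defined above can be illustrated with a minimal pure-Python sketch. It rests on assumptions that are not part of the text: a scalar parameter, a hypothetical moment function for an exponential model with mean θ (using $\operatorname{E}[Y]=\theta$ and $\operatorname{E}[Y^2]=2\theta^2$, so k = 2 moments and l = 1 parameter), and a golden-section search that assumes the objective is unimodal on the search interval. The helper names `g`, `m_hat`, `gmm_objective` and `gmm_estimate` are illustrative only.

```python
import math
import random


def g(y, theta):
    # Hypothetical moment function for an exponential model with mean theta:
    # E[Y - theta] = 0 and E[Y^2 - 2 theta^2] = 0 (k = 2 moments, l = 1 parameter).
    return [y - theta, y * y - 2.0 * theta * theta]


def m_hat(data, theta):
    # Sample analog of m(theta): (1/T) * sum_t g(Y_t, theta), one entry per moment.
    rows = [g(y, theta) for y in data]
    return [sum(col) / len(rows) for col in zip(*rows)]


def gmm_objective(data, theta, W):
    # Quadratic form m_hat(theta)' W m_hat(theta).
    m = m_hat(data, theta)
    k = len(m)
    return sum(m[i] * W[i][j] * m[j] for i in range(k) for j in range(k))


def golden_section_min(f, lo, hi, tol=1e-10, max_iter=200):
    # Minimizes a scalar function on [lo, hi]; assumes f is unimodal there.
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if b - a < tol:
            break
        if fc < fd:  # minimum lies in [a, d]
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:  # minimum lies in [c, b]
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def gmm_estimate(data, W, lo, hi):
    # theta_hat = argmin over [lo, hi] of the GMM objective with weighting matrix W.
    return golden_section_min(lambda th: gmm_objective(data, th, W), lo, hi)


random.seed(1)
sample = [random.expovariate(0.5) for _ in range(2000)]  # exponential with mean 2
identity = [[1.0, 0.0], [0.0, 1.0]]
print(gmm_estimate(sample, identity, 0.01, 10.0))
```

Setting W to put all weight on the first moment reduces the estimator to the sample mean, which is a convenient sanity check for the sketch.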
Consistency
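Consistency is the most important property of an estimator. It means that, given a sufficient number of observations, the estimator will be arbitrarily close to the true value of the parameter with probability approaching one:
$$\hat\theta \xrightarrow{p} \theta_0 \quad \text{as } T \to \infty.$$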
Sufficient conditions for the GMM estimator to be consistent are as follows:
1. $\hat W_T \xrightarrow{p} W$, where W is a positive semi-definite matrix,
2. $W\operatorname{E}[\,g(Y_t,\theta)\,] = 0$ only for $\theta = \theta_0$,
3. the parameter space Θ is compact,
4. $g(Y,\theta)$ is continuous at each θ with probability one,
5. $\operatorname{E}[\,\textstyle\sup_{\theta\in\Theta} \lVert g(Y,\theta)\rVert\,] < \infty$.
The second condition (the so-called global identification condition) is often particularly hard to verify. There exist simpler necessary but not sufficient conditions that may be used to detect a non-identification problem:
- Order condition. The dimension of the moment function m(θ) should be at least as large as the dimension of the parameter vector θ.
- Local identification. If g(Y,θ) is continuously differentiable in a neighborhood of θ0, then the matrix $W\operatorname{E}[\,\nabla_\theta g(Y_t,\theta_0)\,]$ must have full column rank.
In practice applied econometricians often simply assume that global identification holds, without actually proving it.[1]
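Both necessary conditions can be screened numerically. The sketch below is a hypothetical illustration, not a standard routine: since $\operatorname{E}[\nabla_\theta g(Y_t,\theta_0)]$ depends on the unknown θ0, it substitutes a central-difference Jacobian of the sample moments at a candidate value of θ, and it treats pivots smaller than an assumed tolerance of 1e-6 as zero. A rank deficiency flags a likely identification problem; passing the check does not prove global identification.

```python
def numerical_jacobian(mbar, theta, h=1e-6):
    # Central differences: returns the k x l matrix of d mbar_i / d theta_j.
    columns = []
    for j in range(len(theta)):
        up, down = list(theta), list(theta)
        up[j] += h
        down[j] -= h
        columns.append([(a - b) / (2.0 * h) for a, b in zip(mbar(up), mbar(down))])
    return [list(row) for row in zip(*columns)]  # columns -> k rows of length l


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def matrix_rank(A, tol=1e-6):
    # Gaussian elimination with partial pivoting; pivots below tol count as zero.
    M = [[float(v) for v in row] for row in A]
    n_rows, n_cols = len(M), len(M[0])
    rank = 0
    for c in range(n_cols):
        if rank == n_rows:
            break
        p = max(range(rank, n_rows), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < tol:
            continue  # no usable pivot in this column
        M[rank], M[p] = M[p], M[rank]
        for r in range(rank + 1, n_rows):
            f = M[r][c] / M[rank][c]
            M[r] = [a - f * b for a, b in zip(M[r], M[rank])]
        rank += 1
    return rank


def passes_identification_checks(mbar, theta, W):
    # Order condition (k >= l) plus full column rank of W times the Jacobian at theta.
    G = numerical_jacobian(mbar, theta)
    k, l = len(G), len(G[0])
    if k < l:
        return False
    return matrix_rank(matmul(W, G)) == l


data = [0.0, 1.0, 2.0, 3.0]
ybar = sum(data) / len(data)
y2bar = sum(y * y for y in data) / len(data)
I2 = [[1.0, 0.0], [0.0, 1.0]]


def mbar_mean_var(theta):
    # theta = (mu, sigma^2): moments E[Y - mu] = 0 and E[Y^2 - mu^2 - sigma^2] = 0.
    mu, s2 = theta
    return [ybar - mu, y2bar - mu * mu - s2]


def mbar_sum_only(theta):
    # Hypothetical model in which only theta_1 + theta_2 enters, so it is not identified.
    s = theta[0] + theta[1]
    return [ybar - s, y2bar - s * s - 1.0]


print(passes_identification_checks(mbar_mean_var, [1.5, 1.25], I2))   # True
print(passes_identification_checks(mbar_sum_only, [0.75, 0.75], I2))  # False
```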
Asymptotic normality
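Asymptotic normality is a useful property, as it allows us to construct confidence bands for the estimator and to conduct various tests. Before we can state the asymptotic distribution of the GMM estimator, we need to define two auxiliary matrices:
$$G = \operatorname{E}[\,\nabla_{\!\theta}\,g(Y_t,\theta_0)\,], \qquad \Omega = \operatorname{E}[\,g(Y_t,\theta_0)\,g(Y_t,\theta_0)'\,].$$
Then, under conditions 1–6 listed below, the GMM estimator will be asymptotically normal with limiting distribution
$$\sqrt{T}\big(\hat\theta - \theta_0\big)\ \xrightarrow{d}\ \mathcal{N}\big[0,\ (G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}\big].$$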
Conditions:
1. $\hat\theta$ is consistent (see previous section),
2. the true parameter value $\theta_0$ lies in the interior of the set Θ,
3. $g(Y,\theta)$ is continuously differentiable in some neighborhood N of θ0 with probability one,
4. $\operatorname{E}[\,\lVert g(Y_t,\theta_0)\rVert^2\,] < \infty$,
5. $\operatorname{E}[\,\textstyle\sup_{\theta\in N}\lVert \nabla_\theta g(Y_t,\theta)\rVert\,] < \infty$,
6. the matrix $G'WG$ is nonsingular.
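Once values (or consistent estimates) of G, W and Ω are available, the sandwich formula above is straightforward to evaluate. The following minimal pure-Python sketch uses hypothetical helper names, assumes W is symmetric, and plugs in made-up matrices for illustration; in practice G and Ω would be replaced by sample estimates evaluated at $\hat\theta$. Dividing the diagonal by T gives approximate standard errors.

```python
import math


def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def invert(A):
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix [A | I].
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def gmm_asymptotic_variance(G, W, Omega):
    # Sandwich formula (G'WG)^{-1} G'W Omega W G (G'WG)^{-1}; assumes W is symmetric.
    GtW = matmul(transpose(G), W)
    bread = invert(matmul(GtW, G))
    meat = matmul(matmul(GtW, Omega), transpose(GtW))  # G'W Omega W G
    return matmul(matmul(bread, meat), bread)


def standard_errors(V, T):
    # Approximate standard errors of theta_hat: sqrt(V_jj / T).
    return [math.sqrt(V[j][j] / T) for j in range(len(V))]


G = [[1.0], [2.0]]                 # hypothetical: k = 2 moments, l = 1 parameter
Omega = [[2.0, 0.0], [0.0, 1.0]]
V_identity = gmm_asymptotic_variance(G, [[1.0, 0.0], [0.0, 1.0]], Omega)
V_efficient = gmm_asymptotic_variance(G, invert(Omega), Omega)
print(V_identity, V_efficient)     # approximately [[0.24]] and [[0.2222...]]
print(standard_errors(V_identity, 100))
```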
Efficiency
So far we have said nothing about the choice of matrix W, except that it must be positive semi-definite. In fact, any such matrix that satisfies the conditions above will produce a consistent and asymptotically normal GMM estimator; the only difference will be in the asymptotic variance of that estimator. It can be shown that taking
$$W \propto \Omega^{-1}$$
will result in the most efficient estimator in the class of all GMM estimators based on the same moment conditions. Efficiency in this case means that such an estimator will have the smallest possible asymptotic variance (we say that matrix A is smaller than matrix B if B − A is positive semi-definite).
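In this case the formula for the asymptotic distribution of the GMM estimator simplifies to
$$\sqrt{T}\big(\hat\theta - \theta_0\big)\ \xrightarrow{d}\ \mathcal{N}\big[0,\ (G'\,\Omega^{-1}G)^{-1}\big].$$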
The proof that such a choice of weighting matrix is indeed optimal is quite elegant, and it is often adopted with slight modifications when establishing the efficiency of other estimators. As a rule of thumb, a weighting matrix is optimal whenever it makes the “sandwich formula” for the variance collapse into a simpler expression.
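Proof. Write $V(W) = (G'WG)^{-1}G'W\Omega WG(G'WG)^{-1}$ for the asymptotic variance obtained with a symmetric weighting matrix W. We will consider the difference between the asymptotic variance with an arbitrary W and the asymptotic variance with $W = \Omega^{-1}$. If we can factor this difference into a symmetric product of the form CC′ for some matrix C, then the difference is guaranteed to be positive semi-definite, and thus $W = \Omega^{-1}$ will be optimal by definition.
$$\begin{aligned}
V(W) - V(\Omega^{-1}) &= (G'WG)^{-1}G'W\Omega WG(G'WG)^{-1} - (G'\Omega^{-1}G)^{-1} \\
&= (G'WG)^{-1}\Big(G'W\Omega WG - G'WG\,(G'\Omega^{-1}G)^{-1}\,G'WG\Big)(G'WG)^{-1} \\
&= (G'WG)^{-1}G'W\Omega^{1/2}\Big(I - \Omega^{-1/2}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1/2}\Big)\Omega^{1/2}WG(G'WG)^{-1} \\
&= A(I - B)A',
\end{aligned}$$
where $\Omega^{1/2}$ is the symmetric square root of Ω, and we introduced the matrices $A = (G'WG)^{-1}G'W\Omega^{1/2}$ and $B = \Omega^{-1/2}G(G'\Omega^{-1}G)^{-1}G'\Omega^{-1/2}$ in order to simplify notation; I is an identity matrix. The matrix B is symmetric and idempotent: $B^2 = B$. It follows that I − B is symmetric and idempotent as well: $I - B = (I - B)(I - B)'$. Thus we can continue to factor the previous expression as
$$V(W) - V(\Omega^{-1}) = A(I - B)(I - B)'A' = \big(A(I - B)\big)\big(A(I - B)\big)' \geq 0.$$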
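As a small worked example (the numbers are chosen purely for illustration and do not come from any data set), take one parameter and two moment conditions with $G = (1, 2)'$ and $\Omega = \operatorname{diag}(2, 1)$, and compare the identity weighting with $W = \Omega^{-1} = \operatorname{diag}(1/2, 1)$:

```latex
\begin{aligned}
V(I) &= (G'G)^{-1}\,G'\Omega G\,(G'G)^{-1}
      = \frac{2\cdot 1^2 + 1\cdot 2^2}{(1^2 + 2^2)^2} = \frac{6}{25} = 0.24, \\
V(\Omega^{-1}) &= (G'\Omega^{-1}G)^{-1}
      = \Big(\tfrac{1}{2}\cdot 1^2 + 1\cdot 2^2\Big)^{-1} = \frac{2}{9} \approx 0.222, \\
V(I) - V(\Omega^{-1}) &= \frac{6}{25} - \frac{2}{9} = \frac{54 - 50}{225} = \frac{4}{225} \geq 0.
\end{aligned}
```

Only the efficient choice makes the sandwich collapse to $(G'\Omega^{-1}G)^{-1}$, in line with the rule of thumb stated before the proof.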
Implementation
One difficulty with implementing the outlined method is that we cannot take $W = \Omega^{-1}$ directly: by the definition of the matrix Ω, we need to know the value of θ0 in order to compute it, and θ0 is precisely the quantity we do not know and are trying to estimate in the first place.
Several approaches exist to deal with this issue, the first one being the most popular:
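- 2-step feasible GMM:
  - Step 1. Take $\hat W = I$ (the identity matrix) and compute a preliminary GMM estimate $\hat\theta_{(1)}$. This estimator is consistent for θ0, although in general not efficient.
  - Step 2. Take $\hat W_T(\hat\theta_{(1)}) = \Big(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta_{(1)})\,g(Y_t,\hat\theta_{(1)})'\Big)^{-1}$. This matrix converges in probability to $\Omega^{-1}$, and therefore if we compute $\hat\theta$ with this weighting matrix, the resulting estimator will be asymptotically efficient.
- Iterative GMM. Essentially the same procedure as 2-step GMM, except that the matrix $\hat W_T$ is recalculated several times. That is, the estimate obtained in step 2 is used to calculate the weighting matrix for step 3, and so on. Asymptotically no improvement can be achieved through such iterations, although certain Monte Carlo experiments suggest that the finite-sample properties of this estimator are slightly better.[citation needed]
- Continuously updating GMM (CUE). Estimates $\hat\theta$ simultaneously with estimating the weighting matrix W:
  $$\hat\theta = \operatorname{arg}\min_{\theta\in\Theta} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg)' \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\,g(Y_t,\theta)'\bigg)^{-1} \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\theta)\bigg).$$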
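The 2-step and iterative procedures can be sketched for a toy problem. This is a minimal pure-Python sketch under stated assumptions, not a reference implementation: a scalar parameter, a hypothetical moment function for an exponential model with mean θ, the uncentered $\frac{1}{T}\sum_t g g'$ from Step 2 as the estimate of Ω, and a golden-section search that assumes the objective is unimodal on the bracket [lo, hi]. All helper names are illustrative.

```python
import math
import random


def g(y, theta):
    # Hypothetical moments for an exponential model with mean theta:
    # E[Y - theta] = 0 and E[Y^2 - 2 theta^2] = 0.
    return [y - theta, y * y - 2.0 * theta * theta]


def m_hat(data, theta):
    rows = [g(y, theta) for y in data]
    return [sum(col) / len(rows) for col in zip(*rows)]


def omega_hat(data, theta):
    # (1/T) * sum_t g(Y_t, theta) g(Y_t, theta)', the sample analog of Omega.
    rows = [g(y, theta) for y in data]
    k, T = len(rows[0]), len(rows)
    return [[sum(r[i] * r[j] for r in rows) / T for j in range(k)] for i in range(k)]


def invert(A):
    # Gauss-Jordan elimination with partial pivoting on [A | I].
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def objective(data, theta, W):
    m = m_hat(data, theta)
    return sum(m[i] * W[i][j] * m[j] for i in range(len(m)) for j in range(len(m)))


def golden_section_min(f, lo, hi, tol=1e-10, max_iter=200):
    # Minimizes a scalar function on [lo, hi]; assumes f is unimodal there.
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    for _ in range(max_iter):
        if b - a < tol:
            break
        if fc < fd:
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2.0


def iterated_gmm(data, lo, hi, steps=2):
    # steps=2 is 2-step feasible GMM; steps > 2 gives iterative GMM.
    k = len(g(data[0], lo))
    W = [[1.0 if i == j else 0.0 for j in range(k)] for i in range(k)]  # Step 1: W = I
    estimates = []
    for _ in range(steps):
        theta = golden_section_min(lambda th, W=W: objective(data, th, W), lo, hi)
        estimates.append(theta)
        W = invert(omega_hat(data, theta))  # weighting matrix for the next step
    return estimates, W


random.seed(0)
sample = [random.expovariate(0.5) for _ in range(2000)]  # exponential with mean 2
estimates, _ = iterated_gmm(sample, 0.01, 10.0, steps=2)
print(estimates)  # [identity-weighted estimate, 2-step estimate]
```

The returned matrix is the inverse of $\hat\Omega$ at the last estimate. For real problems a general-purpose numerical optimizer would replace the golden-section search.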
Another important issue is the implementation of the minimization procedure, which has to search through the (possibly high-dimensional) parameter space Θ for the value of θ that minimizes the objective function. No generic recommendation for such a procedure exists; it is the subject of its own field, numerical optimization.
J-test
When the number of moment conditions is greater than the dimension of the parameter vector θ, the model is said to be over-identified. Over-identification is actually a good thing, as it allows us to check whether the model is correct. Indeed, the GMM method has replaced the problem of solving the equation $\hat m(\theta) = 0$ by the minimization of a certain quadratic form. Such a minimization can always be carried out, even when no θ0 with $m(\theta_0) = 0$ exists. We can then check whether $\hat m(\hat\theta)$ is sufficiently close to zero, and this is exactly what the J-test does (the J-test is also known as the test for over-identifying restrictions).
Formally we consider two hypotheses:
- $H_0:\ m(\theta_0) = 0$ (the model is “valid”), and
- $H_1:\ m(\theta) \neq 0$ for all $\theta \in \Theta$ (the model is “invalid”).
Under the null hypothesis $H_0$, the J-statistic is asymptotically chi-squared with k − l degrees of freedom:
$$J \equiv T\cdot\bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta)\bigg)' \hat W_T \bigg(\frac{1}{T}\sum_{t=1}^T g(Y_t,\hat\theta)\bigg)\ \xrightarrow{d}\ \chi^2_{k-l} \quad \text{under } H_0,$$
where $\hat\theta$ is the GMM estimator of the parameter θ0, k is the number of moment conditions (dimension of vector g), and l is the number of estimated parameters (dimension of vector θ). The matrix $\hat W_T$ must converge in probability to $\Omega^{-1}$, the efficient weighting matrix (note that previously we only required W to be proportional to $\Omega^{-1}$ for the estimator to be efficient; however, in order to conduct the J-test, W must be exactly equal to $\Omega^{-1}$, not simply proportional).
Under the alternative hypothesis $H_1$, the J-statistic is asymptotically unbounded:
$$J\ \xrightarrow{p}\ \infty \quad \text{under } H_1.$$
Thus, to conduct the test, we compute the value of J and compare it with the 0.95 quantile of the $\chi^2_{k-l}$ distribution:
- $H_0$ is rejected at the 5% significance level if $J > q_{0.95}^{\chi^2_{k-l}}$,
- $H_0$ cannot be rejected at the 5% significance level if $J < q_{0.95}^{\chi^2_{k-l}}$.
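The decision rule above can be sketched as follows, assuming the per-observation moment vectors $g(Y_t,\hat\theta)$ have already been computed at an efficient estimate. To stay within the Python standard library, this hypothetical helper only handles k − l = 1, where the $\chi^2_1$ quantile equals the square of a standard normal quantile; it re-estimates $\hat W_T$ as the inverse of $\frac{1}{T}\sum_t g g'$.

```python
from statistics import NormalDist


def invert(A):
    # Gauss-Jordan elimination with partial pivoting on [A | I].
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            raise ValueError("matrix is singular")
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [a - f * b for a, b in zip(M[r], M[c])]
    return [row[n:] for row in M]


def j_test_one_df(g_rows, level=0.95):
    # g_rows[t] = g(Y_t, theta_hat); assumes k - l = 1 degree of freedom.
    T, k = len(g_rows), len(g_rows[0])
    mbar = [sum(r[i] for r in g_rows) / T for i in range(k)]
    omega = [[sum(r[i] * r[j] for r in g_rows) / T for j in range(k)] for i in range(k)]
    W = invert(omega)  # estimate of the efficient weighting matrix Omega^{-1}
    J = T * sum(mbar[i] * W[i][j] * mbar[j] for i in range(k) for j in range(k))
    # chi^2_1 quantile: if Z ~ N(0, 1), P(Z^2 <= c) = level at c = z_{(1+level)/2}^2
    critical = NormalDist().inv_cdf((1.0 + level) / 2.0) ** 2
    return J, critical, J > critical


J, critical, reject = j_test_one_df([[1.0, 0.0], [0.0, 1.0]])
print(J, critical, reject)  # J = 2.0, critical value ≈ 3.8415, reject = False
```

For other degrees of freedom, a chi-squared quantile function such as SciPy's `scipy.stats.chi2.ppf(0.95, k - l)` would supply the critical value.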
Scope
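Many other popular estimation techniques can be cast in terms of GMM optimization:
- Ordinary Least Squares (OLS) is equivalent to GMM with moment conditions $\operatorname{E}[\,x_t(y_t - x_t'\beta)\,] = 0$,
- Generalized Least Squares (GLS): $\operatorname{E}[\,x_t(y_t - x_t'\beta)/\sigma^2(x_t)\,] = 0$,
- Instrumental variables regression (IV): $\operatorname{E}[\,z_t(y_t - x_t'\beta)\,] = 0$,
- Non-linear Least Squares (NLLS): $\operatorname{E}[\,\nabla_{\!\beta}\, g(x_t,\beta)\cdot(y_t - g(x_t,\beta))\,] = 0$,
- Maximum likelihood estimation (MLE): $\operatorname{E}[\,\nabla_{\!\theta} \ln f(x_t,\theta)\,] = 0$.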
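For the linear cases in the list above, the sample moment conditions can be solved directly when the model is exactly identified (as many instruments as coefficients), so no weighting matrix is needed. The following hypothetical sketch solves $\frac{1}{T}\sum_t z_t(y_t - x_t'\beta) = 0$ (the factor 1/T cancels); passing $z_t = x_t$ reproduces OLS, and a separate instrument matrix gives just-identified IV. The toy data and instrument values are made up for illustration.

```python
def solve(A, b):
    # Gaussian elimination with partial pivoting for a square system A x = b.
    n = len(A)
    M = [[float(v) for v in A[i]] + [float(b[i])] for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [a - f * v for a, v in zip(M[r], M[c])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def linear_gmm_just_identified(X, Z, y):
    # Solves sum_t z_t (y_t - x_t' beta) = 0, i.e. (sum_t z_t x_t') beta = sum_t z_t y_t.
    l = len(X[0])
    A = [[sum(z[i] * x[j] for x, z in zip(X, Z)) for j in range(l)] for i in range(l)]
    b = [sum(z[i] * yt for z, yt in zip(Z, y)) for i in range(l)]
    return solve(A, b)


X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]  # intercept and one regressor
y = [1.0, 3.0, 5.0, 7.0]                               # exactly y = 1 + 2 x
Z = [[1.0, 5.0], [1.0, 3.0], [1.0, 8.0], [1.0, 1.0]]  # hypothetical instrument
print(linear_gmm_just_identified(X, X, y))  # OLS: approximately [1.0, 2.0]
print(linear_gmm_just_identified(X, Z, y))  # IV: approximately [1.0, 2.0]
```

Because the toy data lie exactly on a line, OLS and IV recover the same coefficients; with noisy data they would generally differ.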
References
- [1] Newey, W., McFadden, D. (1994), p. 2127.
- Hansen, Lars Peter (1982). Large Sample Properties of Generalized Method of Moments Estimators. Econometrica 50, 1029-1054.
- Hansen, Lars Peter (2002). Method of Moments. In N. J. Smelser and P. B. Baltes (editors), International Encyclopedia of the Social and Behavioral Sciences. Pergamon: Oxford.
- Faciane, Kirby (2006). Statistics for Empirical and Quantitative Finance. H.C. Baird: Philadelphia. ISBN 0-9788208-9-4.
- Hall, Alastair R. (2005). Generalized Method of Moments (Advanced Texts in Econometrics). Oxford University Press. ISBN 0-19-877520-2.
- Newey, W., McFadden, D. (1994). Large sample estimation and hypothesis testing. In Handbook of Econometrics, Ch. 36. Elsevier Science.
This entry is from Wikipedia, the leading user-contributed encyclopedia. It may not have been reviewed by professional editors.