In mathematics and physics, the Metropolis–Hastings algorithm is a Markov chain Monte Carlo method for obtaining a sequence of random samples from a probability distribution for which direct sampling is difficult. This sequence can be used to approximate the distribution (i.e., to generate a histogram), or to compute an integral (such as an expected value).
Contents |
History
The algorithm was named after Nicholas Metropolis, who was an author along with Arianna W. Rosenbluth, Marshall N. Rosenbluth, Augusta H. Teller, and Edward Teller of the 1953 paper Equation of State Calculations by Fast Computing Machines which first proposed the algorithm for the specific case of the Boltzmann distribution;[1] and W. Keith Hastings,[2] who extended it to the more general case in 1970.[3] The Gibbs sampling algorithm is a special case of the Metropolis–Hastings algorithm which is usually faster and easier to use but is less generally applicable.
Overview
The Metropolis–Hastings algorithm can draw samples from any probability distribution
, requiring only that a function dominating (being greater than or equal to) the density can be calculated at
. In Bayesian applications, the normalization factor is often extremely difficult to compute, so the ability to generate a sample without knowing this constant of proportionality is a major virtue of the algorithm. The algorithm generates a Markov chain in which each state
depends only on the previous state
. The algorithm uses a proposal density
, which depends on the current state
, to generate a new proposed sample
. This proposal is "accepted" as the next value (
) if
drawn from
satisfies
If the proposal is not accepted, then the current value of
is retained:
.
For example, the proposal density could be a Gaussian function centred on the current state
:
reading
as the probability density function for
given the previous value
. This proposal density would generate samples centred around the current state with variance
. The original Metropolis algorithm calls for the proposal density to be symmetric
; the generalization by Hastings lifts this restriction. It is also permissible for
not to depend on
at all, in which case the algorithm is called "Independence Chain Metropolis-Hastings" (as opposed to "Random Walk Metropolis-Hastings"). The Independence Chain M-H algorithm with a suitable proposal density function can offer higher accuracy than the random walk version, but it requires some a priori knowledge of the distribution.
Step-by-step instructions
Suppose the most recent value sampled is
. To follow the Metropolis–Hastings algorithm, we next draw a new proposal state
with probability
, and calculate a value
where
is the likelihood ratio between the proposed sample
and the previous sample
, and
is the ratio of the proposal density in two directions (from
to
and vice versa). This is equal to 1 if the proposal density is symmetric. Then the new state
is chosen according to the following rules.
The Markov chain is started from a random initial value
and the algorithm is run for many iterations until this initial state is "forgotten". These samples, which are discarded, are known as burn-in. The remaining set of accepted values of x represent a sample from the distribution P(x).
The algorithm works best if the proposal density matches the shape of the target distribution
, that is
, but in most cases this is unknown. If a Gaussian proposal density
is used the variance parameter
has to be tuned during the burn-in period. This is usually done by calculating the acceptance rate, which is the fraction of proposed samples that is accepted in a window of the last
samples. The desired acceptance rate depends on the target distribution, however it has been shown theoretically that the ideal acceptance rate for a one dimensional Gaussian distribution is approx 50%, decreasing to approx 23% for an
-dimensional Gaussian target distribution.[4] If
is too small the chain will mix slowly (i.e., the acceptance rate will be too high, so the sampling will move around the space slowly and converge slowly to
). If
is too large the acceptance rate will be very low because the proposals are likely to land in regions of much lower probability density so
will be very small.
See also
- Simulated annealing
- Detailed balance
- Multiple-try Metropolis
- Metropolis light transport
- Gibbs sampling
Notes
- ^ Metropolis, N.; Rosenbluth, A.W.; Rosenbluth, M.N.; Teller, A.H.; Teller, E. (1953). "Equations of State Calculations by Fast Computing Machines". Journal of Chemical Physics 21 (6): 1087–1092. doi:.
- ^ Rosenthal, Jeffrey (March 2004). "W.K. Hastings, Statistician and Developer of the Metropolis-Hastings Algorithm". http://probability.ca/hastings/. Retrieved 2009-06-02.
- ^ Hastings, W.K. (1970). "Monte Carlo Sampling Methods Using Markov Chains and Their Applications". Biometrika 57 (1): 97–109. doi:. JSTOR 2334940, Zbl 0219.65008.
- ^ Roberts, G.O.; Gelman, A.; Gilks, W.R. (1997). "Weak convergence and optimal scaling of random walk Metropolis algorithms". Ann. Appl. Probab. 7 (1): 110–120. doi:.
References
- Bernd A. Berg. Markov Chain Monte Carlo Simulations and Their Statistical Analysis. Singapore, World Scientific 2004.
- Siddhartha Chib and Edward Greenberg: "Understanding the Metropolis–Hastings Algorithm". American Statistician, 49(4), 327–335, 1995
External links
This entry is from Wikipedia, the leading user-contributed encyclopedia. It may not have been reviewed by professional editors (see full disclaimer)











