In mathematics, Newton's method is a well-known algorithm for finding roots of equations in one or more dimensions. It can also be used to find local maxima and local minima of functions, as these extrema are the roots of the derivative function.
Method
We construct a sequence $x_0, x_1, x_2, \dots$, starting from an initial guess $x_0$, that converges towards a point $x_*$ satisfying $f'(x_*) = 0$. Such an $x_*$ is a stationary point of $f$; it is a local minimum or maximum whenever $f''(x_*) \neq 0$.
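The second-order Taylor expansion of $f$ around $x$,
$$f(x + \Delta x) \approx f(x) + f'(x)\,\Delta x + \tfrac{1}{2} f''(x)\,\Delta x^2,$$
attains its extremum when $\Delta x$ solves the linear equation
$$f'(x) + f''(x)\,\Delta x = 0.$$
Alternatively, one may expand $f'$ to first order in $\Delta x$,
$$f'(x + \Delta x) \approx f'(x) + f''(x)\,\Delta x,$$
giving the same equation as above when we require $f'(x + \Delta x) = 0$.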
Thus, provided that $f$ is a twice-differentiable function and the initial guess $x_0$ is chosen close enough to $x_*$, the sequence $(x_n)$ defined by
$$x_{n+1} = x_n - \frac{f'(x_n)}{f''(x_n)}, \quad n \ge 0,$$
will converge towards the root of $f'$, i.e. the $x_*$ for which $f'(x_*) = 0$.
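As a minimal sketch of the one-dimensional iteration, assuming both derivatives are available in closed form, the update $x_{n+1} = x_n - f'(x_n)/f''(x_n)$ might be coded as follows. The name `newton_1d`, the tolerance, and the test objective $f(x) = x^4 - 3x^2 + 2$ are illustrative choices, not part of the method itself.

```python
import math

def newton_1d(df, d2f, x0, tol=1e-10, max_iter=50):
    """Iterate x_{n+1} = x_n - f'(x_n) / f''(x_n) until the step is tiny."""
    x = x0
    for _ in range(max_iter):
        curvature = d2f(x)
        if curvature == 0.0:
            raise ZeroDivisionError("f''(x) vanished; the Newton step is undefined")
        step = df(x) / curvature
        x -= step
        if abs(step) < tol:
            break
    return x

# Illustrative objective (an assumption): f(x) = x^4 - 3x^2 + 2
def df(x):
    return 4 * x**3 - 6 * x      # f'(x)

def d2f(x):
    return 12 * x**2 - 6         # f''(x)

x_star = newton_1d(df, d2f, x0=1.0)
```

Started from `x0=1.0`, the iterates settle on the stationary point at $\sqrt{3/2}$; a different starting point can lead to a different stationary point, including the local maximum at $x = 0$.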
Geometric interpretation
The geometric interpretation of Newton's method is that at each iteration one approximates $f(x)$ by a quadratic function around $x_n$, and then takes a step towards the maximum/minimum of that quadratic function. (If $f(x)$ happens to be a quadratic function, then the exact extremum is found in one step.)
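As a worked check of the one-step claim, take a generic quadratic $f(x) = ax^2 + bx + c$ with $a \neq 0$ (the coefficients are illustrative symbols, not taken from the text). One Newton update from an arbitrary $x_n$ gives:

```latex
\begin{align*}
f'(x_n) &= 2a x_n + b, \qquad f''(x_n) = 2a, \\
x_{n+1} &= x_n - \frac{f'(x_n)}{f''(x_n)}
         = x_n - \frac{2a x_n + b}{2a}
         = -\frac{b}{2a}.
\end{align*}
```

The result does not depend on $x_n$ and is exactly the stationary point of the quadratic.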
Higher dimensions
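The above iterative scheme can be generalized to several dimensions by replacing the derivative with the gradient, $\nabla f(\mathbf{x})$, and the reciprocal of the second derivative with the inverse of the Hessian matrix, $H f(\mathbf{x})$. One obtains the iterative scheme
$$\mathbf{x}_{n+1} = \mathbf{x}_n - [H f(\mathbf{x}_n)]^{-1} \nabla f(\mathbf{x}_n), \quad n \ge 0.$$
Usually Newton's method is modified to include a small step size $\gamma > 0$ instead of $\gamma = 1$:
$$\mathbf{x}_{n+1} = \mathbf{x}_n - \gamma [H f(\mathbf{x}_n)]^{-1} \nabla f(\mathbf{x}_n).$$
This is often done to ensure that the Wolfe conditions are satisfied at each step $\mathbf{x}_n \to \mathbf{x}_{n+1}$ of the iteration.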
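The damped iteration can be sketched in two dimensions as follows. This is only an illustration under stated assumptions: the step size γ is chosen by simple backtracking until the sufficient-decrease (Armijo) condition holds, which is the first of the two Wolfe conditions; the curvature condition is not checked. The objective $f(x, y) = e^{x+y} + x^2 + y^2$, the constants `c1` and `shrink`, and all function names are hypothetical choices made for the example.

```python
import math

def f(x, y):
    return math.exp(x + y) + x * x + y * y

def grad(x, y):
    e = math.exp(x + y)
    return (e + 2 * x, e + 2 * y)

def hess(x, y):
    e = math.exp(x + y)
    return ((e + 2, e), (e, e + 2))   # positive definite for every (x, y)

def solve_2x2(H, b):
    """Solve H p = b for a 2x2 matrix H by Cramer's rule."""
    (h11, h12), (h21, h22) = H
    det = h11 * h22 - h12 * h21
    return ((h22 * b[0] - h12 * b[1]) / det,
            (h11 * b[1] - h21 * b[0]) / det)

def damped_newton(x, y, c1=1e-4, shrink=0.5, tol=1e-10, max_iter=100):
    for _ in range(max_iter):
        g = grad(x, y)
        if math.hypot(g[0], g[1]) < tol:
            break
        # Newton direction p solves H p = -g (no explicit inverse is formed).
        p = solve_2x2(hess(x, y), (-g[0], -g[1]))
        slope = g[0] * p[0] + g[1] * p[1]   # directional derivative along p
        gamma = 1.0
        # Backtrack until the sufficient-decrease condition is met.
        while (gamma > 1e-12 and
               f(x + gamma * p[0], y + gamma * p[1]) > f(x, y) + c1 * gamma * slope):
            gamma *= shrink
        x, y = x + gamma * p[0], y + gamma * p[1]
    return x, y

x_min, y_min = damped_newton(1.0, 2.0)
```

Because this objective is symmetric in its arguments and strictly convex, the two coordinates of the minimizer coincide, which gives a quick sanity check on the result.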
Newton's method typically converges towards a local maximum or minimum in far fewer iterations than gradient descent. In fact, every local minimum has a neighborhood $N$ such that, if we start with $\mathbf{x}_0 \in N$, Newton's method with step size $\gamma = 1$ converges quadratically (if the Hessian is invertible and Lipschitz continuous in that neighborhood).
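The difference in iteration counts can be illustrated on a small one-dimensional problem. The sketch below is an assumption-laden toy comparison, not a benchmark: the objective $f(x) = e^x - 2x$ (minimizer $\ln 2$), the fixed gradient-descent step of 0.1, and the helper `count_iterations` are all chosen purely for illustration.

```python
import math

X_STAR = math.log(2.0)  # exact minimizer of f(x) = exp(x) - 2x

def df(x):
    return math.exp(x) - 2.0   # f'(x)

def d2f(x):
    return math.exp(x)         # f''(x)

def count_iterations(update, x0=0.0, tol=1e-10, max_iter=10_000):
    """Apply `update` repeatedly; return how many steps bring x within tol of X_STAR."""
    x = x0
    for n in range(max_iter):
        if abs(x - X_STAR) < tol:
            return n
        x = update(x)
    return max_iter

# Newton step with gamma = 1
newton_iters = count_iterations(lambda x: x - df(x) / d2f(x))
# Gradient descent with a fixed step of 0.1 (an arbitrary illustrative choice)
gd_iters = count_iterations(lambda x: x - 0.1 * df(x))
```

On this example the Newton error roughly squares at every step, while the gradient-descent error shrinks by a roughly constant factor, so `gd_iters` comes out many times larger than `newton_iters`.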
Finding the inverse of the Hessian is an expensive operation, so the linear equation
$$[H f(\mathbf{x}_n)]\,\Delta\mathbf{x}_n = -\nabla f(\mathbf{x}_n)$$
for the Newton step $\Delta\mathbf{x}_n$ is often solved approximately (but to great accuracy) using a method such as conjugate gradient. There also exist various quasi-Newton methods, where an approximation for the Hessian is used instead.
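A minimal sketch of this idea, assuming the Hessian is symmetric positive definite, is to obtain the Newton step from a plain conjugate gradient solve rather than from an explicit inverse. The matrix `H`, the vector `g`, and the helper names below are hypothetical values and names chosen for illustration.

```python
def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Approximately solve A p = b for a symmetric positive-definite A."""
    n = len(b)
    max_iter = max_iter or 10 * n
    p = [0.0] * n        # current solution estimate
    r = list(b)          # residual b - A p (p starts at zero)
    d = list(r)          # search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ad = matvec(A, d)
        alpha = rs_old / dot(d, Ad)
        p = [pi + alpha * di for pi, di in zip(p, d)]
        r = [ri - alpha * adi for ri, adi in zip(r, Ad)]
        rs_new = dot(r, r)
        d = [ri + (rs_new / rs_old) * di for ri, di in zip(r, d)]
        rs_old = rs_new
    return p

# Hypothetical Hessian and gradient at some iterate x_n
H = [[4.0, 1.0, 0.0],
     [1.0, 3.0, 1.0],
     [0.0, 1.0, 2.0]]
g = [1.0, 2.0, 3.0]

# Newton step: solve H * step = -g
step = conjugate_gradient(H, [-gi for gi in g])
```

Only matrix-vector products with the Hessian are needed, which is why this approach is attractive when the Hessian is large; loosening `tol` or lowering `max_iter` gives a cheaper, less accurate step.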
If the Hessian is close to a non-invertible matrix, the inverted Hessian can be numerically unstable and the iteration may diverge. Several workarounds exist, with varying success depending on the problem. One can, for example, modify the Hessian by adding a correction matrix $B_n$ so as to make $H f(\mathbf{x}_n) + B_n$ positive definite. One approach is to diagonalize $H f$ and choose $B_n$ so that $H f(\mathbf{x}_n) + B_n$ has the same eigenvectors as $H f$, but with each negative eigenvalue replaced by $\epsilon > 0$.
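The eigenvalue replacement can be sketched for a 2×2 symmetric Hessian, where the eigen-decomposition has a closed form. This is an illustration only: a general implementation would call a symmetric eigensolver instead. The function name, the example matrix, and the choice to also replace zero eigenvalues (so that the result is strictly positive definite) are assumptions of this sketch.

```python
import math

def _fix(lam, eps):
    """Keep positive eigenvalues; replace non-positive ones by eps."""
    return lam if lam > 0.0 else eps

def shift_to_positive_definite(a, b, d, eps=1e-6):
    """Return H + B for the symmetric matrix H = [[a, b], [b, d]].

    H + B keeps the eigenvectors of H, with non-positive eigenvalues set to eps.
    """
    if b == 0.0:
        # H is already diagonal, so its eigenvalues are a and d.
        return [[_fix(a, eps), 0.0], [0.0, _fix(d, eps)]]
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    modified = [[0.0, 0.0], [0.0, 0.0]]
    for lam in (mean - radius, mean + radius):
        vx, vy = lam - d, b              # unnormalized eigenvector (valid since b != 0)
        w = _fix(lam, eps) / (vx * vx + vy * vy)
        # Accumulate lam' * v v^T / |v|^2
        modified[0][0] += w * vx * vx
        modified[0][1] += w * vx * vy
        modified[1][0] += w * vx * vy
        modified[1][1] += w * vy * vy
    return modified

# Hypothetical indefinite Hessian with eigenvalues -1 and 3
H = [[1.0, 2.0], [2.0, 1.0]]
H_mod = shift_to_positive_definite(H[0][0], H[0][1], H[1][1], eps=0.1)
# Correction matrix B_n = (H + B_n) - H
B = [[H_mod[i][j] - H[i][j] for j in range(2)] for i in range(2)]
```

In this example the eigenvalue −1 is replaced by 0.1 while the eigenvalue 3 and both eigenvectors are left unchanged, so the modified matrix has determinant 0.3 and is positive definite.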
Other approximations
Some functions are poorly approximated by quadratics, particularly when far from a maximum or minimum. In these cases, approximations other than quadratic may be more appropriate [1].
See also
- Quasi-Newton method
- Gradient descent
- Gauss–Newton algorithm
- Levenberg–Marquardt algorithm
- Trust region
- Optimization
References
- [1] Minka, Thomas P. (2002-04-17). Beyond Newton's Method (PDF). http://research.microsoft.com/en-us/um/people/minka/papers/minka-newton.pdf. Retrieved 2009-02-20.
- Avriel, Mordecai (2003). Nonlinear Programming: Analysis and Methods. Dover Publishing. ISBN 0-486-43227-0.
- Nocedal, Jorge & Wright, Stephen J. (1999). Numerical Optimization. Springer-Verlag. ISBN 0-387-98793-2.
This entry is from Wikipedia, the leading user-contributed encyclopedia. It may not have been reviewed by professional editors (see full disclaimer)



