Nonlinear conjugate gradient method
In numerical optimization, the nonlinear conjugate gradient method generalizes the conjugate gradient method to nonlinear optimization. For a quadratic function f(x) = \|Ax - b\|^2, the minimum of f is obtained when the gradient is 0:

\nabla_x f = 2 A^T (Ax - b) = 0.

Whereas linear conjugate gradient seeks a solution to the linear equation A^T A x = A^T b, the nonlinear conjugate gradient method is generally used to find the local minimum of a nonlinear function using its gradient \nabla_x f alone. It works when the function is approximately quadratic near the minimum, which is the case when the function is twice differentiable at the minimum and the second derivative is non-singular there.

Given a function f(x) of N variables to minimize, its gradient \nabla_x f indicates the direction of maximum increase. One simply starts in the opposite (steepest descent) direction

\Delta x_0 = -\nabla_x f(x_0)

with an adjustable step length \alpha and performs a line search in this direction until it reaches the minimum of f:

\alpha_0 := \arg\min_\alpha f(x_0 + \alpha \Delta x_0), \qquad x_1 = x_0 + \alpha_0 \Delta x_0.

After this first iteration in the steepest direction \Delta x_0, the following steps constitute one iteration of moving along a subsequent conjugate direction s_n, where s_0 = \Delta x_0:

1. Calculate the steepest direction: \Delta x_n = -\nabla_x f(x_n).
2. Compute \beta_n according to one of the formulas below.
3. Update the conjugate direction: s_n = \Delta x_n + \beta_n s_{n-1}.
4. Perform a line search: \alpha_n = \arg\min_\alpha f(x_n + \alpha s_n).
5. Update the position: x_{n+1} = x_n + \alpha_n s_n.

With a pure quadratic function the minimum is reached within N iterations (excepting roundoff error), but a non-quadratic function will make slower progress. Subsequent search directions lose conjugacy, requiring the search direction to be reset to the steepest descent direction at least every N iterations, or sooner if progress stops. However, resetting every iteration turns the method into steepest descent. The algorithm stops when it finds the minimum, determined when no progress is made after a direction reset (i.e. in the steepest descent direction), or when some tolerance criterion is reached.

Within a linear approximation, the parameters \alpha and \beta are the same as in the linear conjugate gradient method but have been obtained with line searches. The conjugate gradient method can follow narrow (ill-conditioned) valleys, where the steepest descent method slows down and follows a criss-cross pattern.

Four of the best known formulas for \beta_n are named after their developers:

Fletcher–Reeves: \beta_n^{FR} = \frac{\Delta x_n^T \Delta x_n}{\Delta x_{n-1}^T \Delta x_{n-1}}

Polak–Ribière: \beta_n^{PR} = \frac{\Delta x_n^T (\Delta x_n - \Delta x_{n-1})}{\Delta x_{n-1}^T \Delta x_{n-1}}

Hestenes–Stiefel: \beta_n^{HS} = \frac{\Delta x_n^T (\Delta x_n - \Delta x_{n-1})}{-s_{n-1}^T (\Delta x_n - \Delta x_{n-1})}

Dai–Yuan: \beta_n^{DY} = \frac{\Delta x_n^T \Delta x_n}{-s_{n-1}^T (\Delta x_n - \Delta x_{n-1})}

These formulas are equivalent for a quadratic function, but for nonlinear optimization the preferred formula is a matter of heuristics or taste. A popular choice is \beta_n = \max\{0, \beta_n^{PR}\}, which provides a direction reset automatically.

Algorithms based on Newton's method potentially converge much faster. There, both step direction and length are computed from the gradient as the solution of a linear system of equations, with the coefficient matrix being the exact Hessian matrix (for Newton's method proper) or an estimate thereof (in the quasi-Newton methods, where the observed change in the gradient during the iterations is used to update the Hessian estimate). For high-dimensional problems, the exact computation of the Hessian is usually prohibitively expensive, and even its storage can be problematic, requiring O(N^2) memory (but see the limited-memory L-BFGS quasi-Newton method).

The conjugate gradient method can also be derived using optimal control theory. In this accelerated optimization theory, the conjugate gradient method falls out as a nonlinear optimal feedback controller

u = k(x, \dot x) := -\gamma_a \nabla_x f(x) - \gamma_b \dot x

for the double integrator system \ddot x = u. The quantities \gamma_a > 0 and \gamma_b > 0 are variable feedback gains.
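As an illustration of the iteration described above, the following is a minimal Python sketch of nonlinear conjugate gradient with the Polak–Ribière update and the automatic reset \beta_n = \max\{0, \beta_n^{PR}\}. The backtracking (Armijo) line search and the Rosenbrock test function are illustrative assumptions, not prescribed by the method itself, which only assumes some line search along each conjugate direction.

```python
# Minimal sketch of nonlinear conjugate gradient (Polak-Ribiere with reset).
# The backtracking line search is a simple stand-in for the exact line
# search used in the derivation; any sufficient-decrease line search works.
import numpy as np

def nonlinear_cg(f, grad, x0, tol=1e-8, max_iter=1000):
    x = np.asarray(x0, dtype=float)
    g = grad(x)          # gradient at the current point
    d = -g               # first direction: steepest descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        # Backtracking (Armijo) line search along d.
        alpha, c, rho = 1.0, 1e-4, 0.5
        fx = f(x)
        while alpha > 1e-12 and f(x + alpha * d) > fx + c * alpha * g.dot(d):
            alpha *= rho
        x_new = x + alpha * d
        g_new = grad(x_new)
        # Polak-Ribiere beta (gradients stand in for Delta x = -g);
        # max(0, .) resets the direction to steepest descent when beta < 0.
        beta = max(0.0, g_new.dot(g_new - g) / g.dot(g))
        d = -g_new + beta * d
        x, g = x_new, g_new
    return x

# Usage example: minimize the 2-variable Rosenbrock function,
# whose narrow curved valley is the classic test for this method.
rosen = lambda x: (1 - x[0])**2 + 100 * (x[1] - x[0]**2)**2
rosen_grad = lambda x: np.array([
    -2 * (1 - x[0]) - 400 * x[0] * (x[1] - x[0]**2),
    200 * (x[1] - x[0]**2),
])
print(nonlinear_cg(rosen, rosen_grad, x0=[-1.2, 1.0]))  # should approach [1, 1]
```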