5.2. Finding the Minimum of a Function of Several Variables#

Last revised on November 20, 2025.

References:

5.2.1. Introduction#

This section will eventually describe several methods for computing the minimum (and its location) of a function \(f(x, y, \dots)\) of several variables:

  • Applying Newton’s method to finding critical points by solving \(\nabla f = \mathbf 0\).

  • Steepest Descent (a.k.a. Gradient Search) where the gradient is used at each iteration to find the direction in which to search for a new approximate location where \(f\) has a lower value.

  • The method of Nelder and Mead, which avoids the use of derivatives.

For now, just the first two are introduced.

5.2.2. Newton’s Method#

A (local) minimum of a function \(f(\mathbf x)\) of \(n\) variables occurs at a zero of its gradient: \(\nabla f(\mathbf x) = \mathbf{0}\), so one strategy is to solve the latter equation by Newton’s method.

For a function of one variable, this leads to the iterative form

\[ x^{k+1} = x^{k} - \frac{f'(x^{k})}{f''(x^{k})} \]

To extend to higher dimensions, we need to see what replaces that second derivative.

First, eliminate the division and solve for the increment \(\delta x^{k} := x^{k+1} - x^{k}\):

\[ f''(x^{k})\, \delta x^{k} = -f'(x^{k}) \]
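This one-variable iteration can be sketched directly in code. The test function below is a hypothetical example, \(f(x) = x^4/4 - x\), chosen because \(f'(x) = x^3 - 1\) makes the minimizer \(x = 1\) easy to verify:

```python
# One-variable Newton iteration for a minimum:
# at each step solve f''(x) dx = -f'(x), then update x <- x + dx.
def newton_minimize_1d(df, d2f, x0, tol=1e-12, max_iter=50):
    x = x0
    for _ in range(max_iter):
        dx = -df(x) / d2f(x)   # Newton increment
        x += dx
        if abs(dx) < tol:      # stop when the step is negligible
            break
    return x

# Hypothetical test: f(x) = x**4/4 - x, so f'(x) = x**3 - 1, f''(x) = 3*x**2;
# the minimum is at x = 1.
x_min = newton_minimize_1d(lambda x: x**3 - 1, lambda x: 3 * x**2, x0=2.0)
```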

Next, for the system form, each component of the gradient must vanish at a critical point, giving the \(n\) equations

(5.1)#\[ \frac{\partial f}{\partial x_i} (\mathbf x) = 0 \]

The tangent plane approximation of a function \(f:\mathbb{R}^n \to \mathbb{R}\) near a point \(\mathbf x^{(k)}\) is

\[ f(\mathbf x) \approx f(\mathbf x^{(k)}) + \sum_{j=1}^n \frac{\partial f}{\partial x_j}(\mathbf x^{(k)}) (x_j - x^{(k)}_j) \]

so applying to each component of the above system (5.1) gives

\[ \frac{\partial f}{\partial x_i}(\mathbf x) \approx \frac{\partial f}{\partial x_i}(\mathbf x^{(k)}) + \sum_{j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf x^{(k)}) (x_j - x^{(k)}_j) \]

The term with the second partial derivatives is the \(i\)-th component of a matrix-vector product, where the (symmetric) matrix-valued function is the Hessian of \(f\), with components \( \displaystyle H_{ij}(\mathbf x) = \frac{\partial^2 f}{\partial x_i \partial x_j}(\mathbf x). \)

With this notation, the above collection of approximations is

\[ \nabla f(\mathbf x) \approx \nabla f(\mathbf x^{(k)}) + H(\mathbf x^{(k)}) (\mathbf x - \mathbf x^{(k)}) \]
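As a quick numerical sanity check of this linearization, consider the hypothetical cubic \(f(x, y) = x^3 + y^3\), for which \(\nabla f = (3x^2, 3y^2)\) and the Hessian is \(\mathrm{diag}(6x, 6y)\); the approximation error should be of order \(\|\mathbf x - \mathbf x^{(k)}\|^2\):

```python
import numpy as np

# Gradient and Hessian of the hypothetical f(x, y) = x**3 + y**3.
grad = lambda v: np.array([3 * v[0]**2, 3 * v[1]**2])
hess = lambda v: np.diag([6 * v[0], 6 * v[1]])

xk = np.array([1.0, 1.0])   # expansion point x^(k)
x = np.array([1.1, 0.9])    # nearby point

# Linearized gradient vs. the exact gradient at x.
approx = grad(xk) + hess(xk) @ (x - xk)
exact = grad(x)
# With a step of size 0.1 in each component, the error is O(0.1**2).
```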

The Newton step comes from setting this linear approximation to zero at \(\mathbf x = \mathbf x^{(k+1)}\):

\[ \nabla f(\mathbf x^{(k)}) + H(\mathbf x^{(k)}) (\mathbf x^{(k+1)} - \mathbf x^{(k)}) = \mathbf 0 \]

In other words,

\[ H(\mathbf x^{(k)}) \delta \mathbf x^{(k)} = - \nabla f(\mathbf x^{(k)}),\quad \mathbf x^{(k+1)} = \mathbf x^{(k)} + \delta \mathbf x^{(k)} \]
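A minimal sketch of this multivariate iteration, assuming the gradient and Hessian are available in closed form: at each step solve the linear system \(H\,\delta\mathbf x = -\nabla f\) and update. It is illustrated on the quadratic \(f(x, y) = 2(x-1)^2 + 3(y-2)^2\), for which Newton's method converges in a single step:

```python
import numpy as np

# Multivariate Newton iteration for a minimum:
# solve H(x_k) dx = -grad f(x_k), then update x_{k+1} = x_k + dx.
def newton_minimize(grad, hess, x0, tol=1e-12, max_iter=50):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        dx = np.linalg.solve(hess(x), -grad(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# f(x, y) = 2(x-1)^2 + 3(y-2)^2: gradient and (constant) Hessian by hand.
grad = lambda v: np.array([4 * (v[0] - 1), 6 * (v[1] - 2)])
hess = lambda v: np.array([[4.0, 0.0], [0.0, 6.0]])
x_min = newton_minimize(grad, hess, [0.0, 0.0])
```

Because the objective is quadratic, the linearization of the gradient is exact and the first step lands on the minimizer \((1, 2)\).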

5.2.4. Exercises#

Exercise 5.6

Implement the method of Steepest Descent (using your choice of one-dimensional minimization algorithm), and test it initially on an example like

\[f_1(x, y) = 2(x-1)^2 + 3(y-2)^2\]

which has an obvious, unique minimum.
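One possible skeleton for this (a sketch, not the only approach): search along the negative gradient direction, here with a simple backtracking line search standing in for a full one-dimensional minimizer, tested on the \(f_1\) above:

```python
import numpy as np

# Steepest descent sketch: step along -grad f, with a backtracking
# line search (Armijo-style sufficient-decrease condition) to pick
# the step length.
def steepest_descent(f, grad, x0, tol=1e-8, max_iter=10_000):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # gradient small enough: done
            break
        t = 1.0
        # Halve t until f decreases sufficiently along -g.
        while f(x - t * g) > f(x) - 0.5 * t * np.dot(g, g):
            t *= 0.5
        x = x - t * g
    return x

f1 = lambda v: 2 * (v[0] - 1)**2 + 3 * (v[1] - 2)**2
grad1 = lambda v: np.array([4 * (v[0] - 1), 6 * (v[1] - 2)])
x_min = steepest_descent(f1, grad1, [0.0, 0.0])
```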

Exercise 5.7

Test the above on some more challenging examples like

\[ f_2(x, y) = K(y-x^2)^2 + (x-1)^2 \]

with minimum at \(x=y=1\).

Start with \(K = 1\); then try a larger value like \(K = 100\), which makes the problem harder (a graph might reveal why).

Exercise 5.8

It can also be interesting to explore a function that has multiple local minima, like

\[ f_3(x, y) = (1 - \cos(x-1)) (1 - \cos(y-1)) \]