5.1. Finding the Minimum of a Function of One Variable Without Using Derivatives#

Last revised on November 24, 2025, adding further details on the method of Successive Parabolic Interpolation.


5.1.1. Introduction#

The goal of this section is to find the minimum of a function \(f(x)\) and more specifically to find its location: the argument \(p\) such that \(f(p) \leq f(x)\) for all \(x\) in the domain of \(f\).

Several features are similar to what we have seen with zero-finding:

  • Some restrictions on the function \(f\) are needed:

    • with zero-finding, to guarantee existence of a solution, we needed at least an interval \([a, b]\) on which the function is continuous and with a sign change between the endpoints;

    • for minimization, the criterion for existence is simply an interval \([a, b]\) on which the function is continuous.

  • With zero-finding, we needed to compare the values of the function at three points \(a < c < b\) to determine a new, smaller interval containing the root; with minimization, we instead need a triple of “left”, “center” and “right” points \(l < c < r\) with the “V configuration” \(f(l) > f(c)\) and \(f(c) < f(r)\).

  • Then we seek a new approximation \(x\) and compare the values of the function at these four points to determine a new, smaller “V” interval containing the minimum.

  • There are often good reasons to be able to do this without using derivatives.

As is often the case, a guarantee of a unique solution helps to devise a robust algorithm:

  • to guarantee uniqueness of a zero in interval \([a, b]\), we needed an extra condition like the function being monotonic;

  • to guarantee uniqueness of a minimum in interval \([a, b]\), the condition we use is being monomodal: the function is decreasing near \(a\), increasing near \(b\), and changes between decreasing and increasing only once (which must therefore happen at the minimum).

So we assume from now on that we have found a starting interval \(I = [a, b]\) on which the function is monomodal. This is not always easy to verify, but one condition that guarantees it is the function being concave up, which is ensured by having \(f''(x) \geq 0\) on interval \(I\).

Without a guarantee of monomodality, the methods described here still find a local minimum, but not necessarily the global minimum on the interval \(I\).

5.1.2. Step 1: finding a smaller interval within \([a, b]\) that contains the minimum#

As claimed above, three points are not enough to close in on the minimum: even if for \(a < c < b\) we have \(f(a) > f(c)\) and \(f(c) < f(b)\), the minimum could be either to the left or the right of \(c\).

So instead, choose two internal points \(c\) and \(d\), \(a < c < d < b\).

  • if \(f(c) < f(d)\), the function is increasing on at least part of the interval \([c, d]\), so the transition from decreasing to increasing is to the left of \(d\): the minimum is in \([a, d]\);

  • if instead \(f(c) > f(d)\), the “mirror image” argument shows that the minimum is in \([c, b]\). So in either case, we can shrink the interval.

What about the borderline case when \(f(c) = f(d)\)? The monomodal function cannot be monotone on all of \([c, d]\), so it must first decrease and then increase: the minimum is in \([c, d]\), which lies within both of the above intervals, so either choice is safe.
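The case analysis above can be expressed as a single update step; here is a minimal sketch, where the function name `shrink_bracket` and the test function are my own illustrative choices:

```python
def shrink_bracket(f, a, c, d, b):
    """One bracket-update step: given a < c < d < b with f monomodal
    on [a, b], return a strictly smaller interval containing the minimum."""
    if f(c) <= f(d):
        return a, d  # minimum lies in [a, d] (this also covers the tie case)
    else:
        return c, b  # minimum lies in [c, b]

# Example: f(x) = (x - 1)**2 is monomodal on [0, 3] with minimum at x = 1
f = lambda x: (x - 1)**2
print(shrink_bracket(f, 0.0, 1.0, 2.0, 3.0))  # → (0.0, 2.0)
```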

So we almost have a first algorithm, except for the issue of how to choose the internal points. Given an interval \([a, b]\) on which function \(f\) is monomodal:

  1. Choose two internal points \(c\) and \(d\), with \(a < c < d < b\)

  2. Evaluate \(f(c)\) and \(f(d)\).

  3. If \(f(c) \leq f(d)\), replace the interval \([a, b]\) by \([a, d]\); else replace it by \([c, b]\).

  4. Then choose two new points within this interval, possibly reusing one of the previous points (reusing \(c\) if the new interval is \([a, d]\) and \(d\) if the new interval is \([c, b]\)), so that only one new function evaluation is needed.

  5. If the new interval is short enough to locate the minimum with sufficient accuracy (e.g. its length is less than twice the error tolerance), stop: its midpoint is a sufficiently accurate approximate answer; otherwise, repeat from step (2).
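The steps above can be sketched as a short loop. This version uses the simple trisection choice of internal points \(c = (2a+b)/3\), \(d = (a+2b)/3\) described later in this section, and does not recycle points, so it makes two function evaluations per step; the function name is my own:

```python
def bracket_minimum(f, a, b, tol=1e-6):
    """Naive bracketing: repeatedly trisect [a, b], keeping the
    sub-interval guaranteed to contain the minimum of a monomodal f."""
    while b - a > 2 * tol:
        c = (2 * a + b) / 3
        d = (a + 2 * b) / 3
        if f(c) <= f(d):
            b = d  # minimum is in [a, d]
        else:
            a = c  # minimum is in [c, b]
    return (a + b) / 2  # midpoint is within tol of the minimum

# Example: f(x) = (x - 1)**2, minimum at x = 1
print(bracket_minimum(lambda x: (x - 1)**2, 0.0, 3.0))
```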

5.1.3. Step 2: choosing the internal points so that the method is guaranteed to converge#

There are a couple of details that need to be resolved:

(A) Deciding how to choose the internal points \(c\) and \(d\).

(B) Verifying that the interval does indeed shrink to arbitrarily small length after enough iterations, so that the algorithm succeeds.

Once we have done that and got a working algorithm, there will be the issue of speed:

(C) Among the many ways that we could choose the internal points, finding one that (typically at least) is fastest, in the sense of minimizing the number of function evaluations needed.

For now, I will just describe one “naive” approach that works, but is not optimal for speed:

Initially, take \(c\) and \(d\) to divide the interval \([a, b]\) into three equal-width sub-intervals: \(c = (2a +b)/3\), \(d = (a + 2b)/3\), so that each of \([a, c]\), \([c, d]\) and \([d, b]\) are of length \((b-a)/3\). Then the new interval is \(2/3\) as long as the previous one.

If we continue by trisecting the new interval, the errors shrink by a factor of \((2/3)^k\) after \(k\) steps, eventually getting as small as one wishes.
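The number of steps needed for a given tolerance follows directly from the factor \((2/3)^k\): we need \((2/3)^k (b_0 - a_0) \leq 2\,\mathrm{tol}\). A quick sketch (the helper name is my own):

```python
import math

def trisection_steps(L0, tol):
    """Number of trisection steps for the bracket length, starting at L0,
    to fall below twice the tolerance: (2/3)**k * L0 <= 2 * tol."""
    return math.ceil(math.log(2 * tol / L0) / math.log(2 / 3))

# Steps needed starting from an interval of length 1 with tolerance 1e-6
print(trisection_steps(1.0, 1e-6))
```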

However, that does not “recycle” either of the previous internal points. One could instead recycle one of them and then choose just one new internal point, perhaps the midpoint of the longer of the two new sub-intervals. This then raises the new challenge of determining whether the intervals shrink to length zero, and how many iterations are needed to meet an error tolerance.

5.1.5. Convergence#

The error bound on the midpoint \(m_k = (a_k + b_k)/2\) decreases by a factor of \(g\) at each step, and before starting that bound is \((b_0 - a_0)/2\); thus after \(k\) steps the error bound is \(g^k (b_0 - a_0)/2\).

Since \(g < 2/3\), this is better than for the above trisection method; combined with the halving of the number of function evaluations in each step, this shows that the Golden Section Search is clearly superior.
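A quick numerical comparison of the two decay factors, assuming \(g = (\sqrt{5} - 1)/2 \approx 0.618\), the shrink factor of the Golden Section Search:

```python
import math

# Golden Section shrink factor (assumed): g = (sqrt(5) - 1)/2 ~ 0.618
g = (math.sqrt(5) - 1) / 2

# Compare error-bound decay with the trisection factor 2/3 after k steps
for k in (10, 20, 40):
    print(k, (2 / 3) ** k, g ** k)
```

Since the Golden Section Search also needs only one new function evaluation per step instead of two, the advantage per *evaluation* is even larger: \(0.618\) versus \((2/3)^{1/2} \approx 0.816\).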

5.1.6. Successive Parabolic Interpolation: an overview#

Just as the Secant Method (usually) improves on the speed of the bisection method by using the values of \(f\) via linear interpolation rather than just inequality comparisons, so polynomial interpolation from a triple of points can (usually) improve on the speed of the Golden Section Search.

The basic idea is to again work with a triple of points \((l_k, c_k, r_k)\) at each iteration, with \(f(l_k) > f(c_k)\) and \(f(c_k) < f(r_k)\), and to compute a new approximation \(x_k\) of the location of the minimum as the minimizer of the quadratic passing through these three points. Then

  • If the new approximation \(x_k < c_k\), the new triple is \((l_k, x_k, c_k)\) when \(f(x_k) < f(c_k)\), and \((x_k, c_k, r_k)\) otherwise;

  • if instead \(x_k > c_k\), the mirror-image choices are \((c_k, x_k, r_k)\) when \(f(x_k) < f(c_k)\), and \((l_k, c_k, x_k)\) otherwise.

This is typically faster than the Golden Section Search, but it comes without the same guarantees of convergence or speed.

One question is, once a new point is found, which of the three previous ones to discard.

One answer is to mimic the secant method: discard the oldest point.

For that, call the three points at each stage \(a\), \(b\), and \(c\), with the idea that \(c\) is the newest; maybe the midpoint of the initial interval \([a, b]\).

Then the parabolic interpolation gives a new point

\[ x = \frac{b+c}{2} - \frac{(f(b)-f(c))(a-c)(a-b)}{2[(b-c)(f(a)-f(b)) - (f(b)-f(c))(a-b)]} \]

And the update is that the new \((a, b, c)\) is the previous \((b, c, x)\).
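The formula above translates directly into code; a minimal sketch, with the function name my own, assuming the denominator is nonzero (i.e. the three points are not collinear):

```python
def parabola_min(a, b, c, fa, fb, fc):
    """Minimizer of the parabola interpolating (a, fa), (b, fb), (c, fc),
    per the formula above; assumes a nonzero denominator and an
    upward-opening parabola."""
    num = (fb - fc) * (a - c) * (a - b)
    den = 2 * ((b - c) * (fa - fb) - (fb - fc) * (a - b))
    return (b + c) / 2 - num / den

# Sanity check on f(x) = (x - 1)**2: the parabola through any three
# distinct points is f itself, so the formula recovers x = 1 exactly
f = lambda x: (x - 1)**2
print(parabola_min(0.0, 0.5, 2.0, f(0.0), f(0.5), f(2.0)))  # → 1.0
```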

An alternative strategy would be to examine the values at the three previous points and discard the one that is “worst” in the sense of having the largest value of \(f\). However, each new point typically has a smaller value of \(f\) than all previous ones (at least once it is close enough to the minimum that the parabolic approximation is reasonably accurate), so the values will eventually be in order of decreasing \(f\); then the oldest point is typically also the “worst” one, and so the best one to discard either way.

Further details of an algorithm for this, and its implementation and experimental exploration, are left as exercises.

5.1.7. Exercises#

Exercise 5.1

Explain why the formulas for the new interior points in the Procedure for the Golden Section Search are correct.

Exercise 5.2

Describe the golden section method precisely in pseudocode.

Exercise 5.3

Implement the golden section method in Python and test it on a few examples, such as

  1. \(f(x) = -x e^{-x^2}\) (whose minimum can be shown to occur at \(x = 1/\sqrt{2}\))

  2. \(f(x) = x e^{-x^2}.\)

The second requires preparatory work finding a suitable initial triplet.

Note that with a nice smooth “target function” \(f\) as here, one can get numerical confirmation of the result by evaluating \(f'(x)\). To be careful, that only checks that it is a critical point; the second derivative test can then check whether it is a local minimum.
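When \(f'\) is not available in closed form, centered differences give a serviceable substitute for both checks; a sketch, with the helper name and tolerances my own illustrative choices:

```python
import math

def check_minimum(f, p, h=1e-5):
    """Centered-difference estimates of f'(p) and f''(p): the first
    should be near zero at a critical point, the second positive
    at a local minimum."""
    d1 = (f(p + h) - f(p - h)) / (2 * h)            # ~ f'(p)
    d2 = (f(p + h) - 2 * f(p) + f(p - h)) / h**2    # ~ f''(p)
    return d1, d2

# Check the claimed minimizer x = 1/sqrt(2) of f(x) = -x e^{-x^2}
f = lambda x: -x * math.exp(-x**2)
d1, d2 = check_minimum(f, 1 / math.sqrt(2))
print(d1, d2)  # d1 near 0 and d2 > 0: consistent with a local minimum
```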

Exercise 5.4

Describe the method of Successive Parabolic Interpolation precisely in pseudocode.

Exercise 5.5

Implement the method of Successive Parabolic Interpolation in Python and test it on a few examples, such as

  1. \(f(x) = -x e^{-x^2}\) (whose minimum can be shown to occur at \(x = 1/\sqrt{2}\))

  2. \(f(x) = x e^{-x^2}\)

(See the notes on Exercise)