5.1. Finding the Minimum of a Function of One Variable Without Using Derivatives — A Brief Introduction#
References:
Section 13.1 Unconstrained Optimization Without Derivatives of [Sauer, 2022], in particular sub-section 13.1.1 Golden Section Search.
Section 11.1, One-Variable Case in Chapter 11 Optimization of [Chenney and Kincaid, 2012].
5.1.1. Introduction#
The goal of this section is to find the minimum of a function \(f(x)\) and more specifically to find its location: the argument \(p\) such that \(f(p) \leq f(x)\) for all \(x\) in the domain of \(f\).
Several features are similar to what we have seen with zero-finding:
Some restrictions on the function \(f\) are needed:
with zero-finding, to guarantee existence of a solution, we needed at least an interval \([a, b]\) on which the function is continuous, with a sign change between the endpoints;
for minimization, the criterion for existence is simply an interval \([a, b]\) on which the function is continuous.
With zero-finding, we needed to compare the values of the function at three points \(a < c < b\) to determine a new, smaller interval containing the root; with minimization, we instead need to compare the values of the function at four points \(a < c < d < b\) to determine a new, smaller interval containing the minimum.
There are often good reasons to do this without using derivatives: for example, \(f\) might not be differentiable everywhere, or its derivative might be unavailable or expensive to evaluate.
As is often the case, a guarantee of a unique solution helps to devise a robust algorithm:
to guarantee uniqueness of a zero in interval \([a, b]\), we needed an extra condition like the function being monotonic;
to guarantee uniqueness of a minimum in interval \([a, b]\), the condition we use is being monomodal: the function is decreasing near \(a\), increasing near \(b\), and changes between decreasing and increasing only once (which must therefore happen at the minimum).
So we assume from now on that the function is monomodal on the interval \([a, b]\). (For example, \(f(x) = x^2 - \cos x\) is monomodal on \([-1, 1]\): it decreases to its minimum at \(x = 0\) and increases thereafter.)
5.1.2. Step 1: finding a smaller interval within \([a, b]\) that contains the minimum#
As claimed above, three points are not enough: even if for \(a < c < b\) we have \(f(a) > f(c)\) and \(f(c) < f(b)\), the minimum could be either to the left or the right of \(c\). For instance, on \([0, 1]\) with \(c = 1/2\), both \(f(x) = (x - 0.4)^2\) and \(f(x) = (x - 0.6)^2\) give \(f(a) > f(c) < f(b)\), but their minima lie on opposite sides of \(c\).
So instead, choose two internal points \(c\) and \(d\), \(a < c < d < b\).
if \(f(c) < f(d)\), the function is increasing on at least part of the interval \([c, d]\), so the transition from decreasing to increasing is to the left of \(d\): the minimum is in \([a, d]\);
if instead \(f(c) > f(d)\), the “mirror image” argument shows that the minimum is in \([c, b]\).
What about the borderline case when \(f(c) = f(d)\)? The monomodal function cannot be either increasing or decreasing on all of \([c, d]\) so must first decrease and then increase: the minimum is in \([c, d]\), and so is in either of the above intervals.
So we almost have a first algorithm, except for the issue of how to choose the internal points. Given an interval \([a, b]\) on which function \(f\) is monomodal:
1. Choose two internal points \(c\) and \(d\), with \(a < c < d < b\).
2. Evaluate \(f(c)\) and \(f(d)\).
3. If \(f(c) < f(d)\), replace the interval \([a, b]\) by \([a, d]\); else replace it by \([c, b]\).
4. If the new interval is short enough to locate the minimum with sufficient accuracy (e.g., its length is less than twice the error tolerance), stop: its midpoint is a sufficiently accurate approximation of the minimum's location; otherwise, repeat from step (1). (A code sketch of this loop follows.)
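To make this concrete, here is a minimal Python sketch of the loop, with the choice of internal points passed in as a function to be specified in the next steps; the names `bracket_minimum` and `choose_interior_points` are illustrative assumptions, not from the references.

```python
def bracket_minimum(f, a, b, choose_interior_points, tol=1e-6):
    """Shrink an interval [a, b] containing the minimum of a monomodal f
    until its midpoint is within tol of the minimum's location.
    (Illustrative sketch; the point-placement rule is supplied by the caller.)
    """
    while b - a >= 2 * tol:
        c, d = choose_interior_points(a, b)  # internal points a < c < d < b
        if f(c) < f(d):
            b = d  # the minimum is in [a, d]
        else:
            a = c  # the minimum is in [c, b] (this branch also covers a tie)
    return (a + b) / 2  # midpoint of the final, short interval
```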
5.1.3. Step 2: choosing the internal points so that the method is guaranteed to converge#
There are a couple of details that need to be resolved:
(A) Deciding how to choose the internal points \(c\) and \(d\).
(B) Verifying that the interval does indeed shrink to arbitrarily small length after enough iterations, so that the algorithm succeeds.
Once we have done that and got a working algorithm, there will be the issue of speed:
(C) Amongst the many ways that we could choose the internal points, finding one that (typically at least) is fastest, in the sense of minimizing the number of function evaluations needed.
For now, I will just describe one “naive” approach that works but is not optimal for speed: trisection.
Take \(c\) and \(d\) to divide the interval \([a, b]\) into three equal-width sub-intervals: \(c = (2a + b)/3\), \(d = (a + 2b)/3\), so that each of \([a, c]\), \([c, d]\) and \([d, b]\) is of length \((b-a)/3\).
Then each new interval is \(2/3\) as long as the previous one, so after \(k\) steps the length of the bracketing interval has shrunk by a factor of \((2/3)^k\), eventually getting as small as one wishes. (A concrete code sketch follows.)
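As a concrete instance, here is a hedged sketch of trisection, reusing `bracket_minimum` from the sketch in Step 1; the test function is the example from the introduction, chosen here for illustration and not taken from the references.

```python
from math import cos

def trisection_points(a, b):
    """Internal points dividing [a, b] into three equal-width sub-intervals."""
    return (2*a + b) / 3, (a + 2*b) / 3

# Test on f(x) = x^2 - cos(x), which is monomodal on [-1, 1] with minimum at x = 0.
f = lambda x: x**2 - cos(x)
p = bracket_minimum(f, -1.0, 1.0, trisection_points, tol=1e-6)
print(p)  # a value within about 1e-6 of 0
```

Note the cost: with trisection, neither \(c\) nor \(d\) lands at a usable position within the next, smaller interval, so every iteration needs two new evaluations of \(f\). For instance, shrinking the interval by a factor of \(10^6\) takes about 35 iterations (since \((2/3)^{35} < 10^{-6}\)), hence about 70 evaluations; reducing this cost per iteration is exactly issue (C) above.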
5.1.4. Step 3: choosing the internal points so that the method converges as fast as possible#
Coming soon: this leads to the Golden Section Search …
This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International