1.2. Solving Equations by Fixed Point Iteration (of Contraction Mappings)#

Last revised on August 27, 2025

1.2.1. Introduction#

In the next section we will meet Newton’s method for root-finding, which you might have seen in a calculus course. This is one very important example of the more general strategy of fixed-point iteration, and we start with that.

import numpy as np
import matplotlib.pyplot as plt
# We can also import items from a module individually, so they can be used by "first name only".
# Here this will mostly be done for mathematical functions with familiar names:
from numpy import cos
# ... and for some very frequently used functions for graphing:
from matplotlib.pyplot import figure, plot, title, legend, grid

1.2.2. Fixed-point form for equations#

A variant of stating equations as root-finding (\(f(x) = 0\)) is fixed-point form: given a function \(g:\mathbb{R} \to \mathbb{R}\) or \(g:\mathbb{C} \to \mathbb{C}\) (or even \(g:\mathbb{R}^n \to \mathbb{R}^n\); a later topic), find a fixed point of \(g\). That is, a value \(p\) for its argument such that

\[g(p) = p\]

Such problems are interchangeable with root-finding. One way to convert from \(f(x) = 0\) to \(g(x) = x\) is defining

\[g(x) := x - w(x) f(x)\]

for any “weight function” \(w(x)\) that is nonzero near the root. For example, \(w(x) \equiv 1\) gives \(g(x) = x - f(x)\); we will see in the next section that the choice \(w(x) = 1/f'(x)\) gives Newton’s method.

One can convert the other way too, for example defining \(f(x) := g(x) - x\). We have already seen this when we converted the equation \(x = \cos x\) to \(f(x) = x - \cos x = 0\).

Compare the two setups graphically: in each case, the \(x\) value at the intersection of the two curves is the solution we seek.

def f(x): return x - cos(x)
def g(x): return cos(x)
a = -1
b = 1

x = np.linspace(a, b)

figure(figsize=(10,6))
title("Zeros of $y = f(x) = x - cos(x)$, with $y=0$")
plot(x, f(x))
plot([a, b], [0, 0])
grid(True)

figure(figsize=(7,7))
title("Fixed points of $y = g(x) = cos(x)$ with $y=x$")
plot(x, g(x))
plot(x, x)
grid(True)
[Two figures: the graph of \(y = f(x) = x - \cos x\) crossing the line \(y=0\), and the graph of \(y = g(x) = \cos x\) crossing the line \(y=x\), each at the same \(x\) value.]

1.2.3. Fixed-point iteration#

Fixed point form can be convenient partly because we almost always have to solve by successive approximations, or iteration, and fixed point form suggests one choice of iterative procedure: start with any first approximation \(x_0\), and iterate with

\[ x_1 = g(x_0), \, x_2 = g(x_1), \dots, x_{k+1} = g(x_k), \dots \]
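Before analyzing when this converges, here is a minimal sketch of the iteration as a reusable function. The name fixed_point_iteration, the tolerance, the iteration cap, and the stopping test on successive iterates are all illustrative choices, not a prescribed interface; stopping when successive iterates are close is only a heuristic.

def fixed_point_iteration(g, x_0, tolerance=1e-10, max_iterations=100):
    # Iterate x_{k+1} = g(x_k) until successive iterates agree to within
    # the tolerance, or the iteration budget runs out.
    x_k = x_0
    for k in range(max_iterations):
        x_k_plus_1 = g(x_k)
        if abs(x_k_plus_1 - x_k) <= tolerance:
            return x_k_plus_1
        x_k = x_k_plus_1
    return x_k

print(fixed_point_iteration(cos, 1.0))  # approximately 0.7390851332151607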

Proposition 1.1

If \(g\) is continuous, and if the above sequence \(\{x_0, x_1, \dots \}\) converges to a limit \(p\), then that limit is a fixed point of function \(g\): \(g(p) = p\).

Proof. From \(\displaystyle \lim_{k \to \infty} x_k = p\), continuity gives

\[\lim_{k \to \infty} g(x_k) = g(p).\]

On the other hand, \(g(x_k) = x_{k+1}\), so

\[\lim_{k \to \infty} g(x_k) = \lim_{k \to \infty} x_{k+1} = p.\]

Comparing gives \(g(p) = p\).

1.2.4. Mappings and contraction mappings#

That second “if” is a big one. Fortunately, it can often be resolved using the idea of a contraction mapping.

First, we need the idea of a mapping.

Definition 1.1 (Mapping)

A function \(g(x)\) defined on a closed interval \(D = [a, b]\) which sends values back into that interval, \(g: D \to D\), is sometimes called a map or mapping.

Remark 1.6

The same applies for a function \(g: D \to D\) where \(D\) is a subset of the complex numbers, or even of the vectors \(\mathbb{R}^n\) or \(\mathbb{C}^n\). This will help later with solving more general types of equations.

A mapping is sometimes thought of as moving a region \(S\) within its domain \(D\) to another such region, by moving each point \(x \in S \subset D\) to its image \(g(x) \in g(S) \subset D\).
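As a quick sanity check of the mapping property for a given \(g\) and interval, one can sample; this sketch (the helper name maps_into_itself is our own) can only suggest the property, not prove it, since violations could hide between sample points:

def maps_into_itself(g, a, b, n=1001):
    # Sample g on [a, b] and check that all sampled values stay within [a, b].
    values = g(np.linspace(a, b, n))
    return a <= values.min() and values.max() <= b

print(maps_into_itself(cos, -1, 1))  # True: cos maps [-1, 1] into itself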

Proposition 1.2

Any continuous mapping on a closed interval \([a, b]\) has at least one fixed point.

Proof. Consider the “root-finding cousin”, \(f(x) = x - g(x)\).

First, \(f(a) = a - g(a) \leq 0\), since \(g(a)\) must lie in \([a,b]\), so that \(g(a) \geq a\); similarly, \(f(b) = b - g(b) \geq 0\).

From the Intermediate Value Theorem, \(f\) has a zero \(p\), where \(f(p) = p - g(p) = 0\).

In other words, the graph of \(y=g(x)\) goes from being above the line \(y=x\) at \(x=a\) to below it at \(x=b\), so at some point \(x=p\), the curves meet: \(y = x = p\) and \(y = g(p)\), so \(p = g(p)\).

Example 1.2

Let us illustrate this with the mapping \(g_4(x) := 4 \cos x\), for which the fact that \(|g_4(x)| \leq 4\) ensures that this is a map of the domain \(D = [-4, 4]\) into itself:

def g_4(x): return 4*cos(x)
a = -4
b = 4
x = np.linspace(a, b)

figure(figsize=(7,7))
title("Fixed points of the map $g_4(x) = 4 cos(x)$")
plot(x, g_4(x), label="$y=g_4(x)$")
plot(x, x, label="$y=x$")
legend()
grid(True);
[Figure: the graph of \(y = g_4(x) = 4\cos x\) crossing the line \(y=x\) at three fixed points.]

The above example has multiple fixed points (three of them). To ensure the existence of a unique solution, and convergence of the iteration to that solution, we use an extra condition.

Definition 1.2 (Contraction Mapping)

A mapping \(g:D \to D\), is called a contraction or contraction mapping if there is a constant \(C < 1\) such that

\[|g(x) - g(y)| \leq C |x - y|\]

for any \(x\) and \(y\) in \(D\). We then call \(C\) a contraction constant.

Remark 1.7

  • It is not enough to have \(|g(x) - g(y)| < |x - y|\), which is what allowing \(C = 1\) would give! We need the ratio \(\displaystyle \frac{|g(x) - g(y)|}{|x - y|}\) to be uniformly less than one for all possible values of \(x\) and \(y\).

  • The same concept applies for a domain in \(\mathbb{C}\), \(\mathbb{R}^n\) or \(\mathbb{C}^n\): for vectors, just replace the absolute value \(| \dots |\) by the vector norm \(\| \dots \|\). In fact, we will sometimes blur the distinction by using the “single line” absolute value notation for vector norms too.
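As a rough numerical companion to this definition, one can estimate the best possible contraction constant by the largest difference quotient over closely spaced sample points. This is a sketch with our own helper name; sampling gives only a lower bound on the true supremum, so it can suggest, but never prove, the contraction property:

def estimate_contraction_constant(g, a, b, n=1000):
    # The largest |g(x_{i+1}) - g(x_i)| / |x_{i+1} - x_i| over a fine grid,
    # which approximates the largest value of |g'| on [a, b].
    x = np.linspace(a, b, n)
    return np.max(np.abs(np.diff(g(x))) / np.diff(x))

print(estimate_contraction_constant(cos, -1, 1))  # close to sin(1) = 0.841...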

Theorem 1.2 (A Contraction Mapping Theorem)

Any contraction mapping on a closed, bounded interval \(D = [a, b]\) has exactly one fixed point \(p\) in \(D\). Further, this can be calculated as the limit \(\displaystyle p = \lim_{k \to \infty} x_k\) of the iteration sequence given by \(x_{k+1} = g(x_{k})\) for any choice of the starting point \(x_{0} \in D\).

Proof. First, uniqueness (the main idea of the proof can be shown with the help of a few pictures):

Between any two distinct fixed points, such as any pair of the three fixed points of \(g_4\) above (call them \(p_0\) and \(p_1\)), the graph of \(g\) has to rise with secant slope exactly 1: \((g(p_1) - g(p_0))/(p_1 - p_0) = (p_1 - p_0)/(p_1 - p_0) = 1\), and this violates the contraction property. So a contraction has at most one fixed point, and Proposition 1.2 guarantees that it has at least one.

Instead, the graph of a contraction map looks like that of our favorite example, \(g(x) = \cos x\), graphed above (and which we will soon verify to be a contraction on the interval \([-1, 1]\)).

The second claim, about convergence to the fixed point from any initial approximation \(x_0\), will be verified below, once we have seen some ideas about measuring errors.

1.2.5. An easy way of checking whether a differentiable function is a contraction#

With differentiable functions, the contraction condition can often be easily verified using derivatives:

Theorem 1.3 (A derivative-based fixed point theorem)

If a function \(g:[a,b] \to [a,b]\) is differentiable and there is a constant \(C < 1\) such that \(|g'(x)| \leq C\) for all \(x \in [a, b]\), then \(g\) is a contraction mapping with this contraction constant, and so has a unique fixed point in this interval.

Proof. Using the Mean Value Theorem, \(g(x) - g(y) = g'(c)(x - y)\) for some \(c\) between \(x\) and \(y\). Then taking absolute values,

\[|g(x) - g(y)| = |g'(c)| \cdot |x - y| \leq C |x - y|.\]

Remark 1.8

There is a partial converse: if \(|g'(p)| > 1\) at a fixed point \(p\), then fixed-point iteration will not converge to \(p\); in fact, iterates starting near \(p\) (but not exactly at it) are pushed away from it, as illustrated in Example 1.5 below.

Example 1.3 (\(g(x) = \cos(x)\) is a contraction on interval \([-1,1]\))

Our favorite example \(g(x) = \cos(x)\) is a contraction, but we have to be a bit careful about the domain.

For all real \(x\), \(g'(x) = -\sin x\), so \(|g'(x)| \leq 1\); this is almost but not quite enough.

However, since \(|\cos x| \leq 1\) for all real \(x\), all iteration values from \(x_1\) onward lie in the interval \(D = [-1,1]\), and considering \(g\) as a mapping of this domain into itself, \(|g'(x)| = |\sin x| \leq \sin(1) = 0.841\dots < 1\): that is, now we have a contraction, with \(C = \sin(1) \approx 0.841\).

And as seen in the graph above, there is indeed a unique fixed point.
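A quick numerical spot check of this bound, just sampling the known derivative \(g'(x) = -\sin x\); this only corroborates the calculus argument above, it does not replace it:

from numpy import sin
x_samples = np.linspace(-1, 1, 1001)
print(f"sampled max of |g'(x)| = |-sin(x)| on [-1,1]: {np.max(np.abs(sin(x_samples))):.6f}")
print(f"sin(1) = {sin(1):.6f}")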

1.2.6. The contraction constant \(C\) as a measure of how fast the approximations improve (the smaller the better)#

It can be shown that if \(C\) is small (at least when one looks only at a reduced domain \(|x - p| < R\)) then the convergence is “fast” once \(|x_k - p| < R\).

To see this, we define some jargon for talking about errors. (For more details on error concepts, see the section on error measures and convergence rates.)

Definition 1.3 (Error)

The error in \(\tilde x\) as an approximation to an exact value \(x\) is

\[\text{error} := \text{(approximation)} - \text{(exact value)} = \tilde x - x\]

This will often be abbreviated as \(E\).

Definition 1.4 (Absolute Error)

The absolute error in \(\tilde x\) as an approximation to an exact value \(x\) is the magnitude of the error: the absolute value \(|E| = |\tilde x - x|\).

Remark 1.9

This will later be extended to \(x\) and \(\tilde x\) being vectors, by again using the vector norm in place of the absolute value.

In the case of \(x_k\) as an approximation of \(p\), we name the error \(E_k := x_k - p\). Then \(C\) measures a worst case for how fast the error decreases as \(k\) increases, and this is “exponentially fast”:

Proposition 1.3

\(|E_{k+1}| \leq C |E_{k}|\), or \(|E_{k+1}|/|E_{k}|\leq C\), and so

\[|E_k| \leq C^k |x_0 - p|\]

That is, the error decreases at worst in a geometric sequence, which is exponential decrease with respect to the variable \(k\).

Proof. \(E_{k+1} = x_{k+1} - p = g(x_{k}) - g(p)\), using \(g(p) = p\). Thus the contraction property gives

\[|E_{k+1}| = |g(x_k) - g(p)| \leq C |x_k - p| = C |E_k|\]

Applying this again,

\[|E_k| \leq C |E_{k-1}| \leq C \cdot C |E_{k-2}| = C^2 |E_{k-2}|\]

and repeating \(k-2\) more times,

\[|E_k|\leq C^k |E_0| = C^k |x_0 - p|.\]

Remark 1.10

We will often use this “recursive” strategy of relating the error in one iterate to that in the previous iterate.

We can now complete the proof of the above contraction mapping theorem Theorem 1.2:

Proof. This now follows from Proposition 1.3:

For any initial approximation \(x_0\), we know that \(|E_k|\leq C^k |x_0 - p|\), and with \(C < 1\), this can be made as small as we want by choosing a large enough value of \(k\). Thus

\[\lim_{k \to \infty} |E_k| = \lim_{k \to \infty} |x_k - p| = 0,\]

which is another way of saying that \(\displaystyle \lim_{k \to \infty} x_k = p\), or \(x_k \to p\), as claimed.
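Before the graphical examples, here is a quick numerical look at these error ratios for \(g(x) = \cos x\). (The reference value for \(p\) below is the fixed point of \(\cos\), accurate to about machine precision.) The ratios settle near \(|g'(p)| = \sin(p) \approx 0.674\), comfortably below the worst-case contraction constant \(C = \sin(1) \approx 0.841\):

p = 0.7390851332151607  # the fixed point of cos, accurate to about 16 digits
x_k = 1.0
E_k = abs(x_k - p)
for k in range(10):
    x_k = cos(x_k)
    E_k_plus_1 = abs(x_k - p)
    print(f"|E_{k+1}| / |E_{k}| = {E_k_plus_1 / E_k:.4f}")
    E_k = E_k_plus_1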

Example 1.4 (Solving \(x = \cos x\) with a naive fixed point iteration)

We have seen that one way to convert the example \(f(x) = x - \cos x = 0\) to a fixed point iteration is \(g(x) = \cos x\), and that this is a contraction on \(D = [-1, 1]\).

Here is what this iteration looks like:

a = 0
b = 1
x = np.linspace(a, b)
iterations = 10

# Start at left
print(f"Solving x = cos(x) starting to the left, at x_0 = {a}")
x_k = a
figure(figsize=(7,7))
title(f"Solving $x = cos x$ starting to the left, at $x_0$ = {a}")
plot(x, x, "g")
plot(x, g(x), "r")
grid(True)
for k in range(iterations):
    g_x_k = g(x_k)
    # Graph the evaluation of g(x_k) from x_k:
    plot([x_k, x_k], [x_k, g_x_k], "b")
    x_k_plus_1 = g_x_k
    # Connect to the new x_k on the line y = x:
    plot([x_k, g_x_k], [x_k_plus_1, x_k_plus_1], "b")
    # Update names: the old x_{k+1} is the new x_k
    x_k = x_k_plus_1
    print(f"x_{k+1} = {x_k_plus_1}")

# Start at right
print(f"Solving x = cos(x) starting to the right, at x_0 = {b}")
x_k = b
figure(figsize=(7,7))
title(f"Solving $x = cos(x)$ starting to the right, at $x_0$ = {b}")
plot(x, x, "g")
plot(x, g(x), "r")
grid(True)
for k in range(iterations):
    g_x_k = g(x_k)
    # Graph the evaluation of g(x_k) from x_k:
    plot([x_k, x_k], [x_k, g_x_k], "b")
    x_k_plus_1 = g_x_k
    # Connect to the new x_k on the line y = x:
    plot([x_k, g_x_k], [x_k_plus_1, x_k_plus_1], "b")
    # Update names: the old x_{k+1} is the new x_k
    x_k = x_k_plus_1
    print(f"x_{k+1} = {x_k_plus_1}")
Solving x = cos(x) starting to the left, at x_0 = 0
x_1 = 1.0
x_2 = 0.5403023058681398
x_3 = 0.8575532158463933
x_4 = 0.6542897904977792
x_5 = 0.7934803587425655
x_6 = 0.7013687736227566
x_7 = 0.7639596829006542
x_8 = 0.7221024250267077
x_9 = 0.7504177617637605
x_10 = 0.7314040424225098
Solving x = cos(x) starting to the right, at x_0 = 1
x_1 = 0.5403023058681398
x_2 = 0.8575532158463933
x_3 = 0.6542897904977792
x_4 = 0.7934803587425655
x_5 = 0.7013687736227566
x_6 = 0.7639596829006542
x_7 = 0.7221024250267077
x_8 = 0.7504177617637605
x_9 = 0.7314040424225098
x_10 = 0.7442373549005569
[Two figures: cobweb plots of the iteration \(x_{k+1} = \cos(x_k)\) starting at \(x_0 = 0\) and at \(x_0 = 1\), each spiraling in to the fixed point.]

In each case, one gets a “box spiral” converging in to the fixed point. This spiral pattern occurs whenever \(g\) is decreasing near the fixed point.

If instead \(g\) is increasing near the fixed point, the iterates approach monotonically, either from above or below:

Example 1.5 (Solving \(f(x) = x^2 - 5x + 4 = 0\) in interval \([0, 3]\))

The roots are 1 and 4; for now we aim at the first of these, so we choose a domain \([0, 3]\) that contains just this root.

Let us get a fixed point form by “partially solving for \(x\)”: solving for the \(x\) in the \(5 x\) term:

\[ x = g(x) = (x^2 + 4)/5 \]
def f_2(x): return x**2 - 5*x + 4
def g_2(x): return (x**2 + 4)/5
a = 0
b = 3
x = np.linspace(a, b)

figure(figsize=(10,5))
title("$y = f_2(x) = x^2-5x+4$ and $y = 0$")
plot(x, f_2(x))
plot([a, b], [0, 0])
grid(True)

figure(figsize=(10,5))
title("$y = g_2(x) = (x^2 + 4)/5$ and $y=x$")
plot(x, g_2(x))
plot(x, x)
grid(True)
[Two figures: the graph of \(y = f_2(x) = x^2 - 5x + 4\) with the line \(y=0\), and the graph of \(y = g_2(x) = (x^2+4)/5\) with the line \(y=x\).]
iterations = 10
# Start at left
a = 0.0
b = 2.0
x = np.linspace(a, b)
x_k = a
figure(figsize=(7,7))
title(f"Starting to the left, at x_0 = {a}")
grid(True)
plot(x, x, "g")
plot(x, g_2(x), "r")
for k in range(iterations):
    g_x_k = g_2(x_k)
    # Graph the evaluation of g_2(x_k) from x_k:
    plot([x_k, x_k], [x_k, g_x_k], "b")
    x_k_plus_1 = g_x_k
    # Connect to the new x_k on the line y = x:
    plot([x_k, g_x_k], [x_k_plus_1, x_k_plus_1], "b")
    # Update names: the old x_{k+1} is the new x_k
    x_k = x_k_plus_1
    print(f"x_{k+1} = {x_k_plus_1}")
x_1 = 0.8
x_2 = 0.9280000000000002
x_3 = 0.9722368000000001
x_4 = 0.9890488790548482
x_5 = 0.9956435370319303
x_6 = 0.9982612105666906
x_7 = 0.9993050889044148
x_8 = 0.999722132142052
x_9 = 0.9998888682989302
x_10 = 0.999955549789623
[Figure: the iterates climb monotonically from \(x_0 = 0\) toward the fixed point \(p=1\).]
# Start at right
a = 0.0
b = 2.0
x = np.linspace(a, b)
x_k = b
figure(figsize=(7,7))
title(f"Starting to the right, at x_0 = {b}")
grid(True)
plot(x, x, "g")
plot(x, g_2(x), "r")
for k in range(iterations):
    g_x_k = g_2(x_k)
    # Graph the evaluation of g_2(x_k) from x_k:
    plot([x_k, x_k], [x_k, g_x_k], "b")
    x_k_plus_1 = g_x_k
    # Connect to the new x_k on the line y = x:
    plot([x_k, g_x_k], [x_k_plus_1, x_k_plus_1], "b")
    # Update names: the old x_{k+1} is the new x_k
    x_k = x_k_plus_1
    print(f"x_{k+1} = {x_k_plus_1}")
x_1 = 1.6
x_2 = 1.312
x_3 = 1.1442688
x_4 = 1.061870217330688
x_5 = 1.0255136716907844
x_6 = 1.010335658164943
x_7 = 1.0041556284319177
x_8 = 1.0016657052223
x_9 = 1.0006668370036975
x_10 = 1.000266823735797
[Figure: the iterates descend monotonically from \(x_0 = 2\) toward the fixed point \(p=1\).]

However, things go awry if we try to compute the other fixed point, \(p=4\):

# Start near the other fixed point, p=4
a = 3.5
b = 5.0
x = np.linspace(a, b)
x_k = 4.001
iterations = 15
figure(figsize=(7,7))
title(f"Starting at x_0 = {x_k}")
grid(True)
plot(x, x, "g")
plot(x, g_2(x), "r")
for k in range(iterations):
    g_x_k = g_2(x_k)
    # Graph the evaluation of g_2(x_k) from x_k:
    plot([x_k, x_k], [x_k, g_x_k], "b")
    x_k_plus_1 = g_x_k
    # Connect to the new x_k on the line y = x:
    plot([x_k, g_x_k], [x_k_plus_1, x_k_plus_1], "b")
    # Update names: the old x_{k+1} is the new x_k
    x_k = x_k_plus_1
    print(f"x_{k+1} = {x_k_plus_1}")
x_1 = 4.0016002
x_2 = 4.002560832128009
x_3 = 4.004098642977052
x_4 = 4.006561188538134
x_5 = 4.0105065115000205
x_6 = 4.0168324957568124
x_7 = 4.026988659793581
x_8 = 4.043327533221221
x_9 = 4.06969950818096
x_10 = 4.112490817377671
x_11 = 4.182516144603133
x_12 = 4.29868825997317
x_13 = 4.495744151286233
x_14 = 4.842343094764873
x_15 = 5.489657329483409
[Figure: the iterates move steadily away from the repelling fixed point \(p=4\).]
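This divergence is just what Remark 1.8 predicts: \(g_2'(x) = 2x/5\), so \(|g_2'(1)| = 2/5 < 1\) and the iteration is attracted to \(p=1\), while \(|g_2'(4)| = 8/5 > 1\) and iterates are pushed away from \(p=4\). A one-line check of these values:

def g_2_prime(x): return 2*x/5
print(f"|g_2'(1)| = {g_2_prime(1.0)}, |g_2'(4)| = {g_2_prime(4.0)}")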

1.2.7. Exercises#

Exercise 1.4

The equation \(x^3 -2x + 1 = 0\) can be written as a fixed point equation in many ways, including

  1. \(\displaystyle x = \frac{x^3 + 1}{2}\), and

  2. \(x = \sqrt[3]{2x-1}\)

For each of these options:

(a) Verify that its fixed points do in fact solve the above cubic equation.

(b) Determine whether fixed point iteration with it will converge to the solution \(r=1\) (assuming a “good enough” initial approximation).

Note: computational experiments can be a useful start, but prove your answers mathematically!

Exercise 1.5

Consider the task of computing the positive \(n\)-th root \(r = a^{1/n}\) of a positive number \(a\) by fixed-point iteration \(x_{k+1} = g(x_k)\). Two possible functions to use are

(1.3)#\[ g_1(x) = \frac{x + a/x^{n-1}}{2} \quad \text{ and } \quad g_2(x) = \frac{(n-1)x + a/x^{n-1}}{n}, \]

both generalizing the classic formula for computing square roots, \(\displaystyle g(x) = \frac{x + a/x}{2}\).

a) Verify that each does indeed have \(a^{1/n}\) as a fixed point.

b) Test each method for convergence (as usual, you may assume a sufficiently good initial approximation).

c) Your results above should identify one method as “superior” in that it has super-linear convergence: which one?

d) (*) Show that this superior method converges for any positive initial approximation, even though this is not guaranteed by the general theorems on fixed point iteration.