1.5. Measures of Error and Order of Convergence#
Last revised on August 13, 2024
References:
Section 1.3.1 Forward and backward error of [Sauer, 2022], on measures of error;
Section 2.4 Error Analysis for Iterative Methods of [Burden et al., 2016], on order of convergence.
These notes cover a number of topics:
Measures of error: absolute, relative, forward, backward, etc.
Measuring the rate of convergence of a sequence of approximations.
Big-O and little-o notation for describing how small a quantity (usually an error) is.
1.5.1. Error measures#
Several of these have been mentioned before, but they are worth gathering here.
Consider a quantity \(\tilde{x}\) regarded as an approximation of an exact value \(x\). (This can be a number or a vector.)
(Error)
The error in \(\tilde{x}\) is \(\tilde{x} - x\) (or \(x - \tilde{x}\); different sources use both versions, and either is fine so long as you are consistent).
(Absolute Error)
The absolute error is the absolute value of the error: \(|\tilde{x} - x|\). For vector quantities this means the norm \(\| \tilde{x} - x \|\), and it can be any norm, so long as we again choose one and use it consistently. Two favorites are the Euclidean norm \(\| x \| = \sqrt{\sum_i |x_i|^2}\), denoted \(\| x \|_2\), and the maximum norm (also mysteriously known as the infinity norm), denoted \(\| x \|_\infty\):

$$\| x \|_\infty = \max_i |x_i|.$$
For real-valued quantities, the absolute error is related to the number of correct decimal places: \(p\) decimal places of accuracy corresponds roughly to absolute error no more than \(0.5 \times 10^{-p}\), and when working with computer arithmetic, \(p\) correct bits corresponds to absolute error no more than \(2^{-(p+1)}\).
(Relative Error)
The relative error is the ratio of the absolute error to the size of the exact quantity: \(\displaystyle\frac{| \tilde{x} - x |}{| x |}\) (again possibly with vectors and vector norms).
This is often more relevant than absolute error for inherently positive quantities, but is obviously unwise where \(x = 0\) is a possibility.
For real-valued quantities, this is related to the number of significant digits: accuracy to \(p\) significant digits corresponds roughly to relative error no more than \(0.5 \times 10^{-p}\).
When working with computer arithmetic, \(p\) significant bits corresponds to relative error no more than \(2^{-(p+1)}\).
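As a concrete illustration, here is a minimal Python sketch (not from the references; the values and variable names are just for this example) computing these error measures for a scalar and for a vector:

```python
import numpy as np

# Scalar example: 22/7 as an approximation of pi.
x, x_approx = np.pi, 22/7
abs_err = abs(x_approx - x)            # absolute error, about 1.26e-3
rel_err = abs_err / abs(x)             # relative error, about 4.0e-4
print(abs_err, rel_err)                # so roughly 3 significant digits

# Vector example: a norm replaces the absolute value.
v = np.array([1.0, 2.0, 3.0])
v_approx = np.array([1.01, 1.98, 3.0])
print(np.linalg.norm(v_approx - v))                       # Euclidean norm of the error
print(np.linalg.norm(v_approx - v, ord=np.inf))           # maximum ("infinity") norm
print(np.linalg.norm(v_approx - v) / np.linalg.norm(v))   # relative error in the 2-norm
```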
Backward error (and forward error)#
An obvious problem is that we usually do not know the exact solution \(x\), and so cannot evaluate any of these; instead we typically seek upper bounds on the absolute or relative error. Thus, when talking of approximate solutions to an equation \(f(x) = 0\), the concept of backward error introduced in the section on Newton’s method can be very useful, for example as a step in getting bounds on the size of the error. To recap:
(Backward Error)
The backward error in \(\tilde{x}\) as an approximate solution to the equation \(f(x) = 0\) is \(f(\tilde{x})\): the amount by which the function \(f\) would have to be changed in order for \(\tilde{x}\) to be an exact root.
For the case of solving simultaneous linear equations in matrix-vector form \(A x = b\), this is \(b - A \tilde{x}\), also known as the residual.
(Absolute Backward Error)
The absolute backward error is — as you might have guessed — the absolute value of the backward error: \(|f(\tilde{x})|\). This is sometimes also called simply the backward error. (The usual disclaimer about vector quantities applies.)
For the case of solving simultaneous linear equations in matrix-vector form \(A x = b\), this is \(\| b - A \tilde{x} \|\), also known as the residual norm; we will see more of this in the section on Error bounds for linear algebra, etc..
One obvious advantage of the backward error concept is that you can actually evaluate it without knowing the exact solution \(x\).
Also, one significance of backward error is that if the values of \(f(x)\) are only known to be accurate within an absolute error of \(E\) then any approximation with absolute backward error less than \(E\) could in fact be exact, so there is no point in seeking greater accuracy.
The names forward error and absolute forward error are sometimes used as synonyms for error etc. as defined above, when they need to be distinguished from backward errors.
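As a small illustration, here is a Python sketch (the functions and data are invented for this example) evaluating an absolute backward error and a residual norm, neither of which requires knowing the exact solution:

```python
import numpy as np

# Backward error of an approximate root of f(x) = 0, here f(x) = x^2 - 2.
f = lambda x: x**2 - 2
x_approx = 1.41421                         # an approximation of the root sqrt(2)
print(abs(f(x_approx)))                    # absolute backward error, about 1.0e-5

# Residual norm of an approximate solution of A x = b.
A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([3.0, 4.0])
x_tilde = np.array([1.0, 1.01])            # the exact solution is [1, 1]
print(np.linalg.norm(b - A @ x_tilde))     # residual norm, about 0.032
```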
1.5.2. Order of convergence of a sequence of approximations#
(Linear Convergence)
We have seen that for the sequence of approximations \(x_k\) to a quantity \(x\) given by the fixed point iteration \(x_{k+1} = g(x_k)\), the absolute errors \(E_k := |x_k - x|\) typically have

$$E_{k+1} \approx C E_k, \quad 0 < C < 1,$$

so that eventually the errors diminish in a roughly geometric fashion: \(E_k \approx K C^k\). This is called linear convergence.
Aside: Why “linear” rather than “geometric”? Because there is an approximately linear relationship between consecutive error values, \(E_{k+1} \approx C E_k\).
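A quick numerical illustration (a sketch, with \(g(x) = \cos x\) chosen just for this example): the successive error ratios of a convergent fixed point iteration settle down to a constant \(C = |g'(x)| < 1\).

```python
from math import cos, sin

# Fixed point iteration x_{k+1} = g(x_k) with g(x) = cos(x);
# the fixed point is x ≈ 0.7390851332151607.
g = cos
x_exact = 0.7390851332151607
x = 1.0
E_old = abs(x - x_exact)
for k in range(1, 11):
    x = g(x)
    E = abs(x - x_exact)
    print(f"k={k}: E_k = {E:.3e}, E_k/E_(k-1) = {E/E_old:.4f}")
    E_old = E
print("|g'(x)| =", sin(x_exact))   # the ratios approach this value, about 0.6736
```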
This is a very common behavior for iterative numerical methods, but we will also see that a few methods do even better; for example, when Newton’s method converges to a simple root \(r\) of \(f\) (one with \(f'(r) \neq 0\)),

$$\lim_{k \to \infty} \frac{E_{k+1}}{E_k^2} = C, \quad 0 < C < \infty,$$

which is called quadratic convergence. More generally:
(Convergence of Order \(p\))
This is when

$$\lim_{k \to \infty} \frac{E_{k+1}}{E_k^p} = C, \quad 0 < C < \infty.$$
We have already observed experimentally the intermediate result that “\(C = 0\)” for Newton’s method in this case; that is,

$$\lim_{k \to \infty} \frac{E_{k+1}}{E_k} = 0.$$
(Super-linear Convergence)
When the successive errors behave as in Equation (1.10), the convergence is super-linear. This includes any situation with order of convergence \(p > 1\).
For most practical purposes, if you have established super-linear convergence, you can be happy, and not worry much about refinements like the particular order \(p\).
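When a sequence of errors is available, the order can be estimated numerically from three consecutive terms, since \(E_{k+1} \approx C E_k^p\) gives \(p \approx \log(E_{k+1}/E_k)/\log(E_k/E_{k-1})\). Here is a rough Python sketch of that check (the example of Newton’s method on \(f(x) = x^2 - 2\) is just an illustration):

```python
from math import log, sqrt

# Newton's method for f(x) = x^2 - 2, whose simple root is sqrt(2).
f = lambda x: x**2 - 2
df = lambda x: 2*x
x_exact = sqrt(2)

errors = []
x = 2.0
for _ in range(4):                  # stop before rounding error dominates
    x = x - f(x)/df(x)
    errors.append(abs(x - x_exact))

# Estimate p from three consecutive errors:
# E_{k+1} ≈ C E_k^p  =>  p ≈ log(E_{k+1}/E_k) / log(E_k/E_{k-1}).
for k in range(1, len(errors) - 1):
    p = log(errors[k+1]/errors[k]) / log(errors[k]/errors[k-1])
    print(f"k={k}: estimated order p ≈ {p:.2f}")
# The estimates are close to 2, consistent with quadratic convergence.
```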
1.5.3. Big-O and little-o notation#
Consider the error formula for approximation of a function \(f\) with the Taylor polynomial of degree \(n\), center \(a\): writing \(h = x - a\),

$$f(a+h) = T_n(a+h) + E_n(h), \quad E_n(h) = \frac{f^{(n+1)}(\xi)}{(n+1)!} h^{n+1}$$

for some \(\xi\) between \(a\) and \(a + h\).
Since the coefficient of \(h^{n+1}\) is typically not known in practice, it is wise to focus on the power law part, and for this the “big-O” and “little-o” notation is convenient.
If a function \(E(h)\) goes to zero at least as fast as \(h^p\), we say that it is of order \(h^p\), written \(O(h^p)\).
More precisely, \(E(h)\) is no bigger than a multiple of \(h^p\) for \(h\) small enough; that is, there is a constant \(C\) such that for some positive number \(\delta\),

$$|E(h)| \leq C |h|^p \quad \text{for all } |h| \leq \delta.$$
Another way to say this is in terms of the lim-sup, if you have seen that jargon:

$$\limsup_{h \to 0} \frac{|E(h)|}{|h|^p} \leq C < \infty.$$
This can be used to rephrase the above Taylor’s theorem error bound as

$$f(a+h) = T_n(a+h) + O(h^{n+1})$$

or

$$f(x) = T_n(x) + O\left((x-a)^{n+1}\right),$$

and for the case of the linearization,

$$f(a+h) = f(a) + f'(a) h + O(h^2).$$
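Such statements can also be checked numerically. Here is a small Python sketch (with \(f(x) = e^x\) chosen only as an example) showing the linearization error behaving like \(O(h^2)\): the ratio \(|E(h)|/h^2\) stays bounded as \(h\) shrinks.

```python
from math import exp

# Linearization of f(x) = e^x at a = 0 is 1 + h; the error should be O(h^2).
for h in [0.1, 0.01, 0.001, 0.0001]:
    E = exp(h) - (1 + h)
    print(f"h = {h:.0e}: |E(h)| = {abs(E):.3e}, |E(h)|/h^2 = {abs(E)/h**2:.4f}")
# The ratios stay near 1/2 (the coefficient f''(0)/2!), so E(h) = O(h^2).
```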
Little-o notation, for “negligibly small terms”#
Sometimes it is enough to say that some error term is small enough to be neglected, at least when \(h\) is close enough to zero. For example, with a Taylor series we might be able to neglect the powers of \(x-a\) or of \(h\) higher than \(p\).
We will thus say that a quantity \(E(h)\) is small of order \(h^p\), written \(o(h^p)\), when

$$\lim_{h \to 0} \frac{E(h)}{h^p} = 0.$$
Note the addition of the word small compared to the above description of the big-O case!
With this, the Taylor’s theorem error bound can be stated as

$$f(a+h) = T_n(a+h) + o(h^{n})$$

or

$$f(x) = T_n(x) + o\left((x-a)^{n}\right),$$

and for the case of the linearization,

$$f(a+h) = f(a) + f'(a) h + o(h).$$
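The corresponding little-o check for the same example function \(e^x\): the ratio \(|E(h)|/h\) tends to zero, so the linearization error is \(o(h)\).

```python
from math import exp

# Same linearization of e^x at a = 0; now check that E(h) = o(h).
for h in [0.1, 0.01, 0.001, 0.0001]:
    E = exp(h) - (1 + h)
    print(f"h = {h:.0e}: |E(h)|/h = {abs(E)/h:.5f}")
# The ratios 0.0517, 0.0050, 0.0005, ... tend to zero, confirming E(h) = o(h).
```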
1.5.4. Exercises#
Exercise 1#
a) Find the multiplicity of the root \(r = 0\) of the function \(f(x) = 1 - \cos x\).
b) Evaluate the forward and backward errors of the approximate root \(\tilde{r} = 0.001\).
Notes for Exercise 1: the multiplicity of a root
Recall that the multiplicity \(m\) of a root \(r\) of a polynomial is the number of factors \(x-r\) that it has: it has a factor \((x-r)^m\) but not \((x-r)^{m+1}\).
This can be extended to roots of other functions: one way is to note that at a root of multiplicity \(m\), the first \(m-1\) derivatives vanish, \(f^{(1)}(r) = \dots = f^{(m-1)}(r) = 0\), but \(f^{(m)}(r) \neq 0\).
This is used as the more general definition of the multiplicity.
Aside: This is equivalent to having \(f(x) = (x-r)^m g(x)\) where \(g\) is continuous at \(r\) with \(g(r) \neq 0\), and to the Taylor series for \(f\) with center \(r\) having the form

$$f(x) = c_m (x-r)^m + c_{m+1} (x-r)^{m+1} + \cdots, \quad c_m \neq 0.$$
Note that Newton’s method has problems when the root being computed has multiplicity greater than one; the above exercise examines one of these problems.
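As a numerical companion to this exercise (a sketch only; the analysis by hand is still the point), one can evaluate the forward and backward errors of \(\tilde{r} = 0.001\) directly:

```python
from math import cos

# Exercise 1: f(x) = 1 - cos(x), with the root r = 0 approximated by 0.001.
f = lambda x: 1 - cos(x)
r, r_approx = 0.0, 0.001

forward_error = abs(r_approx - r)    # 1.0e-3
backward_error = abs(f(r_approx))    # about 5.0e-7
print(forward_error, backward_error)
# The backward error is far smaller than the forward error: near a multiple root,
# a rather inaccurate approximation can still make f(x) very small.
```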