13. Simultaneous Linear Equations, Part 5: Error bounds for linear algebra, condition numbers, matrix norms, etc.#
References:
Section 2.3.1 Error Magnification and Condition Number of Sauer
Section 7.5 Error Bounds and Iterative Refinement of Burden&Faires — but the last part, on Iterative Refinement, is not relevant here.
Section 8.4 of Cheney&Kincaid
13.1. Residuals, backward errors, forward errors, and condition numbers#
For an approximation \(x_a\) of the solution \(x\) of \(A x = b\), the residual \(r = A x_a - b\) measures the error as a backward error, often summarized by a single number, the residual norm \(\| A x_a - b \|\). Any norm could be used, but the maximum norm is usually preferred, for reasons that we will see soon.
The corresponding (dimensionless) measure of relative error is defined as

\[
\frac{\| r \|}{\| b \|} = \frac{\| A x_a - b \|}{\| b \|}.
\]

However, these can greatly underestimate the forward errors in the solution: the absolute error \(\|x - x_a\|\) and the relative error

\[
\frac{\| x - x_a \|}{\| x \|}.
\]
To relate these to the residual, we need the concepts of a matrix norm and the condition number of a matrix.
13.2. Matrix norms induced by vector norms#
Given any vector norm \(\| \cdot \|\) — such as the maximum (“infinity”) norm \(\| \cdot \|_\infty\) or the Euclidean norm (length) \(\| \cdot \|_2\) — the corresponding induced matrix norm is

\[
\| A \| := \max_{x \neq 0} \frac{\| A x \|}{\| x \|} = \max_{\| x \| = 1} \| A x \|.
\]

This maximum exists for either of these vector norms, and for the infinity norm there is an explicit formula for it: for any \(m\times n\) matrix,

\[
\| A \|_\infty = \max_{1 \leq i \leq m} \sum_{j=1}^{n} |a_{ij}|,
\]

that is, the maximum of the absolute row sums.
(On the other hand, it is far harder to compute the Euclidean norm of a matrix: the formula requires computing eigenvalues.)
Note that when the matrix is a vector considered as a matrix with a single column — so \(n=1\) — the sum goes away, and this agrees with the infinity vector norm. This allows us to consider vectors as being just matrices with a single column, which we will often do from now on.
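As a quick check of the row-sum formula, here is a minimal Julia sketch; the matrix is just an arbitrary example, and `opnorm` from `LinearAlgebra` is used on the assumption of a recent Julia version, where it gives the induced operator norms of a matrix:

```julia
using LinearAlgebra

A = [1.0 -2.0  3.0;
     4.0  5.0 -6.0]

# Explicit formula: the maximum of the absolute row sums
row_sum_norm = maximum(sum(abs.(A), dims=2))

# Induced infinity norm from LinearAlgebra (operator norm, in recent Julia versions)
builtin_norm = opnorm(A, Inf)

println("max row sum = $row_sum_norm, opnorm(A, Inf) = $builtin_norm")  # both 15.0
```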
13.3. Properties of (induced) matrix norms#
These induced matrix norms have many properties in common with Euclidean length and other vector norms, but now products of matrices (and of a matrix with a vector) can also arise, and then one has to be careful.
\(\|A\| \geq 0\) (positivity)
\(\| A \| = 0\) if and only if \(A = 0\) (definiteness)
\(\| c A \| = |c| \, \|A\|\) for any constant \(c\) (absolute homogeneity)
\(\| A + B \| \leq \| A \| + \| B \|\) (sub-additivity or the triangle inequality),
and when the product of two matrices makes sense (including matrix-vector products), \(\| A B \| \leq \| A \| \, \| B \|\) (sub-multiplicativity)
Note the failure to always have equality with products. Indeed one can have \(A B = 0\) with \(A\) and \(B\) both non-zero, such as when \(A\) is a singular matrix and \(B\) is a null-vector for it.
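For instance, a minimal illustration (any singular matrix together with one of its null vectors would do):

```julia
A = [1.0 1.0;
     1.0 1.0]       # singular: the rows are equal
B = [1.0, -1.0]     # a null vector of A
A * B               # gives [0.0, 0.0], even though A and B are both non-zero
```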
Aside: There are other matrix norms of use in some contexts, in particular the Frobenius norm. The above properties are then often used to define what it is to be a matrix norm, much as the first four define what it is to be a vector norm.
13.3.1. Julia Note: function norm#
The Julia package `LinearAlgebra` provides the function `norm` for evaluating matrix norms, as seen in the examples in the previous section, Solving Ax = b With Both Pivoting and LU Factorization. The default usage `norm(A)` computes \(\| A \|_2\), which one can also specify explicitly with `norm(A, 2)`; to get the maximum norm \(\| A \|_\infty\) one uses `norm(A, Inf)`.
13.4. Relative error bound and condition number#
It can be proven that, for any choice of norm,

\[
\frac{\| x - x_a \|}{\| x \|} \leq \| A \| \, \| A^{-1} \| \, \frac{\| r \|}{\| b \|} = \| A \| \, \| A^{-1} \| \, \frac{\| A x_a - b \|}{\| b \|},
\]

where the last factor is the relative backward error.
Since we can (though often with considerable effort, due to the inverse!) compute the right-hand side when the infinity norm is used, we can compute an upper bound on the relative error. From this, an upper bound on the absolute error can be computed if needed.
The growth factor between the relative backward error measured by the residual and the relative (forward) error is called the condition number, \(\kappa(A)\):

\[
\kappa(A) := \| A \| \, \| A^{-1} \|.
\]

Actually there is one condition number for each choice of norm, so we work with

\[
\kappa_\infty(A) := \| A \|_\infty \, \| A^{-1} \|_\infty.
\]
Note that for a singular matrix, this is undefined: we can intuitively say that the condition number is then infinite.
At the other extreme, the identity matrix \(I\) has norm 1 and condition number 1 (using any norm), and this is the best possible, because in general \(\kappa(A) \geq 1\): sub-multiplicativity gives \(1 = \| I \| = \| A A^{-1} \| \leq \| A \| \, \| A^{-1} \| = \kappa(A)\).
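As a rough numerical illustration of the error bound, here is a sketch with an arbitrarily chosen, nearly singular matrix; the exact solution is built in so that the forward error can be measured directly, and `opnorm` is used for the induced infinity norm:

```julia
using LinearAlgebra

A = [1.0 2.0; 1.0 2.0001]      # nearly singular, so badly conditioned
x = [1.0, 1.0]                 # choose the exact solution ...
b = A * x                      # ... and construct the right-hand side from it

x_a = A \ b                    # computed (approximate) solution

relative_error          = norm(x - x_a, Inf) / norm(x, Inf)
relative_backward_error = norm(A*x_a - b, Inf) / norm(b, Inf)
kappa = opnorm(A, Inf) * opnorm(inv(A), Inf)   # condition number κ_∞(A)

println("relative (forward) error = $relative_error")
println("bound κ(A) × relative backward error = $(kappa * relative_backward_error)")
```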
13.4.1. Aside: estimating \(\| A^{-1} \|_\infty\) and thence the condition number#
In Julia, good approximations of condition numbers are given by the function `cond` from package `LinearAlgebra`. As with function `norm()`, the simple form `cond(A)` defaults to \(\kappa_2(A)\), based on the Euclidean length \(\| \cdot \|_2\) for vectors; to get the infinity norm version \(\kappa_\infty(A)\), use `cond(A, Inf)`.
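For example, a small experiment with the notoriously ill-conditioned Hilbert matrices (built directly here, so no extra package is needed):

```julia
using LinearAlgebra

n = 8
H = [1/(i + j - 1) for i in 1:n, j in 1:n]   # the n×n Hilbert matrix

println("cond(H)      = ", cond(H))          # κ_2(H), roughly 10^10 for n = 8
println("cond(H, Inf) = ", cond(H, Inf))     # κ_∞(H)
```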
This is not done exactly, since computing the inverse is a lot of work for large matrices and good estimates can be got far more quickly. The basic idea is to start with the formula

\[
\| A^{-1} \|_\infty = \max_{x \neq 0} \frac{\| A^{-1} x \|_\infty}{\| x \|_\infty}
\]

and instead compute the maximum over some finite selection of values for \(x\): call them \(x^{(k)}\). Then to evaluate \(y^{(k)} = A ^{-1} x^{(k)}\), express this through the equation \(A y^{(k)} = x^{(k)}\). Once we have an LU factorization for \(A\) (which one probably would have when exploring errors in a numerical solution of \(Ax = b\)), each of these systems can be solved relatively fast. Then

\[
\| A^{-1} \|_\infty \approx \max_k \frac{\| y^{(k)} \|_\infty}{\| x^{(k)} \|_\infty}.
\]
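Here is a minimal sketch of that idea with random \(\pm 1\) trial vectors; the function name and the choice of trial vectors are just for illustration, since production condition-number estimators choose the \(x^{(k)}\) far more cleverly:

```julia
using LinearAlgebra

"""Crude estimate of ‖A⁻¹‖_∞ from a few solves A y = x, reusing one LU factorization."""
function estimate_inverse_norm_inf(A, trials=5)
    n = size(A, 1)
    F = lu(A)                      # factor once ...
    estimate = 0.0
    for k in 1:trials
        x = rand([-1.0, 1.0], n)   # trial vector of ±1 entries
        y = F \ x                  # ... so that each solve is cheap
        estimate = max(estimate, norm(y, Inf) / norm(x, Inf))
    end
    return estimate
end

A = [4.0 1.0 0.0; 1.0 4.0 1.0; 0.0 1.0 4.0]
println("estimate: ", estimate_inverse_norm_inf(A))
println("exact:    ", opnorm(inv(A), Inf))   # for comparison on this small example
```

Note that maximizing over only a few trial vectors gives a lower bound on \(\| A^{-1} \|_\infty\), which is usually close enough for judging the order of magnitude of the condition number.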
13.5. Well-conditioned and ill-conditioned problems and matrices#
Condition numbers, giving an upper limit on the ratio of forward error to backward error, measure the amplification of errors, and have counterparts in other contexts. For example, with an approximation \(r_a\) of a root \(r\) of the equation \(f(x) = 0\), the ratio of forward error to backward error is bounded by \(\displaystyle \max 1/| f'(x) | = \frac{1}{\min | f'(x) |}\), where the maximum need only be taken over an interval known to contain both the root and the approximation. This condition number becomes “infinite” for a multiple root, \(f'(r) = 0\), which is related to the problems we have seen in that case.
Careful calculation of an approximate solution \(x_a\) of \(Ax = b\) can often get a residual that is at the level of machine rounding error, so that roughly the relative backward error is of size comparable to the machine unit, \(u\). The condition number then guarantees that the (forward) relative error is no greater than about \(u \, \kappa(A)\).
In terms of significant bits, with \(p\)-bit machine arithmetic, one can hope to get \(p - \log_2(\kappa(A))\) significant bits in the result, but cannot rely on more, so one loses \(\log_2(\kappa(A))\) significant bits. Compare this to the observation that one can expect to lose at least \(p/2\) significant bits when using the approximation \(Df(x) \approx D_hf(x) = (f(x+h) - f(x))/h\).
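As a rough back-of-the-envelope illustration of this bit counting (the condition number value here is just an assumed example): with IEEE 64-bit floats, \(p = 53\), and \(\kappa(A) \approx 10^{10}\), one expects to lose about 33 bits, leaving roughly 20 significant bits, or about 6 significant decimal digits.

```julia
p = 53                               # significant bits in Float64
kappa = 1.0e10                       # an assumed condition number, for illustration
bits_lost = log2(kappa)              # ≈ 33.2
bits_left = p - bits_lost            # ≈ 19.8
digits_left = bits_left * log10(2)   # ≈ 6 significant decimal digits
println((bits_lost, bits_left, digits_left))
```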
A well-conditioned problem is one that is not too highly sensitive to errors in rounding or input data; for an equation \(Ax = b\), this corresponds to the condition number of \(A\) not being too large, and the matrix \(A\) is then sometimes also called well-conditioned. This is of course vague, but might typically mean that \(p - \log_2(\kappa(A))\) is a sufficient number of significant bits for a particular purpose.
A problem that is not deemed well-conditioned is called ill-conditioned, so a matrix of uncomfortably large condition number is also sometimes called ill-conditioned. An ill-conditioned problem might still be well-posed; it just requires more careful and precise solution methods.
This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International