Error bounds for linear algebra, condition numbers, matrix norms, etc.

2.6. Error bounds for linear algebra, condition numbers, matrix norms, etc.#

Last Revised on August 17, 2025, adding exercises.

References:

Section 2.3.1, Error Magnification and Condition Number, of Sauer [Sau22].
Section 7.5, Error Bounds and Iterative Refinement, of Burden, Faires, and Burden [BFB16] — but you may skip the last part, on Iterative Refinement; that is not relevant here.
Section 4.4, Error estimates, of Dionne [Dio23].
Section 8.4, Iterative Solution of Linear Systems, of Chenney and Kincaid [CK13] — just the first part.

# In addition to the usual
import numpy as np
# Some additional imports are done (and explained) below.

2.6.1. Residuals, backward errors, forward errors, and condition numbers#

For an approximation \(x_a\) of the solution \(x\) of \(A x = b\), the residual \(r = A x_a - b\) measures error as backward error, often measured by a single number, the residual norm \(\| A x_a - b \|\). Any norm could be used, but the maximum norm is usualt preferred, for reasons that we will see soon.

The corresponding (dimensionless) measure of relative error is defined as

\[\frac{\|r\|}{\|b\|}.\]

However, these can greatly underestimate the forward errors in the solution: the absolute error \(\|x - x_a\|\) and relative error

\[Rel(x_a) = \frac{\|x - x_a\|}{\| x \|}\]

To relate these to the residual, we need the concepts of a matrix norm and the condition number of a matrix.

2.6.2. Matrix norms induced by vector norms#

Definition 2.7

\[ \| A \| := \max_{x \neq 0} \frac{\| Ax \|}{\| x \|}, = \max_{\|x\|=1} \| Ax \| \]

Note that when the matrix has a single column, so can be considered as a column vector, the “\(x\)” is just a number, and then the value of the induced matrix norm is the same as with the corresponding vector norm. This is another reason why we will in general think of vectors in column form, at least when working with linear algebra.

This maximum exists for any vector norm, and for the infinity norm there is an explicit formula for it; for any \(m\times n\) matrix:

Theorem 2.5

(2.4)#\[ \|A\|_\infty = \max_{i=1}^m \sum_{j=1}^n |a_{ij}| \]

(Proof omitted for now.)

On the other hand, it is far harder to compute the Euclidean norm of a matrix \(\|A\|_2\): the formula then requires computing eigenvalues.

2.6.3. Properties of (induced) matrix norms#

These induced matrix norms have many properties in common with Euclidean length and other vector norms, but there can also be products, and then one has to be careful.

\(\|A\| \geq 0\) (positivity)
\(\| A \| = 0\) if and only if \(A = 0\) (definiteness)
\(\| c A \| = |c| \, \|A\|\) for any constant \(c\) (absolute homogeneity)
\(\| A + B \| \leq \| A \| + \| B \|\) (sub-additivity or the triangle inequality),
and when the product of two matrices makes sense (including matrix-vector products),
\(\| A B \| \leq \| A \| \, \| B \|\) (sub-multiplicativity)

Note the failure to always have equality with products. Indeed one can have \(A B = 0\) with \(A\) and \(B\) both non-zero, such as when \(A\) is a singular matrix and \(B\) is a null-vector for it.

Remark 2.23 (Other matrix norms)

There are other matrix norms of use in some contexts, in particular the Frobenius norm. Then the above properties are often used to define what it is to be a matrix form, much as the first four define what it is to be a vector norm.

Remark 2.24 (numpy.linalg.norm)

Python package Numpy provides the function numpy.linalg.norm for evaluating matrix norms. The default usage numpy.linalg.norm(A) computes \(\| A \|_2\), which one can also specify explicitly with numpy.linalg.norm(A, 2); to get the maximum norm \(\| A \|_\infty\), one uses numpy.linalg.norm(A, numpy.inf).

2.6.4. Relative error bound and condition number#

Theorem 2.6

For any choice of norm,

(2.5)#\[ \text{Rel}(x_a) = \frac{\|x - x_a\|}{\| x \|} \leq \|A\|\|A^{-1}\| \frac{\|r\|}{\|b\|}, \]

where the last factor \(\displaystyle \frac{\|r\|}{\|b\|}\) is the relative backward error.

Proof. Firstly, \(r = b - A x_a = A x - A x_a = A(x-x_a)\), so

\[ x - x_a = A^{-1} r \]

and using property (5) of the norm,

(2.6)#\[ \|x - x_a\| \leq \|A^{-1}\| \|r\| \]

Next, from \(b = A x\), the same inequality gives \(\|b\| \leq \|A\| \|x\|\) so solving for \(1/\|x\|\),

(2.7)#\[ \frac{1}{\|x\|} \leq \frac{\|A\|}{\|b\|} \]

The product of the two inequalities (2.6) and (2.7) gives the result.

Since we can (though often with considerable effort, due to the inverse!) compute the right-hand side when the infinity norm is used, we can compute an upper bound on the relative error. From this, an upper bound on the absolute error can be computed if needed.

The growth factor between the relative backward error measured by the residual and the relative (forward) error is called the condition number, \(\kappa(A)\):

Definition 2.8

The condition number of square matrix \(A\) is

\[\kappa(A) := \|A\| \|A^{-1}\|\]

Thus, the above bound on the relative error can be restated as

(2.8)#\[ \text{Rel}(x_a) = \frac{\|x - x_a\|}{\| x \|} \leq \kappa(A) \frac{\|r\|}{\|b\|} \]

Actually there is a different condition number for each choice of norm; we work with

\[\kappa_\infty(A) := \|A\|_\infty \|A^{-1}\|_\infty\]

Note that for a singular matrix, this is undefined: we can intuitively say that the condition number is then infinite.
At the other extreme, the identity matrix \(I\) has norm 1 and condition number 1 (using any norm), and this is the best possible because in general \(\kappa(A) \geq 1\). (This follows from property 5, sub-multiplicativity.)

2.6.4.1. Aside: estimating \(\|A^{-1}\|_\infty\) and thence the condition number, and `numpy.linalg.cond`#

In Python, good approximations of condition numbers are given by the function numpy.linalg.cond.
As with numpy.linalg.norm, the default numpy.linalg.cond(A) gives \(\kappa_2(A)\), based on the Euclidian length \(\| \cdot \|_2\) for vectors; to get the infinity norm version \(\kappa_\infty(A)\) use numpy.linalg.cond(A, numpy.inf).

This is not done exactly, since computing the inverse is a lot of work for large matrices, and good estimates can be got far more quickly. The basic idea is start with the formula

\[\| A^{-1} \| = \max_{\|x\|=1} \| A ^{-1} x \|\]

and instead compute the maximum over some finite selection of values for \(x\): call them \(x^{(k)}\). Then to evaluate \(y^{(k)} = A ^{-1} x^{(k)}\), express this through the equation \(A y^{(k)} = x^{(k)}\). Once we have an LU factorization for \(A\) (which one probably would have when exploring errors in a numerical solution of \(Ax = b\)) each of these systems can be solved relatively fast: Then

\[\| A^{-1} \| \approx \max_k \| y^{(k)} \|.\]

2.6.5. Well-conditioned and ill-conditioned problems and matrices#

Condition numbers, giving upper limit on the ratio of forward error to backward error, measure the amplification of errors, and have counterparts in other contexts. For example, with an approximation \(r_a\) of a root \(r\) of the equation \(f(x) = 0\), the ratio of forward error to backward error is bounded by \(\displaystyle \max 1/| f'(x) | = \frac{1}{\min | f'(x) |}\), where the maximum only need be taken over an interval known to contain both the root and the approximation. This condition number becomes “infinite” for a multiple root, \(f'(r) = 0\), related to the problems we have seen in that case.

Careful calculation of an approximate solution \(x_a\) of \(Ax = b\) can often get a residual that is at the level of machine rounding error, so that roughly the relative backward error is of size comparable to the machine unit, \(u\). The condition number then guarantees that the (forward) relative error is no greater than about \(u \, \kappa(A)\).

In terms of significant bits, with \(p\) bit machine arithmetic, one can hope to get \(p - \log_2(\kappa(A))\) significant bits in the result, but can not rely on more, so one loses \(\log_2(\kappa(A))\) significant bits.

Remark 2.25

Compare this to the observation in Example 2.5 that one can expect to lose at least \(p/2\) significant bits when using the approximation \(Df(x) \approx D_hf(x) - (f(x+h) = f(x))/h\).

A well-conditioned problem is one that is not too highly sensitive to errors in rounding or input data; for an eqution \(Ax = b\), this corresponds to the condition number of \(A\) not being to large; the matrix \(A\) is then sometimes also called well-conditioned. This is of course vague, but might typically mean that \(p - \log_2(\kappa(A))\) is a sufficient number of significant bits for a particular purpose.

A problem that is not deemed well-conditioned is called ill-conditioned, so that a matrix of uncomfortably large condition number is also sometimes called ill-conditioned. An ill-conditioned problem might still be well-posed, but just requiring careful and precise solution methods.

Example 2.6 (the Hilbert matrices)

The \(n \times n\) Hilbert matrix \(H_n\) has elements

\[H_{i, j} = \frac{1}{i+j-1}\]

For example

\[\begin{split}H_4 = \left[ \begin{array}{cccc} 1 & 1/2 & 1/3 & 1/4 \\ 1/2 & 1/3 & 1/4 & 1/5 \\1/3 & 1/4 & 1/5 & 1/6 \\1/4 & 1/5 & 1/6 & 1/7 \end{array} \right]\end{split}\]

and for larger or smaller \(n\), one simply adds or remove rows below and columns at right.

These matrices arise in important situations like finding the polynomial of degree \(n-1\) that fits given data in the sense of minimizing the root-mean-square error — as we will discuss later in this course if there is time and interest.

Unfortunately as \(n\) increases the condition number grows rapidly, causing severe rounding error problems. To illustrate this, I will do something that one should usually avoid: compute the inverse of these matrices. This is also a case that shows the advatage of the LU factorization, since one computes the inverse by succesively computing each column, by solving \(n\) different systems of equations, each with the same matrix \(A\) on the left-hand side.

import numpy as np
from numpy import inf
from numerical_methods import lu_factorize, forward_substitution, backward_substitution, solve_linear_system
from numpy.linalg import norm, cond
from numpy.random import random

def inverse(A):
    """Use sparingly; there is usually a way to avoid computing inverses that is faster and with less rounding error!"""
    n = len(A)
    A_inverse = np.zeros_like(A)
    (L, U) = lu_factorize(A)
    for i in range(n):
        b = np.zeros(n)
        b[i] = 1.0
        c = forward_substitution(L, b)
        A_inverse[:,i] = backward_substitution(U, c)
    return A_inverse

def hilbert(n):
    H = np.zeros([n,n])
    for i in range(n):
        for j in range(n):
            H[i,j] = 1.0/(1.0 + i + j)
    return H

for n in range(2,6):
    H_n = hilbert(n)
    print(f"H_{n} is")
    print(H_n)
    H_n_inverse = inverse(H_n)
    print("and its inverse is")
    print(H_n_inverse)
    print("to verify, their product is")
    print(H_n @ H_n_inverse)
    print()

H_2 is
[[1.         0.5       ]
 [0.5        0.33333333]]
and its inverse is
[[ 4. -6.]
 [-6. 12.]]
to verify, their product is
[[ 1.00000000e+00  0.00000000e+00]
 [-3.70074342e-17  1.00000000e+00]]

H_3 is
[[1.         0.5        0.33333333]
 [0.5        0.33333333 0.25      ]
 [0.33333333 0.25       0.2       ]]
and its inverse is
[[   9.  -36.   30.]
 [ -36.  192. -180.]
 [  30. -180.  180.]]
to verify, their product is
[[ 1.00000000e+00  9.62193288e-16  1.40628250e-15]
 [-8.88178420e-16  1.00000000e+00  0.00000000e+00]
 [-2.22044605e-17 -1.99840144e-15  1.00000000e+00]]

H_4 is
[[1.         0.5        0.33333333 0.25      ]
 [0.5        0.33333333 0.25       0.2       ]
 [0.33333333 0.25       0.2        0.16666667]
 [0.25       0.2        0.16666667 0.14285714]]
and its inverse is
[[   16.  -120.   240.  -140.]
 [ -120.  1200. -2700.  1680.]
 [  240. -2700.  6480. -4200.]
 [ -140.  1680. -4200.  2800.]]
to verify, their product is
[[ 1.00000000e+00  0.00000000e+00  2.27373675e-13 -1.13686838e-13]
 [-1.33226763e-16  1.00000000e+00  1.12532206e-13 -1.50812696e-13]
 [-2.25745348e-15  2.23524902e-14  1.00000000e+00 -4.48530102e-14]
 [-3.96508223e-15  6.78822078e-14 -6.41391702e-14  1.00000000e+00]]

H_5 is
[[1.         0.5        0.33333333 0.25       0.2       ]
 [0.5        0.33333333 0.25       0.2        0.16666667]
 [0.33333333 0.25       0.2        0.16666667 0.14285714]
 [0.25       0.2        0.16666667 0.14285714 0.125     ]
 [0.2        0.16666667 0.14285714 0.125      0.11111111]]
and its inverse is
[[ 2.500e+01 -3.000e+02  1.050e+03 -1.400e+03  6.300e+02]
 [-3.000e+02  4.800e+03 -1.890e+04  2.688e+04 -1.260e+04]
 [ 1.050e+03 -1.890e+04  7.938e+04 -1.176e+05  5.670e+04]
 [-1.400e+03  2.688e+04 -1.176e+05  1.792e+05 -8.820e+04]
 [ 6.300e+02 -1.260e+04  5.670e+04 -8.820e+04  4.410e+04]]
to verify, their product is
[[ 1.00000000e+00  4.05808720e-13  2.08468798e-12 -1.70681247e-12
   8.53406235e-13]
 [ 3.20669417e-14  1.00000000e+00  6.88079223e-13 -3.96645679e-13
   1.41098244e-12]
 [ 2.34257058e-14 -3.00077423e-14  1.00000000e+00  2.77828554e-12
   9.49557893e-13]
 [-1.42108547e-14  4.54747351e-13 -2.72848411e-12  1.00000000e+00
  -9.09494702e-13]
 [-1.65176514e-14  3.05089287e-13 -1.96659972e-12  3.37354835e-12
   1.00000000e+00]]

Note how the inverses have some surprisingly large elements; this is the matrix equivalent of a number being very close to zero and so with a very large reciprocal.

Since we have the inverses, we can compute the matrix norms of each \(H_n\) and its inverse, and thence their condition numbers; then this can be compared to the approximations of these condition numbers given by numpy.linalg.cond

for n in range(2,6):
    H_n = hilbert(n)
    print(f"H_{n} is")
    print(H_n)
    print(f"with infinity norm {norm(H_n, inf)}") 
    H_n_inverse = inverse(H_n)
    print("and its inverse is")
    print(H_n_inverse)
    print(f"with infinity norm {norm(H_n_inverse, inf)}") 
    print(f"Thus the condition number of H_{n} is {norm(H_n, inf) * norm(H_n_inverse, inf)}")
    print(f"For comparison, numpy.linalg.cond gives {cond(H_n, inf)}")
    print()

H_2 is
[[1.         0.5       ]
 [0.5        0.33333333]]
with infinity norm 1.5
and its inverse is
[[ 4. -6.]
 [-6. 12.]]
with infinity norm 18.000000000000007
Thus the condition number of H_2 is 27.00000000000001
For comparison, numpy.linalg.cond gives 27.00000000000001

H_3 is
[[1.         0.5        0.33333333]
 [0.5        0.33333333 0.25      ]
 [0.33333333 0.25       0.2       ]]
with infinity norm 1.8333333333333333
and its inverse is
[[   9.  -36.   30.]
 [ -36.  192. -180.]
 [  30. -180.  180.]]
with infinity norm 408.00000000000165
Thus the condition number of H_3 is 748.000000000003
For comparison, numpy.linalg.cond gives 748.0000000000028

H_4 is
[[1.         0.5        0.33333333 0.25      ]
 [0.5        0.33333333 0.25       0.2       ]
 [0.33333333 0.25       0.2        0.16666667]
 [0.25       0.2        0.16666667 0.14285714]]
with infinity norm 2.083333333333333
and its inverse is
[[   16.  -120.   240.  -140.]
 [ -120.  1200. -2700.  1680.]
 [  240. -2700.  6480. -4200.]
 [ -140.  1680. -4200.  2800.]]
with infinity norm 13619.999999998772
Thus the condition number of H_4 is 28374.99999999744
For comparison, numpy.linalg.cond gives 28374.999999997905

H_5 is
[[1.         0.5        0.33333333 0.25       0.2       ]
 [0.5        0.33333333 0.25       0.2        0.16666667]
 [0.33333333 0.25       0.2        0.16666667 0.14285714]
 [0.25       0.2        0.16666667 0.14285714 0.125     ]
 [0.2        0.16666667 0.14285714 0.125      0.11111111]]
with infinity norm 2.283333333333333
and its inverse is
[[ 2.500e+01 -3.000e+02  1.050e+03 -1.400e+03  6.300e+02]
 [-3.000e+02  4.800e+03 -1.890e+04  2.688e+04 -1.260e+04]
 [ 1.050e+03 -1.890e+04  7.938e+04 -1.176e+05  5.670e+04]
 [-1.400e+03  2.688e+04 -1.176e+05  1.792e+05 -8.820e+04]
 [ 6.300e+02 -1.260e+04  5.670e+04 -8.820e+04  4.410e+04]]
with infinity norm 413279.99999998364
Thus the condition number of H_5 is 943655.9999999626
For comparison, numpy.linalg.cond gives 943655.9999995853

Next, experiment with solving equations, to compare residuala with actual errors.

I will use the testing strategy of starting with a known solution \(x\), from which the right-hand side \(b\) is computed; then slight simulated error is introduced to \(b\). Running this repeatedly with use of different random “errors” gives an idea of the actual error.

for n in range(2,6):
    print(f"{n=}")
    H_n = hilbert(n)
    x = np.linspace(1.0, n, n)
    print(f"x is {x}")
    b = H_n @ x
    print(f"b is {b}")
    error_scale = 1e-8
    b_imperfect = b + 2.0 * error_scale * (random(n) - 0.5) # add random "errors" between -error_scale and error_scale
    print(f"b has been slightly changed to {b_imperfect}")
    x_computed = solve_linear_system(H_n, b_imperfect)
    residual = b - H_n @ x_computed
    relative_backward_error = norm(residual, inf)/norm(b, inf)
    print(f"The residual maximum norm is {norm(residual, inf)}")
    print(f"and the relative backward error ||r||/||b|| is {relative_backward_error:0.4}")
    absolute_error = norm(x - x_computed, inf)
    relative_error = absolute_error/norm(x, inf)
    print(f"The absolute error is {absolute_error:0.4}")
    print(f"The relative error is {relative_error:0.4}")
    error_bound = cond(H_n, inf) * relative_backward_error
    print(f"For comparison, the relative error bound from the formula above is {error_bound:0.4}")
    print(f"\nBeware: the relative error is larger than the relative backward error by a factor {relative_error/relative_backward_error:0.8}")
    print()

n=2
x is [1. 2.]
b is [2.         1.16666667]
b has been slightly changed to [2.         1.16666667]
The residual maximum norm is 2.156431921918056e-09
and the relative backward error ||r||/||b|| is 1.078e-09
The absolute error is 2.044e-08
The relative error is 1.022e-08
For comparison, the relative error bound from the formula above is 2.911e-08

Beware: the relative error is larger than the relative backward error by a factor 9.4771775

n=3
x is [1. 2. 3.]
b is [3.         1.91666667 1.43333333]
b has been slightly changed to [3.00000001 1.91666667 1.43333334]
The residual maximum norm is 7.862964945815065e-09
and the relative backward error ||r||/||b|| is 2.621e-09
The absolute error is 7.92e-07
The relative error is 2.64e-07
For comparison, the relative error bound from the formula above is 1.96e-06

Beware: the relative error is larger than the relative backward error by a factor 100.71901

n=4
x is [1. 2. 3. 4.]
b is [4.         2.71666667 2.1        1.72142857]
b has been slightly changed to [4.00000001 2.71666666 2.1        1.72142857]
The residual maximum norm is 6.246541062182587e-09
and the relative backward error ||r||/||b|| is 1.562e-09
The absolute error is 2.91e-05
The relative error is 7.276e-06
For comparison, the relative error bound from the formula above is 4.431e-05

Beware: the relative error is larger than the relative backward error by a factor 4659.1067

n=5
x is [1. 2. 3. 4. 5.]
b is [5.         3.55       2.81428571 2.34642857 2.01746032]
b has been slightly changed to [4.99999999 3.55       2.81428571 2.34642857 2.01746032]
The residual maximum norm is 9.828756475371847e-09
and the relative backward error ||r||/||b|| is 1.966e-09
The absolute error is 0.000155
The relative error is 3.1e-05
For comparison, the relative error bound from the formula above is 0.001855

Beware: the relative error is larger than the relative backward error by a factor 15768.523

We see in these experiments that:

As the condition number increases, the relative error becomes increasingly larger than the backward error computed from the residual.
It is less than the above bound \(\displaystyle \text{Rel}(x_a) = \frac{\|x - x_a\|}{\| x \|} \leq \kappa(A) \frac{\|r\|}{\|b\|},\) and typically quite a bit less.

2.6.6. Exercises#

Exercise 2.15

This is a followup to various exercises in the section on Solving Ax = b with LU factorization, A = L U, based on the equations (2.3).

For the matrix \(A\) found in Exercise 2.10, its inverse found in Exercise 2.12, the approximate solution \(x_a\) of those equations also found in exercise Exercise 2.10, and the “exact” (far more accurate) solution \(x\) found in Exercise 2.11:

Compute the condition number \(\kappa_\infty(A)\). Do this by hand using the formula (2.4) for the matrix norm \(\| \cdot \|_\infty\), not with any Numpy functions.
Compute the theoretical upper bound on (forward) relative error in \(x_a\) given in terms of condition number and relative backward error, and compare to the actual relative error.