8. Solving Simultaneous Linear Equations, Part 1: Row Reduction/Gaussian Elimination#

References:

The basic strategy for solving simultaneous linear equations is row reduction and backward substitution, sometimes known as naive Gaussian elimination.

8.1. Introduction#

The problem of solving a system of \(n\) simultaneous linear equations in \(n\) unknowns, with matrix-vector form \(A x = b\), is quite thoroughly understood as far as having a good general-purpose methods usable with any \(n \times n\) matrix \(A\): essentially, Gaussian elimination (or row-reduction) as seen in most linear algebra courses, combined with some modifications to stay well away from division by zero: partial pivoting. Also, good robust software for this general case is readily available, for example in the Python packages NumPy and SciPy, or in Matlab.

Nevertheless, this basic algorithm can be very slow when \(n\) is large – as it often is when dealing with differential equations (even more so with partial differential equations). We will see that it requires about \(n^3/3\) arithmetic operations.

Thus I will summarise the basic method of row reduction or Gaussian elimination, and then build on it with methods for doing things more robustly, and on methods for doing it faster in some important special cases:

  1. When one has to solve many systems \(A x^{(m)} = b^{(m)}\) with the same matrix \(A\) but different right-hand side vectors \(b^{(m)}.\)

  2. When \(A\) is banded: most elements are zero, and all the non-zero elements \(a_{i,j}\) are near the main diagonal: \(|i - j|\) is far less than \(n\). (Aside on notation: “far less than” is sometimes denoted \(\ll\), as in \(|i-j| \ll n\).)

  3. When \(A\) is strictly diagonally dominant: each diagonal element \(a_{i,i}\) is larger in magnitude that the sum of the magnitudes of all other elements in the same row.

We might also explore some further topics, perhaps as individual projects:

  1. When \(A\) is positive definite: symmetric (\(a_{i,j} = a_{j,i}\)) and with all eigenvalues positive. This last condition would seem hard to verify, since computing all the eigenvalues of \(A\) is harder that solving \(Ax = b\), but there are important situations where this property is automatically guaranteed, such as with Galerkin and finite-element methods for solving boundary value problems for differential equations.

  2. When \(A\) is sparse: most elements are zero, but not necessarily with all the non-zero elements near the main diagonal.

8.2. Strategy for getting from mathematical facts to a good algorithm and then to its implentation in [Julia] code#

Here I take the opportunity to illustrate some useful strategies for getting from mathematical facts and ideas to good algorithms and working code for solving a numerical problem. The pattern we will see here, and often later, is:

8.2.1. Step 1. Get a basic algorithm:#

  1. Start with mathematical facts (like the equations \(\sum_{j=1}^n a_{ij}x_j = b_i\)).

  2. Solve to get an equation for each unknown — or for an updated aproximation of each unknown — in terms of other quantitities.

  3. Specify an order of evaluation in which all the quantities at right are evaluated earlier.

In this, it is often best to start with a verbal description before specifying the details in more precise and detailed mathematical form.

8.2.2. Step 2. Refine to get a more robust algorithm:#

  1. Identify cases that can lead to failure due to division by zero and such, and revise to avoid them.

  2. Avoid inaccuracy due to problems like severe rounding error. One rule of thumb is that anywhere that a zero value is a fatal flaw (in particular, division by zero), a very small value is also a hazard when rounding error is present. So avoid very small denominators. (We will soon examine this through the phenomenon of loss of significance and it extreme case catastrophic cancellation.)

8.2.3. Step 3. Refine to get a more efficient algorithm#

For example,

  • Avoid repeated evaluation of exactly the same quantity.

  • Avoid redundant calculations, such as ones whose value can be determnied in advance; for example, values that can be shown in advance to be zero.

  • Compare and choose between alternative algorithms.

8.3. Gaussian Elimination, a.k.a. Row Reduction#

We start by considering the most basic algorithm, based on ideas seen in a linear algebra course.

The problem is best stated as a collection of equations for individual numerical values:

Given coefficients \(a_{i,j} 1 \leq i \leq n,\, 1 \leq j \leq n\) and right-hand side values \(b_i,\, 1 \leq i \leq n\), solve for the \(n\) unknowns \(x_j,\, 1 \leq j \leq n\) in the equations $\( \sum_{j=1}^n a_{i,j} x_j = b_i,\, 1 \leq i \leq n. \)$

In verbal form, the basic strategy of row reduction or Gaussian elimination is this:

  • Choose one equation and use it to eliminate one chosen unknown from all the other equations, leaving that chosen equation plus \(n-1\) equations in \(n-1\) unknowns.

  • Repeat recursively, at each stage using one of the remaining equations to eliminate one of the remaining unknowns from all the other equations.

  • This gives a final equation in just one unknown, preceeded by an equation in that unknown plus one other, and so on: solve them in this order, from last to first.

8.3.1. Determining those choices, to produce a first algorithm: “Naive Gaussian Elimination”#

A precise algorithm must include rules specifying all the choices indicated above. The simplest “naive” choice, which works in most but not all cases, is to eliminate from the top to bottom and left to right:

  • Use the first equation to eliminate the first unknown from all other equations.

  • Repeat recursively, at each stage using the first remaining equation to eliminate the first remaining unknown. Thus, at step \(k\), equation \(k\) is used to eliminate unknown \(x_k\).

  • This gives one equation in just the last unknown \(x_n\); another equation in the last two unknowns \(x_{n-1}\) and \(x_n\), and so on: solve them in this reverse order, evaluating the unknowns from last to first.

This usually works, but can fail because at some stage the (updated) \(k\)-th equation might not include the \(k\)-th unknown: that is, its coefficient might be zero, leading to division by zero.

We will refine the algorithm to deal with that in the later section Solving Simultaneous Linear Equations, Part 2: Partial Pivoting.

8.4. The general case of solving \(Ax = b\)#

The problem of solving \(Ax = b\) in general, when all you know is that \(A\) is an \(n \times n\) matrix and \(b\) is an \(n\)-vector, can in most cases be handled well by using standard software rather than by writing your own code. Here is an example in Julia, solving

\[\begin{split} \left[ \begin{array}{rrr} 4 & 2 & 7 \\ 3 & 5 & -6 \\ 1 & -3 & 2 \end{array} \right] \left[ \begin{array}{r} x_1 \\ x_2 \\ x_3 \end{array} \right] = \left[ \begin{array}{r} 2 \\ 3 \\ 4 \end{array} \right] \end{split}\]
A = [4. 2. 7.; 3. 5. -6.; 1. -3. 2.]
println("A =\n$(A)")
b = [2.; 3.; 4.]
println("b = $(b)")
println("A*b = $(A*b)")
A =
[4.0 2.0 7.0; 3.0 5.0 -6.0; 1.0 -3.0 2.0]
b = [2.0, 3.0, 4.0]
A*b = [42.0, -3.0, 1.0]

8.4.1. Programming Note#

In some programing languages it can be important to specify that the entries are real numbers (type “float”); otherwise one might get calculations done with integer arithmetic, as in “4/3 = 1”.

One way to do this is as above: putting a decimal point in the numbers.

Julia mimics Matlab notation for “dividing from the left: the solution of \(Ax = b\) can be written as \(x = A^{-1} b\), but is not \(b A^{-1}\), which is what you get from the usual “divide from the right” notation of b/A.

x = A\b;

Aside: see what goes wrong here:

dividing_on_the_wrong_side = b/A
DimensionMismatch("Both inputs should have the same number of columns")

Stacktrace:
 [1] /(A::Vector{Float64}, B::Matrix{Float64})
   @ LinearAlgebra /Applications/Julia-1.7.2.app/Contents/Resources/julia/share/julia/stdlib/v1.7/LinearAlgebra/src/generic.jl:1149
 [2] top-level scope
   @ In[3]:1
 [3] eval
   @ ./boot.jl:373 [inlined]
 [4] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
   @ Base ./loading.jl:1196
println("Julia says that the solution of Ax=b is x=$(x)")
Julia says that the solution of Ax=b is x=[1.8116883116883116, -1.0324675324675323, -0.45454545454545453]
# Check the backward error, also known as the residual
r = b-A*x;
println("\nAs a check, the residual (or backward error) is\n    r = b - Ax = $(r)")
println("and its infinity (or 'maximum') norm is\n    ||r|| = $(maximum(abs.(r)))")
As a check, the residual (or backward error) is
    r = b - Ax = [0.0, 0.0, 8.881784197001252e-16]
and its infinity (or 'maximum') norm is
    ||r|| = 8.881784197001252e-16

8.5. The naive Gaussian elimination algorithm, in pseudo-code#

Here the elements of the transformed matrix and vector after step \(k\) are named \(a_{i,j}^{(k)}\) and \(b_{k}^{(k)}\), so that the original values are \(a_{i,j}^{(0)} = a_{i,j}\) and \(b_{i}^{(0)} = b_{i}\).

The name \(l_{i,k}\) is given to the multiple of row \(k\) that is subtracted from row \(i\) at step \(k\). This naming might seem redundant, but it becomes very useful later.

for k from 1 to n-1 \(\qquad\) Step k: get zeros in column k below row k:
\(\quad\) for i from k+1 to n
\(\qquad\) Evaluate the multiple of row k to subtract from row i:
\(\quad\quad l_{i,k} = a_{i,k}^{(k-1)}/a_{k,k}^{(k-1)}\) \(\qquad\) If \(a_{k,k}^{(k-1)} \neq 0\)!
\(\qquad\) Subtract \((l_{i,k}\) times row k) from row i in matrix A …:
\(\quad\quad\) for j from 1 to n
\(\quad\quad\quad a_{i,j}^{(k)} = a_{i,j}^{(k-1)} - l_{i,k} a_{k,j}^{(k-1)}\)
\(\quad\quad\) end
\(\qquad\) … and at right, subtract \((l_{i,k}\) times \(b_k)\) from \(b_i\):
\(\quad\quad b_i^{(k)} = b_i^{(k-1)} - l_{i,k} b_{k}^{(k-1)}\)
\(\quad\) end

The rows before \(i=k\) are unchanged, so they are ommited from the update; however, in a situation where we need to complete the definitions of \(A^{(k)}\) and \(b^{(k)}\) we would also need the following inside the for k loop:

\(\quad\) for i from 1 to k
\(\quad\quad\) for j from 1 to n
\(\quad\quad\quad a_{i,j}^{(k)} = a_{i,j}^{(k-1)}\)
\(\quad\quad\) end
\(\quad\quad b_i^{(k)} = b_i^{(k-1)}\)
\(\quad\) end

However, the algorithm will usually be implemented by overwriting the previous values in an array with new ones, and then this part is redundant.

The next improvement in efficiency: the updates in the first \(k\) columns at step \(k\) give zero values (that is the key idea of the algorithm!), so there is no need to compute or store those zeros, and thus the only calculations needed in the above for j from 1 to n loop are covered by for j from k+1 to n. Thus from now on we use only the latter: except when, for demonstration purposes, we need those zeros.

Thus, the standard algorithm looks like this:

for k from 1 to n-1 \(\qquad\) Step k: Get zeros in column k below row k:
\(\quad\) for i from k+1 to n \(\qquad\) Update only the rows that change: from k+1 on:
\(\qquad\) Evaluate the multiple of row k to subtract from row i:
\(\quad\quad l_{i,k} = a_{i,k}^{(k-1)}/a_{k,k}^{(k-1)}\) \(\qquad\) If \(a_{k,k}^{(k-1)} \neq 0\)!
\(\qquad\) Subtract \((l_{i,k}\) times row k) from row i in matrix A, in the columns that are not automaticaly zero:
\(\quad\quad\) for j from k+1 to n
\(\quad\quad\quad a_{i,j}^{(k)} = a_{i,j}^{(k-1)} - l_{i,k} a_{k,j}^{(k-1)}\)
\(\quad\quad\) end
\(\qquad\) and at right, subtract \((l_{i,k}\) times \(b_k)\) from \(b_i\):
\(\quad\quad b_i^{(k)} = b_i^{(k-1)} - l_{i,k} b_{k}^{(k-1)}\)
\(\quad\) end

8.6. The naive Gaussian elimination algorithm, in Julia#

Conversion to actual Julia code is now quite straightforward; there is litle more to be done than:

  • Change the way that indices are described, from \(b_i\) to b[i] and from \(a_{i,j}\) to A[i,j].

  • Use case consistently in array names, since the quirk in mathematical notation of using upper-case letters for matrix names but lower case letters for their elements is gone! In these notes, matrix names will be upper-case and vector names will be lower-case (even when a vector is considered as 1-column matrix).

  • Rather than create a new array for each matrix \(A^{(0)}\), \(A^{(0)}\), etc. and each vector \(b^{(0)}\), \(b^{(1)}\), we overwite each in the same array.

for k in 1:n
    for i in k+1:n
        L[i,k] = A[i,k] / A[k,k]
        for j in k+1:n
            A[i,j] -= L[i,k] * A[k,j]
        end
        b[i] -= L[i,k] * b[k]
    end
end

To demonstrate this, some additions are needed:

  • Putting this algorithm into a function.

  • Getting the value \(n\) needed for the loop, using the fact that it is the length of vector b.

  • Creating the array \(L\).

  • Copying the input arrays A and b into new ones, U and c, so that the original arrays are not changed. That is, when the row reduction is completed, U contains \(A^{(n-1)}\) and c contains \(b^{(n-1)}\).

Also, for some demonstrations, the zero values below the main diagonal of U are inserted, though usually they would not be needed.

function rowReduce(A, b)
    # To avoid modifying the matrix and vector specified as input,
    # they are copied to new arrays, with the function copy().
    # Warning: it does not work to say "U = A" and "c = b";
    # this makes these names synonyms, referring to the same stored data.

    U = copy(A)  # not "U=A", which makes U and A synonyms
    c = copy(b)
    n = length(b)
    L = zeros(n, n)
    for k in 1:n-1
        for i in k+1:n
            # compute all the L values for column k:
            L[i,k] = U[i,k] / U[k,k]  # Beware the case where U[k,k] is 0
            for j in k+1:n
                U[i,j] -= L[i,k] * U[k,j]
            end
            # Put in the zeros below the main diagonal in column k of U;
            # this is not important for calculations, since those elements of U are not used in backward substitution,
            # but it helps for displaying results and for checking the results via residuals.
            U[i,k] = 0.
            
            c[i] -= L[i,k] * c[k]
        end
    end
    for i in 2:n
        for j in 1:i-1
            U[i,j] = 0.
        end
    end
    return (U, c)
end;
function print_matrix(A)
    # A helper function to "pretty print" matrices
    (rows, cols) = size(A)
    println("[")
    for row in 1:rows
        for col in 1:cols
            print(A[row,col], " ")
        end
        println()
    end
    println("]")
end;

Note: The above two functions are gathered—along with many others—in a module for this course named NumericalMethodsAndAnalysis. Module NumericalMethodsAndAnalysis. The creation and use of modules will be discussed in the next section, Solving Simultaneous Linear Equations, Part 2: Partial Pivoting.

A = [4. 2. 7. ; 3. 5. -6. ; 1. -3. 2.]
println("A is")
print_matrix(A)
b = [2. ; 3. ; 4.]
println("b = $(b)")
A is
[
4.0 2.0 7.0 
3.0 5.0 -6.0 
1.0 -3.0 2.0 
]
b = [2.0, 3.0, 4.0]
(U, c) = rowReduce(A, b);
println("Row reduction gives U=")
print_matrix(U)
Row reduction gives U=
[
4.0 2.0 7.0 
0.0 3.5 -11.25 
0.0 0.0 -11.0 
]
println("c = $(c)")
c = [2.0, 1.5, 5.0]

Let’s take advantage of the fact that we have used Julia’s built-in linear algebra command b\A to get a very accurate approximation of the solution \(x\) to \(Ax=b\); this should also solve \(Ux=c\), so check the backward error, a.k.a. the residual:

r = c-U*x
println("The residual (backward error) r = c-Ux is $(r),\n    with maximum norm $(maximum(abs.(r)))")
The residual (backward error) r = c-Ux is [0.0, -2.220446049250313e-16, 0.0],
    with maximum norm 2.220446049250313e-16

8.6.1. Julia Note 1: Operations on a sequence of array indices, with “slicing”#

Julia code can specify vector operations on a range of indices \([c,d]\), referred to with the slice notaiton c:d. For example, the slice notation A[c:d,j] refers to the array containing the \(d-c+1\) elements A[i,j] for \(i\) in the interval \([c,d]\).

There is also a nickname end for the last index of arow or column, and you cna do arithmetic on that: end-1 get the penultimate entry; that can be nice for readbility.

Also, for the special case of wanting all elements of a row or column, one can use just : in place of 1:end or 1:n.

Thus, each of the three arithmetic calculations above can be specified over a range of index values in a single command, eliminating all the inner-most for loops; only for loops that contains other for loops remain.

This is called vectorizing, and apart from mathematical elegance, it usually allows far faster execution.

for k in 1:n
    L[k+1:end,end] = A[k+1:end,k] / A[k,k]
    # The indexing below as "[k]" instead of just "k" ensures that these slices"
    # are treated as matrices with 1 row/resp. 1 column, not as vectors;
    # ensures that matrix-matrix multiplication is done correctly.
    A[k+1:end,k+1:end] -= L[k+1:end,[k]] * A[[k],k+1:end]
    b[k+1:end] -= L[k+1:end,k] * b[k]
    end
end

I will break my usual guideline by redefining rowReduce, since this is just a restatement of exactly the same algorithm.

While I am about it, I add a demoMode, for display of intermediate results.

function rowReduce(A, b; demoMode=false)
    """To avoid modifying the matrix and vector specified as input,
    they are copied to new arrays, with the method .copy()
    Warning: it does not work to say "U = A" and "c = b";
    this makes these names synonyms, referring to the same stored data.
    
    This version vectorizes the inner loops, and all of the "i, j" loop for updating U.
    """
    if demoMode
        println("rowReduce version 2: some loops vectorized")
    end
    U = copy(A)  # not "U=A", which makes U and A synonyms
    c = copy(b)
    n = length(b)
    L = zeros(n, n)
    for k in 1:n-1
        if demoMode; println("Step $(k):"); end
        # compute all the L values for column k:
        L[k+1:end,k] = U[k+1:end,k] / U[k,k]  # Beware the case where U[k,k] is 0
        U[k+1:end,k+1:end] -= L[k+1:end,[k]] * U[[k],k+1:end]
        c[k+1:end] -= L[k+1:end,k] * c[k]
        
        # Insert the below-diagonal zeros in column k;
        # this is not important for calculations, since those elements of U are not used in backward substitution,
        # but it helps for displaying results and for checking the results via residuals.
        U[k+1:end,k] .= 0.0

        if demoMode
            println("After step $(k) the matrix is")
            print_matrix(U)
            println("and the right-hand side is $(c)")
        end
    end
     return (U, c)
end;

Repeating the above testing:

(U, c) = rowReduce(A, b, demoMode=true);
#println("\nRow reduction gives")
println("Row reduction gives U=")
print_matrix(U)
println("and right-hand side $(c)")
rowReduce version 2: some loops vectorized
Step 1:
After step 1 the matrix is
[
4.0 2.0 7.0 
0.0 3.5 -11.25 
0.0 -3.5 0.25 
]
and the right-hand side is [2.0, 1.5, 3.5]
Step 2:
After step 2 the matrix is
[
4.0 2.0 7.0 
0.0 3.5 -11.25 
0.0 0.0 -11.0 
]
and the right-hand side is [2.0, 1.5, 5.0]
Row reduction gives U=
[
4.0 2.0 7.0 
0.0 3.5 -11.25 
0.0 0.0 -11.0 
]
and right-hand side [2.0, 1.5, 5.0]

This procedure is backward substitution, giving the algorithm

\(x_n = c_n/u_{n,n}\)
for i from n-1 down to 1
\(\displaystyle \quad x_i = \frac{c_i - \sum_{j=i+1}^{n} u_{i,j} x_j}{u_{i,i}}\)
end

This works so long as none of the main diagonal terms \(u_{i,i}\) is zero, because when done in this order, everything on the right hand side is known by the time it is evaluated.

For future reference, note that the elements \(u_{k,k}\) that must be non-zero here, the ones on the main diagonal of \(U\), are the same as the elements \(a_{k,k}^{(k)}\) that must be non-zero in the row reduction stage above, because after stage \(k\), the elements of row \(k\) do not change any more: \(a_{k,k}^{(k)} = a_{k,k}^{(n-1)} = u_{k,k}\).

8.6.2. Julia Note 2. Indexing from the end of an array and counting backwards#

To express the above backwards counting in Julia, we have to deal with the fact that a:b counts upwards.

This is dealt with the extended form a:increment:b, which increments by step instead of by one, so that a:1:b is the same as a:b while a:-1:b counts down: \(a, a-1, \dots, b+1, b\).

One more bit of Julia: for an \(n\)-element single-index array v, the sum of its elements \(\sum_{i=1}^{n} v_i\) is given by sum(v). Thus \(\sum_{i=a}^{b} v_i\), the sum over a subset of indices \([a,b]\), is given by sum(v[a:b]).

8.6.3. The backward substitution algorithm in Julia#

With all the above Julia details, the core code for backward substitution is:

x[end] = c[end]/U[end,end]
for i in n-1:-1:1
    x[i] = (c[i] - U[i,i+1:end] * x[i+1:end])) / U[i,i]
end

Aside/preview: Note that the backward substitution algorithm and its Python coding have a nice mathematical advantage over the row reduction algorithm above: the precise mathematical statement of the algorithm does not need any intermediate quantities distinguished by superscripts \({}^{(k)}\), and correspondingly, all variables in the code have fixed meanings, rather than changing at each step.

In other words, all uses of the equal sign are mathematically correct as equations!

This can be advantageous in creating algorithms and code that is more understandable and more readily verified to be correct, and is an aspect of the functional programming approach. We will soon go part way to that functional ideal, by rephrasing Gaussian elimination in a form where all variables have clear, fixed meanings, corresponding to the natural mathematical description of the process: the method of LU factorization to be seen in Section 8.1 of the text Numerical Mathematics and Computing.

As a final demonstration, we put this second version of the code into a complete working Python function and test it:

function backwardSubstitution(U, c; demoMode=false)
    n = length(c)
    x = zeros(n)
    x[end] = c[end]/U[end,end]
    if demoMode
        println("x_$(n) = $(x[n])")
    end
    for i in n-1:-1:1
        if demoMode
            println("i=$(i)")
        end
        x[i] = ( c[i] - sum(U[i,i+1:end] .* x[i+1:end]) ) / U[i,i]
        if demoMode
            print("x_$(i) = $(x[i])")
        end
    end
    return x
end;
x = backwardSubstitution(U, c, demoMode=true)
x_3 = -0.45454545454545453
i=2
x_2 = -1.0324675324675323i=1
x_1 = 1.8116883116883116
3-element Vector{Float64}:
  1.8116883116883116
 -1.0324675324675323
 -0.45454545454545453
print("x = $(x)")
r = b - A*x
println("\nThe residual b - Ax = $(r),")
println("    with maximum norm $(maximum(abs.(r)))")
x = [1.8116883116883116, -1.0324675324675323, -0.45454545454545453]
The residual b - Ax = [0.0, 0.0, 8.881784197001252e-16],
    with maximum norm 8.881784197001252e-16

8.7. Two code testing hacks: starting from a known solution, and using randomly generated examples#

An often useful strategy in developing and testing code is to create a test case with a known solution; another is to use random numbers to avoid accidently using a test case that in unusually easy.

The usual preferred style is to have all import and using statements at the top, but since this is the first time we’ve heard of module Random I did not want it to be mentioned mysteriously above.

This Pkg.add command need only be done “once per computer”, to install it in the Julia environment; thus, only uncomment and run this if a subsequent import or using command tells you to do this, and then comment it out again.

#import Pkg; Pkg.add("Random")
using Random: rand!
n = length(b)
x_random = zeros(n)
rand!(x_random)  # fill with random values, uniform in [0, 1)
(x_random *= 2.) .-= 1.  # double and then subtract 1: now uniform in [-1, 1)
println("x_random = $(x_random)")
x_random = [-0.012239104484199403, -0.342909651529389, -0.9676729772538806]

Create a right-hand side b that automatically makes x_random the correct solution:

b_random = A * x_random;
println("A is"); print_matrix(A)
println("b_random is $(b_random)")

(U, c_random) = rowReduce(A, b_random)

println("U is"); print_matrix(U)
println("Residual c_random - U*x_random  = $(c_random - U*x_random)")
x_computed = backwardSubstitution(U, c_random)
println("x_computed is $(x_computed)")
r = b_random -  A*x_computed
println("Residual b_random-A*x_computed is $(r)")
println("Backward error is $(maximum(abs.(r)))")
x_error = x_random - x_computed
println("Error x_random-x_computed is $(x_error)")
println("Absolute error |x_random-x_computed| is $(maximum(abs.(x_error)))")
A is
[
4.0 2.0 7.0 
3.0 5.0 -6.0 
1.0 -3.0 2.0 
]
b_random is [-7.50848656177274, 4.05477229242374, -0.9188561044037935]
U is
[
4.0 2.0 7.0 
0.0 3.5 -11.25 
0.0 0.0 -11.0 
]
Residual c_random - U*x_random  = [0.0, 0.0, 0.0]
x_computed is [-0.012239104484199403, -0.34290965152938924, -0.9676729772538806]
Residual b_random-A*x_computed is [0.0, 8.881784197001252e-16, -6.661338147750939e-16]
Backward error is 8.881784197001252e-16
Error x_random-x_computed is [0.0, 2.220446049250313e-16, 0.0]
Absolute error |x_random-x_computed| is 2.220446049250313e-16

8.8. What can go wrong? Three examples#

8.8.1. Example 1#

8.8.1.1. An obvious division by zero problem#

Consider the system of two equations

\[\begin{split}\begin{split} x_2 &= 1 \\ x_1 + x_2 &= 2 \end{split}\end{split}\]

It is easy to see that this has the solution \(x_1 = x_2 = 1\); in fact it is already in “reduced form”. However when put into matrix form

\[\begin{split} \left[\begin{array}{rr} 0 & 1 \\ 1 & 1 \end{array}\right] \left[\begin{array}{r} x_1 \\ x_2 \end{array}\right] = \left[\begin{array}{r} 1 \\ 2 \end{array}\right] \end{split}\]

the above algorithm fails, because the fist pivot element \(a_{11}\) is zero:

A1 = [0. 1. ; 1. 1.]
b1 = [1. ; 1.]

println("A1 is)");   print_matrix(A1)
println("b1 is $(b1)")
A1 is)
[
0.0 1.0 
1.0 1.0 
]
b1 is [1.0, 1.0]
(U1, c1) = rowReduce(A1, b1)
x1 = backwardSubstitution(U1, c1)

println("U1 is"); print_matrix(U1)
print("c1 is $(c1)")
print("x1 is $(x1)")
U1 is
[
0.0 1.0 
0.0 -Inf 
]
c1 is [1.0, -Inf]x1 is [NaN, NaN]

Julia Note 3.

  • Inf, meaning “infinity”, is a special value given as the result of calculations like division by zero. Surprisingly, it can have a sign!

  • NaN, meaning “Not a Number”, is a special value given as the result of a calculation like 0/0.

8.8.2. Example 2#

8.8.2.1. A less obvious division by zero problem#

Next consider this system

\[\begin{split} \left[\begin{array}{rrr} 1 & 1 & 1 \\ 1 & 1 & 2 \\ 1 & 2 & 2 \end{array}\right] \left[\begin{array}{r} x_1 \\ x_2 \\ x_3 \end{array}\right] = \left[\begin{array}{r} 3 \\ 4 \\ 5 \end{array}\right] \end{split}\]

The solution is \(x_1 = x_2 = x_3 = 1\), and this time none of th diagonal elements is zero, so it is not so obvoius that a divisin be zero probelms will occur, but:

A2 = [1.  1.  1. ; 1.  1.  2. ; 1. 2. 2.]
b2 = [3. ; 4. ; 5.]

println("A2 is"); print_matrix(A2)
println("b2 is $(b2)")
A2 is
[
1.0 1.0 1.0 
1.0 1.0 2.0 
1.0 2.0 2.0 
]
b2 is [3.0, 4.0, 5.0]
(U2, c2) = rowReduce(A2, b2)
x2 = backwardSubstitution(U2, c2)

println("U2 is"); print_matrix(U2)
println("c2 is $(c2)")
println("x2 is $(x2)")
U2 is
[
1.0 1.0 1.0 
0.0 0.0 1.0 
0.0 0.0 -Inf 
]
c2 is [3.0, 1.0, -Inf]
x2 is [NaN, NaN, NaN]

What happens here is that the first stage subtracts the first row from each of the others …

A2[2,:] -= A2[1,:]
b2[2] -= b2[1]
A2[3,:] -= A2[1,:]
b2[3] -= b2[1];

… and the new matrix has the same problem as above at the next stage:

println("Now A2 is"); print_matrix(A2)
println("and b2 is $(b2)")
Now A2 is
[
1.0 1.0 1.0 
0.0 0.0 1.0 
0.0 1.0 1.0 
]
and b2 is [3.0, 1.0, 2.0]

Thus, the second and third equations are

\[\begin{split} \left[\begin{array}{rr} 0 & 1 \\ 1 & 1 \end{array}\right] \left[\begin{array}{r} x_2 \\ x_3 \end{array}\right] = \left[\begin{array}{r} 1 \\ 2 \end{array}\right] \end{split}\]

with the same problem as in Example 1.

8.8.3. Example 3#

8.8.3.1. Problems caused by inexact arithmetic#

The equations

\[\begin{split} \left[\begin{array}{rr} 1 & 10^{16} \\ 1 & 1 \end{array}\right] \left[\begin{array}{r} x_1 \\ x_2 \end{array}\right] = \left[\begin{array}{r} 1+10^{16} \\ 2 \end{array}\right] \end{split}\]

again have the solution \(x_1 = x_2 = 1\), and the only division that happens in the above algorithm for row reduction is by that pivot element \(a_{11} = 1, \neq 0\), so with exact arithmetic, all would be well. But:

A3 = [1.  1e16 ; 1. 1.]
b3 = [1. + 1e16 ; 2.]

println("A3 is"); print_matrix(A3)
println("b3 is $(b3)")
A3 is
[
1.0 1.0e16 
1.0 1.0 
]
b3 is [1.0e16, 2.0]
(U3, c3) = rowReduce(A3, b3)
x3 = backwardSubstitution(U3, c3)

println("U3 is"); print_matrix(U3)
println("c3 is $(c3)")
println("x3 is $(x3)")
U3 is
[
1.0 1.0e16 
0.0 -1.0e16 
]
c3 is [1.0e16, -9.999999999999998e15]
x3 is [2.0, 0.9999999999999998]

This gets \(x_2 = 1\) fairly accurately, but \(x_1\) is completely wrong!

One hint is that \(b_1\), which should be \(1 + 10^{16} = 1000000000000001\), is instead just given as \(10^{16}\).

On the other hand, all is well with less large values, like \(10^{15}\):

A3a = [1.  1e15 ; 1.  1.]
b3a = [1. + 1e15 ; 2.]

println("A3a is"); print_matrix(A3a)
println("b3a is $(b3a)")
A3a is
[
1.0 1.0e15 
1.0 1.0 
]
b3a is [1.000000000000001e15, 2.0]
(U3a, c3a) = rowReduce(A3a, b3a)
x3a = backwardSubstitution(U3a, c3a)

println("U3a is"); print_matrix(U3a)
println("c3a is $(c3a)")
println("x3a is $(x3a)")
U3a is
[
1.0 1.0e15 
0.0 -9.99999999999999e14 
]
c3a is [1.000000000000001e15, -9.99999999999999e14]
x3a is [1.0, 1.0]

8.8.4. Example 4#

8.8.4.1. Avoiding small denominators#

The first equation is Example 3 can be divided by \(10^{16}\) to get an equivalent system with the same problem:

\[\begin{split} \left[\begin{array}{rr} 10^{-16} & 1 \\ 1 & 1 \end{array}\right] \left[\begin{array}{r} x_1 \\ x_2 \end{array}\right] = \left[\begin{array}{r} 1+10^{-16} \\ 2 \end{array}\right] \end{split}\]

Now the problem is more obvious: this system differs from the system in Example 1 just by a tiny change of \(10^{-16}\) in that pivot elements \(a_{11}\), and the problem is division by a value very close to zero.

A4 = [1e-16  1. ; 1. 1.]
b4 = [1. + 1e-16 ; 2.]

println("A4 is"); print_matrix(A4)
println("b4 is $(b4)")
A4 is
[
1.0e-16 1.0 
1.0 1.0 
]
b4 is [1.0, 2.0]
(U4, c4) = rowReduce(A4, b4)
x4 = backwardSubstitution(U4, c4)

println("U4 is"); print_matrix(U4)
println("c4 is $(c4)")
println("x4 is $(x4)")
U4 is
[
1.0e-16 1.0 
0.0 -1.0e16 
]
c4 is [1.0, -9.999999999999998e15]
x4 is [2.220446049250313, 0.9999999999999998]

One might think that there is no such small denominator in Example 3, but what counts for being “small” is magnitude relative to other values — 1 is very small compared to \(10^{16}\).

To understand these problems more (and how to avoid them) it is time to explore Machine Numbers, Rounding Error and Error Propagation.


This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International