11. Solving \(Ax = b\) with LU factorization, \(A = L U\)#


11.1. Avoiding repeated calculation, excessive rounding and messy notation: LU factorization#

Putting aside pivoting for a while, there is another direction in which the algorithm for solving linear systems \(Ax=b\) can be improved. It starts with the idea of being more efficient when solving multiple systems with the same matrix \(A\) but different right-hand sides: \(A x^{(m)} = b^{(m)}, m=1,2, \dots\).

However it has several other benefits:

  • allowing a strategy to reduce rounding error, and

  • a simpler, more elegant mathematical statement.

We will see how to merge this with partial pivoting in Solving Ax = b With Both Pivoting and LU Factorization.

Some useful jargon:

Definition: triangular matrices

A matrix is triangular if all its non-zero elements are either on the main diagonal or to one side of it. There are two possibilities:

  • Matrix \(U\) is upper triangular if \(u_{ij} = 0\) for all \(i > j\).

  • Matrix \(L\) is lower triangular if \(l_{ij} = 0\) for all \(j > i\).

One important example of an upper triangular matrix is \(U\) formed by row reduction; note well that it is much quicker and easier to solve \(Ux = c\) than the original system \(Ax=b\) exactly because of its triangular form.
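
As a small illustration (using the istriu and istril tests from Julia's standard LinearAlgebra package), here are an upper triangular and a unit lower triangular matrix:

import LinearAlgebra: istriu, istril
# An upper triangular matrix: all non-zero entries are on or above the main diagonal.
U = [4. 2 7; 0 3.5 -11.25; 0 0 -11]
# A (unit) lower triangular matrix: all non-zero entries are on or below the main diagonal.
L = [1. 0 0; 0.75 1 0; 0.25 -1 1]
println(istriu(U))  # true
println(istril(L))  # true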

We will soon see that the multipliers \(l_{ij}\), \(i > j\) for row reduction that were introduced in Solving Simultaneous Linear Equations, Part 1: Row Reduction/Gaussian Elimination help to form a lower triangular matrix \(L\).

The key to the LU factorization idea is finding a lower triangular matrix \(L\) and an upper triangular matrix \(U\) such that \(L U = A\), and then using the fact that it is far quicker to solve a linear system when the corresponding matrix is triangular.

Indeed we will see that, if naive Gaussian elimination for \(Ax=b\) succeeds, giving row-reduced form \(Ux = c\):

  1. The matrix \(A\) can be factorized as \(A = LU\) with \(U\) an \(n \times n\) upper triangular matrix and \(L\) an \(n \times n\) lower triangular matrix.

  2. There is a unique such factorization with the further condition that \(L\) is unit lower triangular, which means the extra requirement that the values on its main diagonal are unity: \(l_{k,k} = 1\). This is called the Doolittle Factorization of \(A\).

  3. In the Doolittle factorization, the matrix \(U\) is the one given by naive Gaussian elimination, and the elements of \(L\) below its main diagonal are the multipliers arising in naive Gaussian elimination. (The other elements of \(L\), on and above the main diagonal, are the ones and zeros dictated by it being unit lower triangular: the same as for those elements in the \(n \times n\) identity matrix.)

  4. The transformed right-hand side \(c\) arising from naive Gaussian elimination is the solution of the system \(Lc = b\), and this is solvable by a procedure called forward substitution, very similar to the backward substitution used to solve \(Ux = c\).

Putting all this together: if naive Gaussian elimination works for \(A\), we can introduce the name \(c\) for \(Ux\), and note that \(Ax = (LU)x = L(Ux) = Lc = b\). Then solving the system \(Ax = b\) can be done in three steps (sketched in code just after this list):

  1. Using \(A\), find the Doolittle factors, \(L\) and \(U\).

  2. Using \(L\) and \(b\), solve \(Lc = b\) to get \(c\). (Forward substitution)

  3. Using \(U\) and \(c\), solve \(Ux = c\) to get \(x\). (Backward substitution)
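
Here is a minimal sketch, in Julia, of this three-step strategy, using the functions lu_factorize, forwardSubstitution and backwardSubstitution that will be defined later in this section (the wrapper name solve_by_lu is just for illustration):

function solve_by_lu(A, b)
    (L, U) = lu_factorize(A)        # Step 1: factor A = LU (depends only on A)
    c = forwardSubstitution(L, b)   # Step 2: solve Lc = b for c
    x = backwardSubstitution(U, c)  # Step 3: solve Ux = c for x
    return x
end;

Note that step 1 uses only \(A\), so when solving several systems \(A x^{(m)} = b^{(m)}\) with the same matrix, the factorization is computed once and only the two cheap substitution steps are repeated.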

11.2. The direct method for the Doolittle LU factorization#

If you believe the above claims, we already have one algorithm for finding an LU factorization; basically, do naive Gaussian elimination, but ignore the right-hand side \(b\) until later. However, there is another “direct” method, which does not rely on anything we have seen before about Gaussian elimination, and has other advantages as we will see.

(If I were teaching linear algebra, I would be tempted to start here and skip Gaussian Elimination!)

This method starts by considering the apparently daunting task of solving the \(n^2\) simultaneous and nonlinear equations for the initially unknown elements of \(L\) and \(U\):

\[ \sum_{k=1}^n l_{i,k} u_{k,j} = a_{i,j}\; 1 \leq i \leq n,\;1 \leq j \leq n. \]

The first step is to insert the known information: the already-known values of the elements of \(L\) and \(U\). For one thing, the sums above stop when either \(k=i\) or \(k=j\), whichever comes first, due to all the zeros in \(L\) and \(U\):

\[ \sum_{k=1}^{\min(i,j)} l_{i,k} u_{k,j} = a_{i,j}\; 1 \leq i \leq n,\;1 \leq j \leq n. \]

Next, when \(i \leq j\)— so that the sum ends at \(k=i\) and involves \(l_{i,i}\) — we can use \(l_{i,i} = 1\).

So break up into two cases:

On and above the main diagonal (\(i \leq j\), so \(\min(i,j) = i\)):

\[\sum_{k=1}^{i-1}l_{i,k} u_{k,j}+u_{i,j}=a_{i,j} \quad 1\leq i \leq n,\; i \leq j \leq n.\]

Below the main diagonal (\(i > j\), so \(\min(i,j) = j\)):

\[\sum_{k=1}^{j-1} l_{i,k} u_{k,j} + l_{i,j} u_{j,j}= a_{i,j} \quad 2 \leq i \leq n,\;1 \leq j < i.\]

In each equation, the last term in the sum has been separated, so that we can use them to “solve” for an unknown:

\[\begin{split} \begin{split} u_{i,j} &= a_{i,j} - \sum_{k=1}^{i-1} l_{i,k} u_{k,j} \quad 1 \leq i \leq n,\;i \leq j \leq n. \\ l_{i,j} &= \frac{a_{i,j} - \sum_{k=1}^{j-1} l_{i,k} u_{k,j}}{u_{j,j}} \quad 2 \leq i \leq n,\;1 \leq j < i. \end{split} \end{split}\]
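
For example, unrolling two of these formulas for a \(3 \times 3\) matrix:

\[ u_{2,3} = a_{2,3} - l_{2,1} u_{1,3}, \qquad l_{3,2} = \frac{a_{3,2} - l_{3,1} u_{1,2}}{u_{2,2}}. \]

Each right-hand side involves only entries of \(L\) and \(U\) that come from earlier rows and columns, which is the key observation developed next.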

Here comes the characteristic step that gets us from valid equations to a useful algorithm: we can arrange these equations in an order such that all the values at right are determined by an earlier equation!

First look at what they say for the first row and first column.

With \(i=1\) in the first equation, there is no sum, and so

\[ u_{1,j}=a_{1,j}, \quad 1 \leq j \leq n, \]

which is the familiar fact that the first row is unchanged in naive Gaussian elimination.

Next, with \(j=1\) in the second equation, there is again no sum:

\[ l_{i,1} = \frac{a_{i,1}}{u_{1,1}} = \frac{a_{i,1}}{a_{1,1}}, \quad 2 \leq i \leq n, \]

which indeed gives the multipliers from the first step of naive Gaussian elimination.

Remember that one way to think of Gaussian elimination is recursively: after step \(k\), one just applies the same process recursively to the smaller \((n-k) \times (n-k)\) matrix in the bottom-right-hand corner. We can do something similar here; at stage \(k\):

  1. First, use the first of the above equations to solve for row \(k\) of \(U\), meaning just \(u_{k,j}, j \geq k\),

  2. Then use the second equation to solve for column \(k\) of \(L\): \(l_{i,k}, i > k\).

Stage \(k=1\) is handled by the simpler special equations above, so for the rest:

for k from 2 to n
\(\quad\) for j from k to n \(\qquad\) Get the non-zero elements in row \(k\) of \(U\)
\(\quad\quad\) \(u_{k,j}=a_{k,j} - \sum_{s=1}^{k-1}l_{k,s}u_{s,j}\)
\(\quad\) end
\(\quad\) for i from k+1 to n \(\qquad\) Get the non-zero elements in column \(k\) of \(L\) (except the 1’s on its diagonal)
\(\quad\quad\) \(l_{i,k}=\displaystyle\frac{a_{i,k}-\sum_{s=1}^{k-1}l_{i,s}u_{s,k}}{u_{k,k}}\)
\(\quad\) end
end

Note well that in the formulas to evaluate at the right,

  1. The terms \(l_{k,s}\) are for \(s < k\), so from a column \(s\) that has already been computed for a previous \(k\) value.

  2. The terms \(u_{s,j}\) are for \(s < k\), so from a row \(s\) that has already been computed for a previous \(k\) value.

  3. The denominator \(u_{k,k}\) in the second inner loop is computed just in time, in the first inner loop for the same \(k\) value.

So the only thing that can go wrong is the same as before: a zero pivot element \(u_{k,k}\).
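
For example, the nonsingular matrix

\[ A = \begin{bmatrix} 0 & 1 \\ 1 & 0 \end{bmatrix} \]

has \(u_{1,1} = a_{1,1} = 0\), so the formula for \(l_{2,1}\) fails; in fact this matrix has no Doolittle factorization at all, and a row swap (pivoting) is unavoidable.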

Notes

  1. For \(k=n\), the second inner loop is redundant, so could be eliminated. Indeed it might need to be eliminated in actual code, where “empty loops” might not be allowed. On the other hand, allowing empty loops makes the above correct also for \(k=1\); then the for k loop encompasses the entire factorization algorithm.

  2. This direct factorization algorithm avoids any intermediate modification of arrays, and thus eliminates all those superscripts like \(a_{i,j}^{(k)}\). This is not only nicer mathematically, but can help to avoid mistakes like code that inadvertently modifies the array containing the matrix \(A\) and then uses it to compute the residual, \(Ax - b\). More generally, such purely mathematical statements of algorithms can help to avoid coding errors; this is part of the philosophy of the functional programming approach.

  3. Careful examination shows that the product \(l_{k,s} u_{s,j}\) that is part of what is subtracted at location \((k,j)\) is the same as what is subtracted there at stage \(k\) of Gaussian elimination, just with different names. More generally, every piece of arithmetic is the same as before, except arranged in a different order, so that the \(k-1\) changes made to an element in row \(k\) are done together, via those sums.

  4. Rewrites with zero-based indexing will be provided later.

# Tell Julia where I keep my modules; for now in a "sister" folder to where the notebooks are:
push!(LOAD_PATH, "../modules");
import NumericalMethodsAndAnalysis: print_matrix
function lu_factorize(A; demoMode=false)
    # Compute the Doolittle LU factorization of A.
    # Sums like $\sum_{s=1}^{k-1} l_{k,s} u_{s,j}$ are done as matrix products;
    # in the above case, row matrix L[k, 1:k-1] by column matrix U[1:k-1,j] gives the sum for a given j,
    # and row matrix L[k, 1:k-1] by matrix U[1:k-1,k:n] gives the relevant row vector.
    n = size(A)[1]  # First component of the array's size; size(A) returns "(rows, columns)"
    # Initialize U as a zero matrix;
    # correct below the main diagonal, with the other entries to be computed and filled below.
    U = zeros(n,n)
    # Initialize L as a zero matrix;
    # correct above the main diagonal, with the other entries to be computed and filled in below.
    L = zeros(n,n)
    # Column and row 1 are special:
    U[1,:] = A[1,:]
    L[1,1] = 1.
    L[2:end,1] = A[2:end,1]/U[1,1]
    if demoMode
        println("After step k=1")
        println("U="); print_matrix(U)
        println("L="); print_matrix(L)
    end;
    for k in 2:n-1
        # Julia note: it is necessary to use indices "[k]" and so on to get a one-row matrix instead of a vector.
        # Note: the slices used at right include a few entries not yet computed at this stage,
        # but those are still zero from the initialization, so they contribute nothing to the sums.
        U[[k],k:end] = A[[k],k:end] - L[[k],1:k] * U[1:k,k:end]
        L[k,k] = 1.
        L[k+1:end,k] = ( A[k+1:end,k] - L[k+1:end,1:k] * U[1:k,k] )/U[k,k]
        if demoMode
            println("After step k=$(k)")
            println("U="); print_matrix(U)
            println("L="); print_matrix(L)
        end;
    end;
    # The last row is also special: nothing to do for L
    L[end,end] = 1.
    U[end,end] = A[end,end] - sum(L[[n],1:end-1] * U[1:end-1,end])
    if demoMode
        println("After step k=$(n)")
        println("U="); print_matrix(U)
    end;
    return (L, U)
end;

11.2.1. A test case on LU factorization#

It will be useful to compute vector and matrix norms as a measure of error; in particular, the “maximum” or “infinity” norm of a vector v is given by norm(v, Inf).
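
For example, a quick check of this norm on a small vector:

import LinearAlgebra: norm
# The infinity norm is the largest absolute value among the entries:
norm([1., -3., 2.], Inf)  # gives 3.0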

import LinearAlgebra: norm
A = [4. 2 7; 3 5 -6; 1 -3 2];
print_matrix(A)
[
4.0 2.0 7.0 
3.0 5.0 -6.0 
1.0 -3.0 2.0 
]
(L, U) = lu_factorize(A, demoMode=true);
After step k=1
U=
[
4.0 2.0 7.0 
0.0 0.0 0.0 
0.0 0.0 0.0 
]
L=
[
1.0 0.0 0.0 
0.75 0.0 0.0 
0.25 0.0 0.0 
]
After step k=2
U=
[
4.0 2.0 7.0 
0.0 3.5 -11.25 
0.0 0.0 0.0 
]
L=
[
1.0 0.0 0.0 
0.75 1.0 0.0 
0.25 -1.0 0.0 
]
After step k=3
U=
[
4.0 2.0 7.0 
0.0 3.5 -11.25 
0.0 0.0 -11.0 
]
println("A is"); print_matrix(A)
println("L is"); print_matrix(L)
println("U is"); print_matrix(U)
println("L times U is"); print_matrix(L*U)
println("The 'residual' or 'backward error' A - LU is"); print_matrix(A - L*U)
A is
[
4.0 2.0 7.0 
3.0 5.0 -6.0 
1.0 -3.0 2.0 
]
L is
[
1.0 0.0 0.0 
0.75 1.0 0.0 
0.25 -1.0 1.0 
]
U is
[
4.0 2.0 7.0 
0.0 3.5 -11.25 
0.0 0.0 -11.0 
]
L times U is
[
4.0 2.0 7.0 
3.0 5.0 -6.0 
1.0 -3.0 2.0 
]
The 'residual' or 'backward error' A - LU is
[
0.0 0.0 0.0 
0.0 0.0 0.0 
0.0 0.0 0.0 
]

11.2.2. Forward substitution: solving \(Lc = b\) for \(c\)#

This is the last piece missing. The strategy is very similar to backward substitution, but slightly simplified by the ones on the main diagonal of \(L\). The equations \(L c = b\) can be written much as above, separating off the last term in the sum:

\[ \sum_{j=1}^{n} l_{i,j} c_j = b_i,\; 1 \leq i \leq n \]
\[ \sum_{j=1}^{i} l_{i,j} c_j = b_i,\; 1 \leq i \leq n \]
\[ \sum_{j=1}^{i-1} l_{i,j} c_j + c_i = b_i,\; 1 \leq i \leq n \]

Then solve for \(c_i\):

\[ c_i = b_i - \sum_{j=1}^{i-1} l_{i,j} c_j \]

These are already in a usable order: the right-hand side in the equation for \(c_i\) involves only the \(c_j\) values with \(j < i\), determined by earlier equations if we run through the index \(i\) in increasing order.

First, \(i=1\)

\[ c_1 = b_1 - \sum_{j=1}^{0} l_{1,j} c_j, = b_1 \]

Next, \(i=2\)

\[ c_2 = b_2 - \sum_{j=1}^{1} l_{2,j} c_j, = b_2 - l_{2,1}c_1 \]

Next, \(i=3\)

\[ c_3 = b_3 - \sum_{j=1}^{2} l_{3,j} c_j, = b_3 - l_{3,1} c_1 - l_{3,2} c_2 \]

I leave expressing this as pseudo-code as an exercise; here it is in Julia.

As usual, it is also available from Module NumericalMethodsAndAnalysis, using

import NumericalMethodsAndAnalysis: forwardSubstitution
function forwardSubstitution(L, b)
    # Solve L c = b for c, using the fact that L is unit lower triangular.
    n = length(b)
    c = zeros(n)
    c[1] = b[1]
    for i in 2:n
        # c_i = b_i - sum_{j=1}^{i-1} l_{i,j} c_j, computed as a matrix product of row i of L with the known c values:
        c[i] = b[i] - sum(L[[i],1:i-1] * c[1:i-1])
    end;
    return c
end;

11.2.3. A test case on forward substitution#

b = [2., 3., 4.];
c = forwardSubstitution(L, b)
print(c)
[2.0, 1.5, 5.0]
println("c = $(c)")
println("The residual b - Lc is $(b - L*c)")
println("\t with maximum norm $(norm(b - L*c,  Inf))")
c = [2.0, 1.5, 5.0]
The residual b - Lc is [0.0, 0.0, 0.0]
	 with maximum norm 0.0

11.2.4. Completing the test case, with backward substitution#

As this step is unchanged, we could just import the version seen in a previous section, but for now:

function backwardSubstitution(U, c; demoMode=false)
    # Solve U x = c for x, where U is upper triangular, by backward substitution.
    n = length(c)
    x = zeros(n)
    x[end] = c[end]/U[end,end]
    if demoMode
        println("x_$(n) = $(x[n])")
    end
    for i in n-1:-1:1
        if demoMode
            println("i=$(i)")
        end
        x[i] = ( c[i] - sum(U[i,i+1:end] .* x[i+1:end]) ) / U[i,i]
        if demoMode
            print("x_$(i) = $(x[i])")
        end
    end
    return x
end;
x = backwardSubstitution(U, c);
rcUx = c - U*x
println("The residual c - Ux for the backward substitution step is $(rcUx)")
println("\t with maximum norm $(norm(rcUx, Inf))")
rbAx = b - A*x
println("The residual b - Ax for the whole solving process is $(rbAx)")
println("\t with maximum norm $(norm(rbAx, Inf))")
The residual c - Ux for the backward substitution step is [0.0, -2.220446049250313e-16, 0.0]
	 with maximum norm 2.220446049250313e-16
The residual b - Ax for the whole solving process is [0.0, 0.0, 8.881784197001252e-16]
	 with maximum norm 8.881784197001252e-16
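
As an extra sanity check (not part of the algorithm itself), one could also compare with the result of Julia's built-in solver; for this small, well-conditioned example the two should agree to within rounding error:

# Cross-check against Julia's built-in "backslash" solver, which uses LU factorization with pivoting.
x_builtin = A \ b
println("The difference x - x_builtin is $(x - x_builtin)")
println("\t with maximum norm $(norm(x - x_builtin, Inf))")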

11.2.5. Exercise 1#

A) Express this forward substitution strategy as pseudo-code.

Then implement it “directly” in a Julia function, with format:

function forwardSubstitution(L, b)
    . . .
    return c
end
Do this “directly” with explicit evaluation of the sums \(\sum_{j=1}^{i-1} l_{i,j} c_j\), rather than using the matrix multiplication short-cut above.

B) Test it, using this often-useful “reverse-engineering” tactic (a sketch of such a test follows the list):

  1. Create suitable test arrays L and c. Use \(n\) at least three, and preferably larger.

  2. Compute their product, with b = L * c

  3. Check if c_solution = forwardSubstitution(L, b) gives the correct value.
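
Here is a minimal sketch of such a test, assuming that your forwardSubstitution from part A is already defined; the particular matrix L and vector c below are just arbitrary choices for illustration.

import LinearAlgebra: norm
# The "reverse-engineering" test: choose L and c, form b = L*c,
# then check whether forward substitution recovers c from L and b.
L = [1. 0 0 0; 0.5 1 0 0; -2. 3 1 0; 1. -1 2 1]  # an arbitrary unit lower triangular test matrix
c = [1., 2., 3., 4.]                              # the intended solution
b = L * c
c_solution = forwardSubstitution(L, b)
println("c_solution = $(c_solution)")
println("The error c - c_solution is $(c - c_solution), with maximum norm $(norm(c - c_solution, Inf))")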

11.2.5.1. Creating and using modules#

A Julia module is simply a file of Julia code, with the standard suffix “.jl”; it can be created and edited with any text editor or IDE (for example, Visual Studio Code with its Julia extension).

11.2.6. Exercise 2#

(An ongoing activity.)

Start building a Julia module — I suggest the name NumericalMethods — in a file named by adding the suffix “.jl” to the module name (e.g. NumericalMethods.jl). Put all our linear algebra functions there; for now, forwardSubstitution(L, b), backwardSubstitution(U, c) and rowReduce(A, b), from above and from Solving Simultaneous Linear Equations, Part 1: Row Reduction/Gaussian Elimination. The syntax of the module file is like this:

module NumericalMethods
export rowReduce, forwardSubstitution, backwardSubstitution
function rowReduce(A, b)
    ...
end;
function forwardSubstitution(L, b)
    ...
end;
function backwardSubstitution(U, c)
    ...
end;
end

We will add the Doolittle method and such in a while, and could use this module in assignments and projects.

As an example of creating and using a module, I am creating one for this course, NumericalMethodsAndAnalysis.jl; see Module NumericalMethodsAndAnalysis.

# Tell Julia where I keep my modules; for now in a "sister" folder to where the notebooks are:
push!(LOAD_PATH, "../modules");
using NumericalMethodsAndAnalysis;
c = simple_sum(3, 4)
7

11.3. When is it safe to ignore pivoting? Quite often, fortunately.#

Fortunately there are many important cases where we can guarantee that naive Gaussian elimination will work, and so the LU factorization method will too. At times it can be known in advance that the maximal element pivoting seen in Solving Simultaneous Linear Equations, Part 2: Partial Pivoting will never in fact swap rows!

In particular, when solving differential equations, it is often very beneficial to avoid pivoting, and we can often choose an approach that allows this.


This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International