
2.7. Operation Counts for Solving Simultaneous Linear Equations

Last revised on October 10, 2025, adding a forward reference to the operation count exercises in the section on Faster Methods for Solving Ax=b with Tridiagonal and Banded Matrices.


All the algorithms seen so far involve a fixed sequence of arithmetic operations, so we can measure their speed (or “cost”) by counting these.

I will (at least for now) ignore the speed-up that is often possible by doing multiple operations simultaneously (for example, by using the multiple cores of almost every current CPU or GPU), with the one exception of accounting for the fused multiply-add (FMA). The algorithms here have loops in which, at each iteration, the values of two variables are multiplied and the result is then added to (or subtracted from) a third; when a succession of these is done, most or all modern processors can effectively do each such pair in the time taken by the multiplication alone, by doing each add/subtract while also doing the next multiplication.

So in effect, we only need to count the multiplications and divisions.

Further, as we will see soon, multiplications often dominate the total effort, so that we can then ignore the divisions.

One final simplification: the highest power of \(n\) in the count is usually enough information; for example, backward substitution requires \(n(n-1)/2 = n^2/2 - n/2\) multiplications (and an equal number of subtractions), but it is enough to describe this as “approximately \(n^2/2\)”. For this we introduce some new notation:

Definition 2.11 (A refined “big-O” notation, keeping track of constant factors.)

When dealing with limits as \(n \to \infty\) rather than as \(h \to 0\), it will be useful to have a refined version of the “big-O” notation, keeping track of constant factors.

So from now on, the meaning of “\(f(n) = O(g(n))\) for large \(n\)” is that

\[\frac{f(n)}{g(n)} \to 1 \text{ as } n \to \infty,\]

not just that the ratio is bounded. Thus for example we will say that

\[\frac{n(n-1)}{2} = O(n^2/2).\]
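To see this refined big-O in action, here is a quick check (a small sketch of my own, not part of the text) that the ratio in the example above does approach 1:

```python
# Illustrating the refined big-O notation: with f(n) = n(n-1)/2 and
# g(n) = n^2/2, the ratio f(n)/g(n) approaches 1 as n grows.
def ratio(n):
    f = n * (n - 1) / 2
    g = n**2 / 2
    return f / g

for n in (10, 100, 1000):
    print(n, ratio(n))  # the ratios tend to 1
```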

Theorem 2.12

For a system of \(n\) equations in \(n\) unknowns, the operation counts are:

  1. For computing the LU factorization: \(O(n^3/3)\) multiplications and \(O(n^2/2)\) divisions.

  2. For forward substitution: \(O(n^2/2)\) multiplications, no divisions.

  3. For backward substitution: \(O(n^2/2)\) multiplications, \(n\) divisions.

  4. For row reduction from \(Ax = b\) to \(Ux = c\): \(O(n^3/3)\) multiplications and \(O(n^2/2)\) divisions.

  5. For solving \(Ax = b\) by either method: \(O(n^3/3)\) multiplications and \(O(n^2/2)\) divisions.

Proof. The verifications of these will use some basic counting results, which will not be proved here; you have perhaps seen them in a calculus course, in the introduction to Riemann sums:

Lemma 2.1

\[\begin{split} \begin{split} \sum_{i=1}^n 1 &= n \\ \sum_{i=1}^n i &= \frac{n(n+1)}{2} = O(n^2/2) \\ \sum_{i=1}^n i^2 &= \frac{n(n+1)(2n+1)}{6} = O(n^3/3) \end{split} \end{split}\]

Note how the dominant “big-O” terms match the integrals

\[ \int_0^n 1\ dx = n, \quad \int_0^n x\ dx = n^2/2, \quad \int_0^n x^2\ dx = n^3/3. \]
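These closed forms are easy to confirm by direct summation; here is a small sanity check (my own aside, not from the text):

```python
# Checking the closed forms in Lemma 2.1 by direct summation,
# for a few values of n.
def power_sums(n):
    s0 = sum(1 for i in range(1, n + 1))
    s1 = sum(i for i in range(1, n + 1))
    s2 = sum(i * i for i in range(1, n + 1))
    return s0, s1, s2

for n in (5, 50, 500):
    s0, s1, s2 = power_sums(n)
    assert s0 == n
    assert s1 == n * (n + 1) // 2
    assert s2 == n * (n + 1) * (2 * n + 1) // 6
```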

Start with the easiest of these five items, forward substitution as in Algorithm 2.8, which can be rewritten as

Algorithm 2.14

for i from 1 to n
\(\displaystyle \quad c_i = b_i - \sum_{j=1}^{i-1} l_{i,j} c_j\)
end

with the case \(i=1\) being just \(c_1 = b_1\); no arithmetic operations to count.

The recurring strategy is to first do the operation counts for calculations inside a loop, and then sum the results over the loop’s iterations. Here the calculation

\[c_i = b_i - \sum_{j=1}^{i-1} l_{i,j} c_j\]

involves \(i-1\) multiplications (and in fact the same number of FMAs), and so the cost of the loop for i from 1 to n is the sum of these:

\[ \sum_{i=1}^n (i-1) = \sum_{i=1}^n i - \sum_{i=1}^n 1 = \frac{n(n+1)}{2} - n = \frac{n(n-1)}{2} = O(n^2/2). \]
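This count can be verified empirically; the following is a minimal sketch of Algorithm 2.14 in Python (the function name and test matrix are mine, chosen just for illustration), instrumented to count multiplications:

```python
# A sketch of Algorithm 2.14: forward substitution for a unit lower
# triangular matrix L, instrumented to count multiplications.
def forward_substitution(L, b):
    n = len(b)
    c = [0.0] * n
    mults = 0
    for i in range(n):
        s = b[i]
        for j in range(i):
            s -= L[i][j] * c[j]  # one multiply-then-subtract (an FMA)
            mults += 1
        c[i] = s
    return c, mults

# With n = 6 the count should be n(n-1)/2 = 15.
n = 6
L = [[0.5] * i + [1.0] + [0.0] * (n - i - 1) for i in range(n)]
c, mults = forward_substitution(L, [1.0] * n)
print(mults)  # 15
```

Note that the case \(i=1\) (here `i = 0`) contributes no multiplications, matching the observation above that \(c_1 = b_1\) costs nothing.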

Next, backward substitution as in Algorithm 2.5 is very similar; rather than doing all the counting again, note that the main difference from forward substitution is the division at each step, adding those \(n\) divisions.

The last piece of hard work is the cost of the LU factorization itself, as in Algorithm 2.7; it is again convenient to rewrite the algorithm with a redundant step \(k=1\) in the main loop:

Algorithm 2.15 (Doolittle factorization)

for k from 1 to n
\(\quad\) for j from k to n
\(\quad\quad\) \(u_{k,j}=a_{k,j} - \sum_{s=1}^{k-1}l_{k,s}u_{s,j}\)
\(\quad\) end
\(\quad\) for i from k+1 to n
\(\quad\quad\) \(l_{i,k}=\displaystyle\frac{a_{i,k}-\sum_{s=1}^{k-1}l_{i,s}u_{s,k}}{u_{k,k}}\)
\(\quad\) end
end
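For experimentation, here is a direct transcription of Algorithm 2.15 into Python (0-based indexing; the names are my own), with counters for the two kinds of operation:

```python
# A sketch of Algorithm 2.15 (Doolittle LU factorization), with
# counters for multiplications and divisions. With 0-based indexing,
# the inner sums over range(k) have k terms (k-1 in 1-based terms).
def doolittle(A):
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    mults = divs = 0
    for k in range(n):
        L[k][k] = 1.0
        for j in range(k, n):
            U[k][j] = A[k][j] - sum(L[k][s] * U[s][j] for s in range(k))
            mults += k
        for i in range(k + 1, n):
            L[i][k] = (A[i][k] - sum(L[i][s] * U[s][k] for s in range(k))) / U[k][k]
            mults += k
            divs += 1
    return L, U, mults, divs

A = [[4.0, 3.0, 2.0], [3.0, 4.0, 3.0], [2.0, 3.0, 4.0]]
L, U, mults, divs = doolittle(A)
print(divs)  # n(n-1)/2 = 3 divisions for n = 3
```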

Inside the two inner loops (over j and over i respectively), the lines

\[u_{k,j}=a_{k,j} - \sum_{s=1}^{k-1}l_{k,s}u_{s,j}\]

and

\[l_{i,k}=\displaystyle\frac{a_{i,k}-\sum_{s=1}^{k-1}l_{i,s}u_{s,k}}{u_{k,k}}\]

each do \(k-1\) multiplications, and the latter also does one division.

The loop for j from k to n runs \(n-k+1\) times and the loop for i from k+1 to n runs \(n-k\) times, which is also \(n-k+1\) to leading order; so step \(k\) costs approximately \(2(k-1)(n-k+1)\) multiplications and \(n-k+1\) divisions.

Finally, the loop for k from 1 to n sums these, giving

\[\begin{split}\begin{split} \sum_{k=1}^n 2(k-1)(n-k+1) &= 2\sum_{k=1}^n (kn - k^2 + 2k - n - 1) \\&= 2\left[ n \sum_{k=1}^n k - \sum_{k=1}^n k^2 + 2 \sum_{k=1}^n k - (n+1) \sum_{k=1}^n 1 \right] \\&= 2\left[ O(n^3/2) - O(n^3/3) + 2\,O(n^2/2) - n(n+1) \right] \\&= 2\left[ O(n^3/2) - O(n^3/3) \right] \\&= O(n^3/3) \end{split}\end{split}\]

multiplications, and

\[\sum_{k=1}^n (n-k+1) = (n+1) \sum_{k=1}^n 1 - \sum_{k=1}^n k = (n+1) n - O(n^2/2) = O(n^2/2)\]

divisions, all as in item (1) above.
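As a check on this algebra (an aside, not needed for the proof): the multiplication sum has the exact closed form \(n(n^2-1)/3\), and both exact counts approach their leading-order predictions, as this snippet confirms:

```python
# Exact operation counts from the sums above, compared with the
# leading-order predictions n^3/3 and n^2/2.
def lu_counts(n):
    mults = sum(2 * (k - 1) * (n - k + 1) for k in range(1, n + 1))
    divs = sum(n - k + 1 for k in range(1, n + 1))
    return mults, divs

for n in (10, 100, 1000):
    m, d = lu_counts(n)
    assert m == n * (n**2 - 1) // 3           # exact closed form
    print(n, m / (n**3 / 3), d / (n**2 / 2))  # both ratios tend to 1
```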

The remaining results (4) and (5) follow quickly from these:

  • we have seen that row reduction performs exactly the arithmetic of LU factorization plus forward substitution from items (1) and (2) (just in a different order), and

  • completing the solution by either method then just adds backward substitution, whose costs are seen in item (3) to be “negligible” compared to those for the LU factorization or row reduction already done.

2.7.1. Exercises

There are exercises on operation counts in the algorithms for some special cases in the section on tridiagonal and banded matrices.