2.7. Operation Counts for Solving Simultaneous Linear Equations
Last revised on October 10, 2025, adding a forward reference to the operation count exercises in the section on Faster Methods for Solving Ax=b with Tridiagonal and Banded Matrices.
References:
[Sauer, 2022] Section 2.1.2, Operation Counts.
[Burden et al., 2016] Sections 6.1, Linear Systems of Equations, and 6.5, Matrix Factorization.
[Chasnov, 2012] Section 3.4, Operation counts.
[Chenney and Kincaid, 2013] Section 7.2, in the subsection on Long Operation Count.
The subsection Operation Counts of [Kincaid and Chenney, 1990] Section 4.3, Pivoting and Constructing an Algorithm.
[Trefethen and Bau, 2022] Lecture 20, Gaussian Elimination.
[Dahlquist and Björck, 2009] Section 7.2, Gaussian Elimination: see the comments on flops.
All the algorithms seen so far involve a fixed sequence of arithmetic operations, so we can measure their speed (or “cost”) by counting these.
I will (at least for now) ignore the speed-up that is often possible by doing multiple operations simultaneously (for example by using the multiple cores of almost every current CPU or GPU), with one exception: accounting for the fused multiply-add (FMA). The algorithms here have loops in which, at each iteration, the values of two variables are multiplied and the result is then added to (or subtracted from) a third; when a succession of these is done, most modern processors can effectively perform each pair in the time taken by the multiplication alone, by doing each add/subtract while also doing the next multiplication.
So in effect, we only need to count the multiplications and divisions.
Further, as we will see soon, multiplications often dominate the total effort, so that we can then ignore the divisions.
One final simplification: the highest power of \(n\) in the count is usually enough information; for example, backward substitution requires \(n(n-1)/2 = n^2/2 - n/2\) multiplications (and an equal number of subtractions), but it is enough to describe this as “approximately \(n^2/2\)”. For this we introduce some new notation:
Definition 2.11 (A refined “big-O” notation, keeping track of constant factors.)
When dealing with limits as \(n \to \infty\) rather than as \(h \to 0\), it will be useful to have a refined version of the “big-O” notation, keeping track of constant factors.
So from now on, the meaning of “\(f(n) = O(g(n))\) for large \(n\)” is that

\[\lim_{n \to \infty} \frac{f(n)}{g(n)} = 1,\]

not just that the ratio is bounded. Thus for example we will say that \(\displaystyle \frac{n(n-1)}{2} = O\left(\frac{n^2}{2}\right)\).
Theorem 2.12
For a system of \(n\) equations in \(n\) unknowns, the operation counts are:
1. For computing the LU factorization: \(O(n^3/3)\) multiplications and \(O(n^2/2)\) divisions.
2. For forward substitution: \(O(n^2/2)\) multiplications, no divisions.
3. For backward substitution: \(O(n^2/2)\) multiplications, \(n\) divisions.
4. For row reduction from \(Ax = b\) to \(Ux = c\): \(O(n^3/3)\) multiplications and \(O(n^2/2)\) divisions.
5. For solving \(Ax = b\) by either method: \(O(n^3/3)\) multiplications and \(O(n^2/2)\) divisions.
Proof. The verifications of these will use some basic counting results, which will not be verified here; you have perhaps seen them in a calculus course, in the introduction to Riemann sums:
Lemma 2.13

\[\sum_{i=1}^{n} i = \frac{n(n+1)}{2} = O\left(\frac{n^2}{2}\right),
\qquad
\sum_{i=1}^{n} i^2 = \frac{n(n+1)(2n+1)}{6} = O\left(\frac{n^3}{3}\right).\]

Note how the dominant “big-O” terms match the integrals \(\int_0^n x\,dx = n^2/2\) and \(\int_0^n x^2\,dx = n^3/3\).
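As a quick sanity check (this snippet is mine, not part of the text's algorithms), the two summation formulas and their dominant terms can be verified numerically:

```python
# Check the summation formulas used in the operation counts:
# sum_{i=1}^{n} i = n(n+1)/2 and sum_{i=1}^{n} i^2 = n(n+1)(2n+1)/6,
# whose dominant terms are n^2/2 and n^3/3 respectively.
n = 1000
sum1 = sum(i for i in range(1, n + 1))
sum2 = sum(i * i for i in range(1, n + 1))
assert sum1 == n * (n + 1) // 2
assert sum2 == n * (n + 1) * (2 * n + 1) // 6
print(f"sum of i:   {sum1},  n^2/2 = {n**2 // 2}")
print(f"sum of i^2: {sum2},  n^3/3 = {n**3 // 3}")
```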
Start with the easiest of these five items, forward substitution as in Algorithm 2.8, which can be rewritten as
Algorithm 2.14
for i from 1 to n
\(\displaystyle \quad c_i = b_i - \sum_{j=1}^{i-1} l_{i,j} c_j\)
end
with the case \(i=1\) being just \(c_1 = b_1\); no arithmetic operations to count.
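This loop translates almost line for line into Python; the function name and the use of NumPy arrays here are my own choices, so treat this as a sketch rather than the book's reference implementation:

```python
import numpy as np

def forward_substitution(L, b):
    """Solve L c = b for c, with L unit lower triangular (1's on its
    diagonal), as in the algorithm above: each step computes
    c_i = b_i - sum_{j<i} L[i,j] c[j], costing i-1 multiplications."""
    n = len(b)
    c = np.zeros(n)
    for i in range(n):
        # For i = 0 the slice is empty and the dot product is 0, so c[0] = b[0].
        c[i] = b[i] - L[i, :i] @ c[:i]
    return c

# A small check on a 3x3 unit lower triangular system:
L = np.array([[1.0, 0.0, 0.0],
              [2.0, 1.0, 0.0],
              [3.0, 4.0, 1.0]])
b = np.array([1.0, 4.0, 14.0])
c = forward_substitution(L, b)
print(c)  # should satisfy L @ c == b
```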
The recurring strategy is to first do the operation counts for calculations inside a loop, and then sum the results over the loop’s iterations. Here the calculation

\[c_i = b_i - \sum_{j=1}^{i-1} l_{i,j} c_j\]

involves \(i-1\) multiplications (and in fact the same number of FMA’s), and then the cost of the for i from 1 to n loop is the sum of these:

\[\sum_{i=1}^{n} (i-1) = \frac{n(n-1)}{2} = O\left(\frac{n^2}{2}\right) \text{ multiplications.}\]
Next, backward substitution as in Algorithm 2.5 is very similar; rather than doing all the counting again, note that the main difference from forward substitution is the division at each step, adding those \(n\) divisions.
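For concreteness, here is a matching Python sketch of backward substitution (again the function name and array layout are my own), showing the one division per step:

```python
import numpy as np

def backward_substitution(U, c):
    """Solve U x = c for x, with U upper triangular: each step computes
    x_i = (c_i - sum_{j>i} U[i,j] x[j]) / U[i,i].  The division by
    U[i,i] at each of the n steps is what forward substitution avoids."""
    n = len(c)
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):
        x[i] = (c[i] - U[i, i+1:] @ x[i+1:]) / U[i, i]
    return x

# A small check on a 3x3 upper triangular system:
U = np.array([[2.0, 1.0, 1.0],
              [0.0, 3.0, 2.0],
              [0.0, 0.0, 4.0]])
c = np.array([7.0, 8.0, 4.0])
x = backward_substitution(U, c)
print(x)  # should satisfy U @ x == c
```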
The last hard work is the cost of LU factorization itself, as in Algorithm 2.7; it is again convenient to rewrite the algorithm with a redundant step \(k=1\) in the main loop:
Algorithm 2.15 (Doolittle factorization)
for k from 1 to n
\(\quad\) for j from k to n
\(\quad\quad\) \(u_{k,j}=a_{k,j} - \sum_{s=1}^{k-1}l_{k,s}u_{s,j}\)
\(\quad\) end
\(\quad\) for i from k+1 to n
\(\quad\quad\) \(l_{i,k}=\displaystyle\frac{a_{i,k}-\sum_{s=1}^{k-1}l_{i,s}u_{s,k}}{u_{k,k}}\)
\(\quad\) end
end
Inside the main loop for k from 1 to n, the lines

\[u_{k,j} = a_{k,j} - \sum_{s=1}^{k-1} l_{k,s} u_{s,j}\]

and

\[l_{i,k} = \frac{a_{i,k} - \sum_{s=1}^{k-1} l_{i,s} u_{s,k}}{u_{k,k}}\]

each do \(k-1\) multiplications, and the latter also does one division.

Each of the two inner loops runs (approximately) \(n-k+1\) times with these same “costs”, giving at step \(k\)

\(2(k-1)(n-k+1)\) multiplications, \(n-k+1\) divisions.
Finally, the loop for k from 1 to n sums these, giving

\[\sum_{k=1}^{n} 2(k-1)(n-k+1) = \frac{n(n^2-1)}{3} = O\left(\frac{n^3}{3}\right)\]

multiplications, and

\[\sum_{k=1}^{n} (n-k+1) = \frac{n(n+1)}{2} = O\left(\frac{n^2}{2}\right)\]

divisions, all as in item (1) above.
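To make these counts concrete, here is a Python sketch of the Doolittle algorithm above (0-based indexing; the function name and the diagonally dominant test matrix are my own choices) instrumented to tally multiplications and divisions for comparison with \(n^3/3\) and \(n^2/2\):

```python
import numpy as np

def lu_doolittle_counted(A):
    """Doolittle LU factorization as in the algorithm above (0-based
    indices), also tallying multiplications and divisions."""
    n = A.shape[0]
    L = np.eye(n)
    U = np.zeros((n, n))
    mults = divs = 0
    for k in range(n):
        for j in range(k, n):                       # row k of U
            U[k, j] = A[k, j] - L[k, :k] @ U[:k, j]
            mults += k                              # k here is "k-1" in 1-based terms
        for i in range(k + 1, n):                   # column k of L
            L[i, k] = (A[i, k] - L[i, :k] @ U[:k, k]) / U[k, k]
            mults += k
            divs += 1
    return L, U, mults, divs

n = 50
rng = np.random.default_rng(1)
A = rng.random((n, n)) + n * np.eye(n)   # diagonally dominant, so no zero pivots
L, U, mults, divs = lu_doolittle_counted(A)
print(f"multiplications: {mults}, compare n^3/3 = {n**3 / 3:.0f}")
print(f"divisions: {divs}, compare n^2/2 = {n**2 / 2:.0f}")
```

For \(n = 50\) the tallies come out within a few percent of the leading terms, as the theorem predicts.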
The remaining results (4) and (5) follow quickly from these:
we have seen that row reduction combines exactly the arithmetic of LU factorization plus forward substitution from items (1) and (2) (just in a different order), and
completing the solution by either method then just adds backward substitution, whose costs are seen in item (3) to be “negligible” compared to those for the LU factorization or row reduction already done.
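One way to see that “negligible” claim numerically (my own snippet, comparing only the leading terms of the counts):

```python
# Compare the ~n^2/2 multiplications of each substitution sweep against
# the ~n^3/3 multiplications of the factorization: the ratio shrinks
# like 3/n, so for large n the triangular solves cost almost nothing.
for n in (10, 100, 1000):
    factorization = n**3 / 3
    substitutions = 2 * (n**2 / 2)    # forward plus backward substitution
    print(f"n = {n:4d}: substitutions cost "
          f"{substitutions / factorization:.2%} of the factorization")
```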
2.7.1. Exercises
There are exercises on operation counts in the algorithms for some special cases in the section on tridiagonal and banded matrices.