2.10. Computing Eigenvalues and Eigenvectors: the Power Method and Beyond#

Last revised on September 30, 2025, most recently adding the proof of the Gershgorin Circle Theorem Theorem 2.17 and the Rayleigh Iteration method for computing all eigenvalues. (A few typos were corrected on October 2.)

References:

import numpy as np
import numpy.linalg as la
import numpy.random as random
from numpy import inf

The eigenproblem for a square \(n \times n\) matrix \(A\) is to compute some or all non-trivial solutions of

\[A \vec{v} = \lambda \vec{v}.\]

(By non-trivial, I mean to exclude \(\vec{v} = 0\), which gives a solution for any \(\lambda\).) That is, to compute the eigenvalues \(\lambda\) (of which generically there are \(n\), but sometimes fewer) and the eigenvectors \(\vec{v}\) corresponding to each.

With eigenproblems, and particularly those arising from differential equations, one often needs only the few smallest and/or largest eigenvalues. For these, we introduce the power method and inverse power method respectively; the latter can then be adapted for computing other eigenvalues, leading to the shifted inverse power method and a variant of that, the Rayleigh iteration method.

Here we sometimes restrict our attention to the case of a real symmetric matrix (\(A^T = A\), or \(A_{ij} = A_{ji}\)), or a Hermitian matrix (\(A^T = A^*\)), for which many things are a bit simpler:

  • all eigenvalues are real,

  • for real symmetric matrices, all eigenvectors are also real,

  • there is a complete set of orthogonal eigenvectors \(\vec{v}_k\), \(1 \leq k \leq n\), that form a basis for all vectors,

and so on.

However, the methods described here can be used more generally, or can be made to work with minor adjustments.
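As a quick numerical check of these facts (a sketch only; the small symmetric matrix is chosen arbitrarily for illustration), numpy.linalg provides eigh, a version of its eigensolver specialized for symmetric or Hermitian matrices:

```python
import numpy as np
import numpy.linalg as la

# A small symmetric test matrix, chosen arbitrarily for illustration.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])

# la.eigh is specialized for symmetric/Hermitian matrices;
# it returns real eigenvalues (in ascending order) and real eigenvectors.
eigenvalues, eigenvectors = la.eigh(A)

print(eigenvalues.dtype)  # a real floating-point dtype, not complex
# The eigenvectors (the columns of the second output) form an orthonormal set:
print(np.allclose(eigenvectors.T @ eigenvectors, np.eye(3)))  # True
```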

The eigenvalues are roots of the characteristic polynomial, \(p(\lambda) = \det(\lambda I - A) = (\lambda - \lambda_1)(\lambda - \lambda_2)\cdots(\lambda - \lambda_n) \). Repeated roots are possible, and they will all be enumerated as in this factorization, so there are always \(n\) quantities \(\lambda_i\). Here, these eigenvalues will be enumerated in decreasing order of magnitude:

\[|\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_n|.\]

Generically, all the magnitudes are different, which makes things work more easily, so that will sometimes be assumed while developing the intuition of the method.

2.10.1. The Power Method, for finding the largest eigenvalue#

The basic tool is the Power Method, which will usually but not always succeed in computing the eigenvalue of largest magnitude, \(\lambda_1\), and a corresponding eigenvector \(\vec{v}_1\). Its success mainly requires that there be a unique eigenvalue of largest magnitude: \(|\lambda_1| > |\lambda_i|\) for \(i>1\).

Note that this method is not so much useful in itself as a building block for better methods, some of which are described next.

The strategy is based on what happens when you repeatedly multiply by the matrix \(A\). Any initial vector \(\vec{x}^{(0)}\) can be expressed in terms of the eigenvectors \(\vec{v}^{(i)}\), \(1 \leq i \leq n\), with

\[ A \vec{v}^{(i)} = \lambda_i \vec{v}^{(i)} \]

That is

\[ \vec{x}^{(0)} = \sum_{i=1}^n c_i \vec{v}^{(i)} \]

and so

\[ \vec{x}^{(1)} = A \vec{x}^{(0)} = \sum_{i=1}^n c_i \lambda_i \vec{v}^{(i)} \]

and with \(\lambda_1\) the biggest, the \(\vec{v}^{(1)}\) term is magnified relative to the others.

Iterating with \(\vec{x}^{(k)} = A \vec{x}^{(k-1)}\) gives

(2.18)#\[ \vec{x}^{(k)} = A^k \vec{x}^{(0)} = \sum_{i=1}^n c_i \lambda_i^k \vec{v}^{(i)} = \lambda_1^k c_1 \left( \vec{v}^{(1)} + \sum_{i=2}^n \frac{c_i}{c_1} \left( \frac{\lambda_i}{\lambda_1}\right)^k \vec{v}^{(i)} \right) \]

With the assumption that \(|\lambda_1| > |\lambda_2| \geq |\lambda_3| \geq \cdots\), we have \(| \lambda_i/\lambda_1 | < 1\) for \(i \geq 2\), so all the factors \((\lambda_i/\lambda_1)^k \to 0\). Thus the iterates end up close to a multiple of \(\vec{v}^{(1)}\).

Warning: this can go wrong in the special case that \(c_1 = 0\), so that the iterates never involve a multiple of \(\vec{v}^{(1)}\). In practice this is highly unlikely, but careful software can detect this and fix the problem by changing the initial value for \(\vec{x}^{(0)}\).

The power method basically does this, but with the extra step of rescaling each intermediate vector to length one, to avoid the iterates becoming very large or very small.

One starts with a unit-length vector \(\vec{x}^{(0)}\), \(\|\vec{x}^{(0)}\| = 1\), constructs the successive multiples \(\vec{y}^{(k)} = A \vec{x}^{(k-1)}\), and rescales at each stage to the unit vectors \(\vec{x}^{(k)} = \vec{y}^{(k)}/\|\vec{y}^{(k)}\|\).

2.10.1.1. Approximating the Eigenvalue#

For a vector \(\vec{x}\), the Rayleigh quotient is defined as

\[\frac{\langle \vec{x} , A \vec{x}\rangle}{\langle \vec{x} , \vec{x}\rangle}\]

Remark 2.28 (dot products notation)

Here and below, I sometimes use the notation \(\langle \vec{x} , \vec{y} \rangle\) in place of \(\vec{x} \cdot \vec{y}\) for the inner product (“dot product”), because this sometimes makes expressions clearer.

Numpy has the notation a.dot(b) for this, though one can also use sum(a * b), since that product is pointwise: a * b is the array of products [ a[0]*b[0] ... a[-1]*b[-1] ]
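For example (any small vectors will do):

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(a * b)       # the pointwise product of the entries
print(sum(a * b))  # 32.0: summing those products gives the inner product
print(a.dot(b))    # 32.0: the same value
```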

The Rayleigh quotient can be used to approximate an eigenvalue from an approximate eigenvector: once a vector \(\vec{x}\) is approximately an eigenvector for an eigenvalue \(\lambda\), in that \(A \vec{x} \approx \lambda \vec{x}\), the Rayleigh quotient has

\[ \frac{\langle \vec{x} , A \vec{x} \rangle}{\langle \vec{x} , \vec{x} \rangle} \approx \frac{\langle \vec{x} , \lambda\vec{x} \rangle}{\langle \vec{x} , \vec{x}\rangle} = \frac{ \lambda \langle \vec{x} , \vec{x} \rangle}{\langle \vec{x} , \vec{x}\rangle} = \lambda \]

Here, once the iterate \(\vec{x}^{(k)}\) is approximately an eigenvector for an eigenvalue \(\lambda\), one gets the eigenvalue approximation

(2.19)#\[ l^{(k+1)} := \langle \vec{x}^{(k)} , \vec{y}^{(k+1)} \rangle = \frac{\langle \vec{x}^{(k)} , A \vec{x}^{(k)} \rangle}{\langle \vec{x}^{(k)} , \vec{x}^{(k)} \rangle} \approx \lambda \]
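As a small numerical illustration of this estimate (a sketch; the matrix is the triangular one used in Example 2.7 below, and the perturbation is chosen just for demonstration):

```python
import numpy as np

A = np.array([[1.0, 1.0, 1.0],
              [0.0, 10.0, 1.0],
              [0.0, 0.0, 6.0]])

# An exact eigenvector for the eigenvalue 10 is [1, 9, 0];
# perturb it slightly, mimicking an approximate eigenvector from an iteration.
x = np.array([1.0, 9.0, 0.0]) + 0.01 * np.array([1.0, -1.0, 1.0])

# The Rayleigh quotient recovers the eigenvalue to high accuracy:
rayleigh_quotient = x.dot(A @ x) / x.dot(x)
print(rayleigh_quotient)  # close to 10
```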

Algorithm 2.23 (A basic version of the power method)

Choose an initial vector \(\vec{y}^{(0)}\), maybe with a random number generator.

Normalize to \(\vec{x}^{(0)} = \vec{y}^{(0)}/\|\vec{y}^{(0)}\|\).

for \(k\) from 1 to \(k_{max}\)
\(\quad\) \(\vec{y}^{(k)} = A \vec{x}^{(k-1)}\)
\(\quad\) \(l^{(k)} = \langle \vec{x}^{(k-1)}, \vec{y}^{(k)} \rangle\)
\(\quad\) \(\vec{x}^{(k)} = \vec{y}^{(k)}/\|\vec{y}^{(k)}\|\)
end

The final values of \(l^{(k)}\) and \(\vec{x}^{(k)}\) approximate \(\lambda_1\) and \(\vec{v}_1\) respectively.

2.10.1.2. Error Estimates and a Better Stopping Condition#

As usual, it is desirable (and often essential) to have an error estimate for these approximate results, and such an estimate can then also be used to give a better choice of how many iterations to take.

Since we are solving \(A \vec{v} - \lambda \vec{v} = 0\), a plausible measure is the residual or relative backward error

\[ \frac{ \|A \vec{x}^{(k)} - l^{(k)} \vec{x}^{(k)}\|}{\|\vec{x}^{(k)} \|} = \|A \vec{x}^{(k)} - l^{(k)} \vec{x}^{(k)}\|, \]

the last equality using the fact that \(\|\vec{x}^{(k)}\| = 1\).

Here is Python code implementing this as a stopping condition for the iteration, and also returning the iteration count as a “cost” estimate; as usual, this is done in the recommended “for-break” form to avoid infinite loops:

def power_method(A, error_tolerance=1e-15, max_iterations=100, demo_mode=False):
    """Note: the default error tolerance is trying for close to the limits of machine arithmetic.
    """
    n = A.shape[0]
    # Python note: for an n by m array, A.shape gives both its dimensions: (n,m);
    # we only need the first of these.
    y = random.rand(n) - 1/2  # uniformly distributed on [-1/2, 1/2)
    x = y/la.norm(y)  # Here the Euclidean norm is the right one.
    for iteration in range(max_iterations):
        y = A @ x
        l = x.dot(y)
        x = y/la.norm(y)
        error_estimate = la.norm(A @ x - l * x)
        if demo_mode:
            print(f"At iteration {iteration+1}, l_{iteration+1} = {l}, x_{iteration+1} = {x}, {error_estimate=}")
        if error_estimate < error_tolerance:
            break
    iterations = iteration + 1  # Adjust for Python counting from zero.
    return l, x, error_estimate, iterations

Example 2.7

It is often desirable to start with a test case with known solution, and one way to do that here is to use a triangular matrix, whose eigenvalues are the main diagonal entries. So we start with

\[\begin{split} A = \left[ \begin{array}{ccc} 1 & 1 & 1 \\ 0 & 10 & 1 \\ 0 & 0 & 6 \end{array} \right] \end{split}\]

The eigenvectors are also easy to compute here; in particular an eigenvector for \(\lambda_1 = 10\) is \(\left[ 1,\ 9,\ 0 \right]\), which normalizes to \(\left[ \frac{1}{\sqrt{82}} , \frac{9}{\sqrt{82}} , 0 \right] \approx \left[ 0.11043153 ,\ 0.99388373 ,\ 0 \right ]\).

v = np.array([1, 9, 0]); print(f"lambda_1 = 10, \t v_1 = {v/la.norm(v)}")
v = np.array([3, -5, 20]); print(f"lambda_2 = 6, \t v_2 = {v/la.norm(v)}")
lambda_1 = 10, 	 v_1 = [0.11043153 0.99388373 0.        ]
lambda_2 = 6, 	 v_2 = [ 0.14400461 -0.24000768  0.96003072]
A = np.array([[1, 1, 1],[0, 10, 1],[0, 0, 6]])
print(A)
[[ 1  1  1]
 [ 0 10  1]
 [ 0  0  6]]
(lambda_1, v_1, error_estimate, iterations) = power_method(A, max_iterations = 10, demo_mode=True)
At iteration 1, l_1 = 6.499427306809223, x_1 = [-0.09978228 -0.88728907 -0.45029057], error_estimate=3.6496818943059535
At iteration 2, l_2 = 9.632349722417983, x_2 = [-0.14648124 -0.95012336 -0.27533408], error_estimate=1.1797991391435285
At iteration 3, l_3 = 9.944761663316736, x_3 = [-0.13706193 -0.97671659 -0.16504158], error_estimate=0.6920392858855717
At iteration 4, l_4 = 10.039661776809117, x_4 = [-0.12708093 -0.98699897 -0.09840463], error_estimate=0.4068783341395103
At iteration 5, l_5 = 10.050979422828748, x_5 = [-0.12053482 -0.99097236 -0.05869527], error_estimate=0.24147033327440823
At iteration 6, l_6 = 10.04014856218107, x_6 = [-0.11651868 -0.99256927 -0.03506622], error_estimate=0.14406626119084132
At iteration 7, l_7 = 10.027436397980491, x_7 = [-0.11409068 -0.99324879 -0.02098001], error_estimate=0.08617358840872884
At iteration 8, l_8 = 10.017641636426216, x_8 = [-0.11262913 -0.99355764 -0.01256538], error_estimate=0.05161325329517301
At iteration 9, l_9 = 10.011003746874454, x_9 = [-0.11175079 -0.99370773 -0.00753084], error_estimate=0.030936120330893303
At iteration 10, l_10 = 10.006751657329144, x_10 = [-0.11122332 -0.99378518 -0.00451544], error_estimate=0.01855038134706941
print()
print(f"After {iterations} iterations, the approximate results are:")
print(f"{lambda_1=}")
print(f"v_1={v_1}")
print(f"with estimated error {error_estimate}")
After 10 iterations, the approximate results are:
lambda_1=10.006751657329144
v_1=[-0.11122332 -0.99378518 -0.00451544]
with estimated error 0.01855038134706941

These are all correct to two or three significant figures — better than the error estimate suggests — which is a promising start. Now continue with more iterations (but without so much output):

lambda_1, v_1, error_estimate, iterations = power_method(A)
print(f"After {iterations} iterations, the approximate results are:")
print(f"{lambda_1=}")
print(f"v_1={v_1}")
print(f"with estimated error {error_estimate}")
After 73 iterations, the approximate results are:
lambda_1=10.0
v_1=[1.10431526e-01 9.93883735e-01 1.40543395e-16]
with estimated error 5.621735817263609e-16

Note that both the eigenvalue and eigenvector are correct, within rounding error.

To move beyond this easy case to one where the results are not known exactly in advance, it is wise to check in advance that the power method is likely to converge, by ensuring that there is a unique eigenvalue of greatest magnitude.

For this, the following results help:

Theorem 2.17 (Gershgorin Circle Theorem)

A) For an \(n \times n\) matrix, each eigenvalue \(\lambda\) lies in one of the following \(n\) disks in the complex plane, each centered at one of the matrix’s diagonal elements:

(2.20)#\[ | \lambda - a_{ii} | \leq \sum_{j \neq i} |a_{i,j}| \]

B) If these disks are disjoint, each contains exactly one eigenvalue.

C) More generally, if some of the disks overlap, forming connected “clusters”, then each cluster contains as many eigenvalues as there are disks forming that cluster.

Remark 2.29

  1. A simple check is when the matrix is diagonal, so each disk is a point: then this theorem confirms that the eigenvalues of a diagonal matrix are simply its diagonal elements.

  2. That sum at right is the same as in the definition of strict diagonal dominance: \(| a_{ii} | > \sum_{j \neq i} |a_{i,j}|\), so that zero is not in any of the disks, and so is not an eigenvalue; this is another way to see that an SDD matrix is nonsingular.

Proof. (Of part (A) only.)

Consider any eigenvector \(\vec{v}\), and let \(i\) be the position of its largest-magnitude element. Rescale \(\vec{v}\) so that \(v_i = 1\), ensuring that \(|v_j| \leq 1\) for all \(j\).

In the equation \(\lambda \vec{v} = A \vec{v}\), look at element \(i\) and separate out the \(a_{i,i}\) term:

\[ \lambda v_i = \sum_{j=1}^n a_{i,j} v_j = a_{i,i} v_i + \sum_{j \neq i} a_{i,j} v_j \]

so

\[ (\lambda - a_{i,i}) v_i = \sum_{j \neq i} a_{i,j} v_j \]

Taking absolute values and using \(v_i = 1\) and \(|v_j| \leq 1\),

\[ |\lambda - a_{i,i}| = \left| \sum_{j \neq i} a_{i,j} v_j \right| \leq \sum_{j \neq i} | a_{i,j} | \]

as claimed in Equation (2.20).
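The disk centers and radii in Equation (2.20) are easy to compute directly; here is a small sketch (the helper name gershgorin_disks is mine, not a library function):

```python
import numpy as np

def gershgorin_disks(A):
    """Return the (center, radius) pairs of the Gershgorin disks of square matrix A."""
    centers = np.diag(A)
    # Radius of each disk: the sum of the absolute off-diagonal entries in its row.
    radii = np.sum(np.abs(A), axis=1) - np.abs(centers)
    return list(zip(centers, radii))

A = np.array([[1, 1, 1],
              [0, 10, 1],
              [0, 0, 6]])
for center, radius in gershgorin_disks(A):
    print(f"disk of radius {radius} centered at {center}")
```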

Theorem 2.18

For a real matrix:

(A) If there is a unique eigenvalue of greatest magnitude, that eigenvalue and its eigenvector are real-valued.

(B) More generally, if an eigenvalue is the only one of a given magnitude, that eigenvalue and its eigenvector are real.

As a cautionary counterexample, consider \(A = \left[ \begin{array}{cc} 0 & -1 \\ 1 & 0 \end{array} \right]\), whose eigenvalues are \( \pm i\).

Proof. This follows from the fact that the eigenvalues are the roots of the characteristic polynomial, which for a real matrix has all real coefficients. For any non-real root of such a polynomial, its complex conjugate is also a root, so for any non-real eigenvalue, its complex conjugate is another eigenvalue, of the same magnitude.

Example 2.8

Consider

\[\begin{split} B = \left[ \begin{array}{ccc} 1 & 1 & 1 \\ 1 & 10 & 1 \\ 0 & 1 & 6 \end{array} \right] \end{split}\]

The Gershgorin theorem gives:

  • a root within distance 2 of 1, so magnitude at most 3

  • a root within distance 1 of 6, so magnitude between 5 and 7, and

  • a root within distance 2 of 10, so magnitude at least 8,

so all have distinct magnitudes.

This more or less ensures that the Power Method converges for matrix \(B\), and that all its eigenvalues are real. (As noted above, there is a slight possibility of failure: when the initial approximation \(\vec{x}^{(0)}\) is in the span of the other eigenvectors.)

Here are the same two test runs as above:

B = np.array([[1, 1, 1],[1, 10, 1],[0, 1, 6]])
print(B)
[[ 1  1  1]
 [ 1 10  1]
 [ 0  1  6]]
lambda_1, v_1, error_estimate, iterations = power_method(B, max_iterations=10, demo_mode=True)
At iteration 1, l_1 = 2.7322248426079736, x_1 = [-0.00188633 -0.21941236 -0.97563039], error_estimate=4.432576552721352
At iteration 2, l_2 = 6.623348195084995, x_2 = [-0.17208978 -0.45600603 -0.87318017], error_estimate=2.6116857285856105
At iteration 3, l_3 = 7.787255723579021, x_3 = [-0.18464388 -0.68940686 -0.70044616], error_estimate=2.4790910148582546
At iteration 4, l_4 = 9.080367820465137, x_4 = [-0.16887481 -0.83436427 -0.52470712], error_estimate=1.6569971751071608
At iteration 5, l_5 = 9.888071203443983, x_5 = [-0.15289617 -0.9043231  -0.3985254 ], error_estimate=0.9194211703086882
At iteration 6, l_6 = 10.212574412021105, x_6 = [-0.14204131 -0.93617858 -0.32154927], error_estimate=0.49761784072131915
At iteration 7, l_7 = 10.318523085778624, x_7 = [-0.13550561 -0.9511523  -0.2773942 ], error_estimate=0.27231235231077455
At iteration 8, l_8 = 10.350003572712252, x_8 = [-0.13174745 -0.95855381 -0.25262066], error_estimate=0.1500421834552861
At iteration 9, l_9 = 10.358671856851288, x_9 = [-0.12962874 -0.96236894 -0.23883555], error_estimate=0.08287898747702936
At iteration 10, l_10 = 10.360754954349174, x_10 = [-0.12844534 -0.96439248 -0.23119026], error_estimate=0.045816928029172824
print()
print(f"After {iterations} iterations, the approximate results are:")
print(f"{lambda_1=}")
print(f"v_1={v_1}")
print(f"with estimated error {error_estimate}")
After 10 iterations, the approximate results are:
lambda_1=10.360754954349174
v_1=[-0.12844534 -0.96439248 -0.23119026]
with estimated error 0.045816928029172824

Again, push towards the limits of IEEE-64 computer arithmetic:

lambda_1, v_1, error_estimate, iterations = power_method(B)
print(f"After {iterations} iterations, the approximate results are:")
print(f"{lambda_1=}")
print(f"v_1={v_1}")
print(f"with estimated error {error_estimate}")
After 63 iterations, the approximate results are:
lambda_1=10.360652315228508
v_1=[-0.12697007 -0.96681035 -0.22171232]
with estimated error 4.965068306494546e-16

We can cheat a bit: as one might expect, the module numpy.linalg has a function eig for computing eigenvalues and eigenvectors, so we can use this to corroborate our calculations.

But we should check its results with backward errors too!

[eigenvalues, eigenvectors] = la.eig(B)

This function does not sort the eigenvalues in decreasing order of magnitude as we did above, so we do not use the names \(\lambda_1\) and so on:

lambda_a = eigenvalues[0]
lambda_b = eigenvalues[1]
lambda_c = eigenvalues[2]
# The eigenvectors are the columns of the output array `eigenvectors`, so:
v_a = eigenvectors[:,0]
v_b = eigenvectors[:,1]
v_c = eigenvectors[:,2]

print("The eigenvalue-eigenvector pairs are:")
print(f"{lambda_a=}, \t v_a={v_a}")
print(f"{lambda_b=}, \t v_b={v_b}")
print(f"{lambda_c=}, \t v_c={v_c}")
print("and the residual norms are:")
print(f"v_a: {la.norm(B @ v_a - lambda_a * v_a)}")
print(f"v_b: {la.norm(B @ v_b - lambda_b * v_b)}")
print(f"v_c: {la.norm(B @ v_c - lambda_c * v_c)}")
print()
print(f"So {lambda_b=} is the largest eigenvalue;")
print(f"For comparison, we computed {lambda_1=} above:")
print(f"the discrepancy is {abs(lambda_b-lambda_1)=}")
The eigenvalue-eigenvector pairs are:
lambda_a=0.9096515641370733, 	 v_a=[-0.99349901  0.11170569 -0.02194461]
lambda_b=10.36065231522851, 	 v_b=[-0.12697007 -0.96681035 -0.22171232]
lambda_c=5.729696120634419, 	 v_c=[ 0.1473099  -0.25809251  0.95482357]
and the residual norms are:
v_a: 1.3332836816415219e-15
v_b: 8.881784197001252e-16
v_c: 5.978733960281817e-16

So lambda_b=10.36065231522851 is the largest eigenvalue;
For comparison, we computed lambda_1=10.360652315228508 above:
the discrepancy is abs(lambda_b-lambda_1)=1.7763568394002505e-15

Note: requiring the eigenvector to be of length one only determines it up to a sign, so be careful comparing two approximate eigenvectors or consecutive approximations in an iteration by looking at their difference.
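A sketch of a safer comparison, using the normalized \(\vec{v}_1\) from Example 2.7 just for illustration:

```python
import numpy as np
import numpy.linalg as la

u = np.array([0.11043153, 0.99388373, 0.0])  # one normalization of an eigenvector
v = -u                                       # equally valid: the opposite sign

# The naive difference is misleadingly large ...
print(la.norm(u - v))  # about 2
# ... so compare up to sign instead:
print(min(la.norm(u - v), la.norm(u + v)))  # 0.0
```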

2.10.2. The Inverse Power Method, for Finding the Smallest Eigenvalue#

The next step is to note that if \(A\) is nonsingular, its inverse \(B = A^{-1}\) has the same eigenvectors, but with eigenvalues \(\mu_i = 1/\lambda_i\).

Thus we can apply the power method to \(B\) in order to compute its largest eigenvalue, which is \(\mu_n = 1/\lambda_n\), along with the corresponding eigenvector \(\vec{v}_n\).

The main change to the above is that

\[\vec{y}^{(k)} = A^{-1} \vec{x}^{(k-1)}.\]

However, as usual one can (and should) avoid actually computing the inverse. Instead, express the above as the system of equations

\[A \vec{y}^{(k)} = \vec{x}^{(k-1)},\]

The eigenvalue estimate for \(A\) is now \(1/l^{(k)}\), where \(l^{(k)} = \langle \vec{x}^{(k-1)} , \vec{y}^{(k)} \rangle\) approximates \(\mu_n = 1/\lambda_n\), as in (2.19).

Here is an important case where the LU factorization method can speed things up greatly: a single LU factorization is needed, after which for each \(k\) one only has to do the far quicker forward and backward substitution steps: \(O(n^2)\) cost for each iteration instead of \(O(n^3/3)\).
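Here is a sketch of this factor-once, solve-many pattern, using scipy.linalg’s lu_factor and lu_solve (an addition to the NumPy tools imported above); the test matrix is the one from Example 2.7, whose smallest eigenvalue is \(1\) with eigenvector \([1, 0, 0]\):

```python
import numpy as np
import numpy.linalg as la
from scipy.linalg import lu_factor, lu_solve

A = np.array([[1.0, 1.0, 1.0],
              [0.0, 10.0, 1.0],
              [0.0, 0.0, 6.0]])

# Factor once: O(n^3) work.
lu_and_pivots = lu_factor(A)

# Each iteration then needs only forward and backward substitution: O(n^2) work.
x = np.array([1.0, 1.0, 1.0])
x = x / la.norm(x)
for k in range(10):
    y = lu_solve(lu_and_pivots, x)  # solves A y = x without ever forming A^{-1}
    x = y / la.norm(y)

lambda_n_estimate = x.dot(A @ x)  # Rayleigh quotient; x has unit length
print(lambda_n_estimate)          # close to 1, the smallest eigenvalue of A
print(x)                          # close to [1, 0, 0], up to sign
```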

Algorithm 2.24 (A basic version of the inverse power method)

Choose an initial vector \(\vec{y}^{(0)}\), maybe with a random number generator.

Normalize to \(\vec{x}^{(0)} = \vec{y}^{(0)}/\|\vec{y}^{(0)}\|\).

Compute an LU factorization \(L U = A\).

for \(k\) from 0 to \(k_{max}\)
\(\quad\) Solve \(L \vec{z}^{(k+1)} = \vec{x}^{(k)}\)
\(\quad\) Solve \(U \vec{y}^{(k+1)} = \vec{z}^{(k+1)}\)
\(\quad\) \(l^{(k+1)} = \langle \vec{x}^{(k)}, \vec{y}^{(k+1)} \rangle\)
\(\quad\) \(\vec{x}^{(k+1)} = \vec{y}^{(k+1)}/\| \vec{y}^{(k+1)} \|\)
end

(If all goes well) the final values of \(1/l^{(k)}\) and \(\vec{x}^{(k)}\) approximate \(\lambda_n\) and \(\vec{v}_n\) respectively.

In an actual implementation, it is as usual preferable to stop “early” based on an error estimate, and the same one as above can be used, now based on \(\| A \vec{x}^{(k)} - \lambda^{(k)} \vec{x}^{(k)}\|\) with \(\lambda^{(k)} = 1/l^{(k)}\) the current eigenvalue estimate.

See Exercise 2.23

2.10.3. Getting Other Eigenvalues With the Shifted Inverse Power Method#

The inverse power method computes the eigenvalue closest to 0; by shifting, we can compute the eigenvalue closest to any chosen value \(s\). Then by searching various values of \(s\), we can hope to find all the eigenvalues. As a variant, once we have \(\lambda_1\) and \(\lambda_n\), we can search nearby for other large or small eigenvalues: often the few largest and/or the few smallest are most important.

With a symmetric (or Hermitian) matrix, once the eigenvalue of largest magnitude, \(\lambda_1\) is known, the rest are known to be real values in the interval \([-|\lambda_1|,|\lambda_1|]\), so we know roughly where to seek them.

The main idea here is that for any number \(s\), matrix \(A - s I\) has eigenvalues \(\lambda_i - s\), with the same eigenvectors as \(A\):

\[(A - sI)\vec{v}_i = (\lambda_i - s)\vec{v}_i\]

Thus, applying the inverse power method to \(A - s I\) computes the largest-magnitude eigenvalue \(\mu\) of \((A - sI)^{-1}\), and then \(\lambda = s + 1/\mu\) is the eigenvalue of \(A\) closest to \(s\).
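A quick numerical check of this shifting property, using the matrix \(B\) from Example 2.8 and an arbitrarily chosen shift:

```python
import numpy as np
import numpy.linalg as la

B = np.array([[1.0, 1.0, 1.0],
              [1.0, 10.0, 1.0],
              [0.0, 1.0, 6.0]])
s = 5.5  # an arbitrary shift

# The eigenvalues of B - sI are those of B, shifted down by s:
eigenvalues_B = np.sort(la.eig(B)[0].real)
eigenvalues_shifted = np.sort(la.eig(B - s * np.eye(3))[0].real)

print(eigenvalues_B)
print(eigenvalues_shifted + s)  # the same values: shifting B shifts each eigenvalue by s
```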

See Exercise 2.24

2.10.4. A Refinement: Rayleigh Iteration#

One challenge with the shifted inverse power method is how to choose the shift (or shifts) used in order to get fast convergence. For convergence to a given eigenvalue, it is enough for the shift to be closer to that eigenvalue than to any other, but if the closeness is borderline, convergence can still be very slow: it can be like the situation with the power methods where \(|\lambda_1| > |\lambda_2|\) but the ratio \(|\lambda_2/ \lambda_1|\) appearing in Equation (2.18) is close to one.

A strategy to overcome this is to use each newly computed approximation of the eigenvalue as an updated shift: the original shift can be replaced by the Rayleigh quotient approximation \(l^{(k)} = \langle \vec{x}^{(k)}, A \vec{x}^{(k)} \rangle\), so the update step becomes

\[ (A - l^{(k)}I) \vec{y}^{(k+1)} = \vec{x}^{(k)} \]

A disadvantage of this is that the matrix at left now changes at each iteration, preventing reuse of a single LU factorization, so that the cost per iteration (for a general matrix) is \(O(n^3/3)\) instead of \(O(n^2)\).
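Here is a minimal sketch of Rayleigh iteration along these lines (the function name and the starting choices below are mine; note the fresh solve with a newly shifted matrix at every step):

```python
import numpy as np
import numpy.linalg as la

def rayleigh_iteration(A, s, x0, max_iterations=20, error_tolerance=1e-10):
    """A minimal sketch of Rayleigh iteration: the shift is updated to the
    latest Rayleigh quotient at every step, starting from the initial shift s."""
    x = x0 / la.norm(x0)
    l = s
    n = A.shape[0]
    for k in range(max_iterations):
        # The matrix at left changes each iteration, so each solve costs
        # O(n^3) for a general matrix: no LU factorization can be reused.
        y = la.solve(A - l * np.eye(n), x)
        x = y / la.norm(y)
        l = x.dot(A @ x)  # the updated Rayleigh quotient, used as the next shift
        if la.norm(A @ x - l * x) < error_tolerance:
            break
    return l, x

B = np.array([[1.0, 1.0, 1.0],
              [1.0, 10.0, 1.0],
              [0.0, 1.0, 6.0]])
l, x = rayleigh_iteration(B, s=10.0, x0=np.ones(3))
print(l)  # converges very fast to the eigenvalue of B nearest the initial shift
```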

There are various strategies for speeding this up. One basic idea is to stop updating the shift once the convergence is seen to be fast, as indicated by the error estimate decreasing rapidly between successive iterates.

Another involves some initial transformation of the matrix, to an “easier” one with the same eigenvalues; once they are computed, each eigenvector of the original matrix can be easily computed by solving \((A - \lambda_i I) \vec{v}^{(i)} = \vec{0}\) with a variation of row reduction.

Specifically, one can transform a real symmetric (or Hermitian) matrix into a tridiagonal matrix, allowing the use of the faster algorithm seen in Faster Methods for Solving Ax=b with Tridiagonal and Banded Matrices. More generally, any square matrix can be transformed to one with the same eigenvalues and in so-called “upper Hessenberg” form, which is almost upper triangular: the non-zero elements only go down to the diagonal immediately below the main diagonal. For such matrices, the cost of row reduction drops to only \(O(n^2)\).
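For instance, scipy.linalg provides a hessenberg function (this is a sketch of its use, on a random symmetric test matrix; for symmetric input, the Hessenberg form is in fact tridiagonal):

```python
import numpy as np
from scipy.linalg import hessenberg

# A random symmetric test matrix.
rng = np.random.default_rng(1)
M = rng.random((5, 5))
A = (M + M.T) / 2

H, Q = hessenberg(A, calc_q=True)  # H = Q^T A Q has the same eigenvalues as A

# H is zero below the first subdiagonal (the Hessenberg property) ...
print(np.allclose(np.tril(H, -2), 0.0))
# ... and since A is symmetric, so is H, making it tridiagonal:
print(np.allclose(np.triu(H, 2), 0.0))
# The eigenvalues are unchanged by the transformation:
print(np.allclose(np.sort(np.linalg.eigvalsh(H)), np.sort(np.linalg.eigvalsh(A))))
```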

However, we will not go into this further, because in practice there are different, more elaborate approaches that work better in most cases, particularly if one needs more than just a few of the eigenvalues. This is discussed briefly in the next sub-section.

2.10.5. Further Topics: Getting all the Eigenvalues With the QR Method, etc.#

The above methods are not ideal when many or all of the eigenvalues of a matrix are wanted; then a variety of more advanced methods have been developed, starting with the QR (factorization) Method.

We will not address the details of that method in this course, but one way to think about it for a symmetric matrix is that:

  • The eigenvectors are orthogonal.

  • Thus, if after computing \(\lambda_1\) and \(\vec{v}_1\), one uses the power iteration starting with \(\vec{x}^{\,0,2}\) orthogonal to \(\vec{v}_1\), then all the new iterates \(\vec{x}^{\,k,2}\) will stay orthogonal, and one will get the eigenvector corresponding to the largest remaining eigenvalue: you get \(\vec{v}_2\) and \(\lambda_2\).

  • Continuing likewise, one can get the eigenvalues in descending order of magnitude.

  • As a modification, one can do all these almost in parallel: at iteration \(k\), have an approximation \(\vec{x}^{\,k,i}\) for each \(\vec{v}_i\), obtained by adjusting these new approximations so that \(\vec{x}^{\,k,i}\) is orthogonal to all the approximations \(\vec{x}^{\,k,j}\), \(j < i\), for all the previous (larger) eigenvalues. This uses a variant of the Gram-Schmidt method for orthogonalizing a set of vectors.

  • One can also incorporate shifts into this: for example, using the above Rayleigh quotient approximations as the shifts.

2.10.6. Exercises#

Exercise 2.23

Implement Algorithm 2.24 for the inverse power method, adding a stopping condition based on an error estimate, with usage

(lambda_n, v_n, error_estimate, iterations) = inverse_power_method(A, error_tolerance, max_iterations, demo_mode)

mimicking the function power_method above.

Test this initially on the matrices \(A\) and \(B\) above.

Exercise 2.24

Implement the Shifted Inverse Power Method.

As above, test initially with the matrices \(A\) and \(B\) above.

Some plausible initial choices for the shifts are each of the entries on the main diagonal.