2.9. Computing Eigenvalues and Eigenvectors: the Power Method, and a bit beyond#
References:
Section 12.1 Power Iteration Methods of [Sauer, 2022].
Section 7.2 Eigenvalues and Eigenvectors of [Burden et al., 2016].
Chapter 8, More on Linear Equations of [Chenney and Kincaid, 2012], in particular section 3 Power Method, and also section 2 Eigenvalues and Eigenvectors as background reading.
The eigenproblem for a square $n \times n$ matrix $A$ is to find the non-trivial solutions of $Av = \lambda v$: the eigenvalues $\lambda$ and corresponding eigenvectors $v$.
(By non-trivial, I mean to exclude the trivial solution $v = 0$, which satisfies the equation for every $\lambda$.)
With eigenproblems, and particularly those arising from differential equations, one often needs only the few smallest and/or largest eigenvalues. For these, the power method described next can be adapted, leading to the shifted inverse power method.
Here we often restrict our attention to the case of a real symmetric matrix ($A^T = A$, with all entries real), because then:
all eigenvalues are real,
for symmetric matrices, all eigenvectors are also real,
there is a complete set of orthogonal eigenvectors that form a basis for all vectors, and so on.
However, the methods described here can be used more generally, or can be made to work with minor adjustments.
The eigenvalues are the roots of the characteristic polynomial, $\det(A - \lambda I) = 0$; there are $n$ of them counting multiplicity, and it is convenient to order them by decreasing magnitude, $|\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_n|$.
Generically, all these magnitudes are different, which makes things work more easily, so that will sometimes be assumed while developing the intuition of the method.
2.9.1. The Power Method#
The basic tool is the Power Method, which will usually (but not always) succeed in computing the eigenvalue of largest magnitude, $\lambda_1$, along with a corresponding eigenvector $v_1$.
In its simplest form, one starts with a unit-length vector $x^{(0)}$ and repeatedly multiplies by $A$, rescaling the result back to unit length at each step; for a generic starting vector, the iterates align ever more closely with an eigenvector for $\lambda_1$.
Note that once a unit-length vector $x$ is a good approximation of an eigenvector, the corresponding eigenvalue can be estimated by the dot product $x \cdot Ax$, since $x \cdot Ax = \lambda (x \cdot x) = \lambda$ when $Ax = \lambda x$ and $\|x\| = 1$.
Remark 2.21 (dot products in Python)
Here and below, I use a.dot(b) for the dot product of two vectors a and b; equivalent options in Numpy are np.dot(a, b) and, for one-dimensional arrays, a @ b.
Algorithm 2.17 (A basic version of the power method)
Choose an initial vector $x^{(0)}$, for example with random entries.
Normalize to unit length: $x^{(0)} \leftarrow x^{(0)} / \|x^{(0)}\|$.
for $k = 1, 2, \ldots, K$:
$\quad y^{(k)} = A x^{(k-1)}$
$\quad \lambda^{(k)} = x^{(k-1)} \cdot y^{(k)}$ (the current eigenvalue estimate)
$\quad x^{(k)} = y^{(k)} / \|y^{(k)}\|$
end
The final values of $\lambda^{(K)}$ and $x^{(K)}$ are the approximations of $\lambda_1$ and of a corresponding unit-length eigenvector.
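For concreteness, here is a minimal sketch of how Algorithm 2.17 might be implemented with Numpy; the function name power_method and its argument K (the iteration count) are my own illustrative choices, not part of the algorithm statement.

import numpy as np

def power_method(A, K=100):
    """Approximate the largest-magnitude eigenvalue of A and a corresponding
    unit-length eigenvector, using K iterations of the basic power method."""
    n = A.shape[0]
    x = np.random.randn(n)           # initial vector with random entries
    x /= np.linalg.norm(x)           # normalize to unit length
    for k in range(K):
        y = A @ x                    # y^(k) = A x^(k-1)
        eigenvalue = x.dot(y)        # eigenvalue estimate x^(k-1) . (A x^(k-1))
        x = y / np.linalg.norm(y)    # normalize to get x^(k)
    return eigenvalue, x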
Exercise 1#
Implement this algorithm and test it on the real, symmetric matrix

$$A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 8 & 1 \\ 1 & 1 & 4 \end{bmatrix}$$

This has all real eigenvalues, each within a distance 2 of one of the main diagonal elements 3, 8 and 4.
As a debugging strategy, you could replace all those off-diagonal ones by a small value $\delta$.
Then the Gershgorin circle theorem ensures that each eigenvalue is within $2\delta$ of one of the main diagonal elements, so close to 3, 8 or 4.
You could even start with $\delta = 0$, for which the eigenvalues are exactly the main diagonal elements and the eigenvectors are the standard basis vectors, and then increase $\delta$ gradually.
Here and below you could check your work with Numpy, using the function numpy.linalg.eig(A).
However, that is almost cheating, so note that there is also a backward error check:
see how small the residual $\|Av - \lambda v\|$ is for each computed eigenvalue-eigenvector pair $(\lambda, v)$.
import numpy as np
import numpy.linalg as la
help(la.eig)
delta = 0.01
A = np.array([[3, delta, delta],[delta, 8, delta],[delta, delta, 4]])
[eigenvalues, eigenvectors] = la.eig(A)
print(f"With {delta=}, so that A is")
print(A)
lambda_0 = eigenvalues[0]
lambda_1 = eigenvalues[1]
lambda_2 = eigenvalues[2]
# The eigenvectors are the columns of the output array `eigenvectors`, so:
v_0 = list(eigenvectors[:,0]) # Lists print more nicely than arrays!
v_1 = list(eigenvectors[:,1])
v_2 = list(eigenvectors[:,2])
print(f"The eigenvalues are {eigenvalues}")
print("and the eigenvalue-eigenvector pairs are")
print(f"{lambda_0=}, \t {v_0=}")
print(f"{lambda_1=}, \t {v_1=}")
print(f"{lambda_2=}, \t {v_2=}")
With delta=0.01, so that A is
[[3. 0.01 0.01]
[0.01 8. 0.01]
[0.01 0.01 4. ]]
The eigenvalues are [2.99988041 8.0000451 4.00007449]
and the eigenvalue-eigenvector pairs are
lambda_0=2.9998804099870666, v_0=[-0.9999482535404538, 0.0019798921714429545, 0.009978490285907787]
lambda_1=8.00004509976119, v_1=[0.002004981562986897, 0.999994852570506, 0.0025049713419308117]
lambda_2=4.000074490251736, v_2=[0.009973479349182978, -0.0025248484075824345, 0.9999470760246215]
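As a quick illustration of the backward error check mentioned above, continuing from the cell above (so A, eigenvalues and eigenvectors are as computed there), one can print how small each residual $\|Av - \lambda v\|$ is:

for i in range(3):
    residual = la.norm(A @ eigenvectors[:,i] - eigenvalues[i] * eigenvectors[:,i])
    print(f"Residual norm for pair {i}: {residual:.3e}")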
Refinement: deciding the iteration count#
Some details are omitted above, most importantly how to decide the number of iterations.
One approach is to use the fact that an eigenvalue-eigenvector pair satisfies $Av = \lambda v$ exactly, so with $\|x^{(k)}\| = 1$, the size of the residual relative to the eigenvalue estimate,
$$\frac{\|A x^{(k)} - \lambda^{(k)} x^{(k)}\|}{|\lambda^{(k)}|},$$
is a measure of “relative backward error”.
Thus one could replace the above for loop by a while loop, based on a condition like stopping when this relative backward error is below a chosen error tolerance.
Alternatively, keep the for loop, but exit early (with break) if this condition is met.
I generally recommend this for-if-break form for implementing iterative methods, because it makes avoiding infinite loops simpler, and it avoids the common while-loop issue that you do not yet have an error estimate when the loop starts.
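As one possible sketch of this for-if-break strategy (my own names again, building on the basic power method sketch above), the loop could become:

import numpy as np

def power_method_with_tolerance(A, errortolerance=1e-8, maxiterations=100):
    """Power method with an accuracy target: exit early once the
    relative backward error is below errortolerance."""
    n = A.shape[0]
    x = np.random.randn(n)
    x /= np.linalg.norm(x)
    for k in range(maxiterations):
        y = A @ x
        eigenvalue = x.dot(y)
        x = y / np.linalg.norm(y)
        backwarderror = np.linalg.norm(A @ x - eigenvalue * x) / abs(eigenvalue)
        if backwarderror <= errortolerance:
            break                    # accuracy target met; exit the for loop early
    return eigenvalue, x, backwarderror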
Exercise 2#
Modify your code from Exercise 1 to implement this accuracy control.
2.9.2. The Inverse Power Method#
The next step is to note that if $A$ is invertible (so all its eigenvalues are nonzero), then the eigenvalues of $A^{-1}$ are the reciprocals $1/\lambda_i$, with the same eigenvectors: $Av = \lambda v$ if and only if $A^{-1} v = (1/\lambda) v$.
Thus we can apply the power method to $A^{-1}$: it computes the largest-magnitude eigenvalue of $A^{-1}$, which is $1/\lambda_n$, and so gives the smallest-magnitude eigenvalue $\lambda_n$ of $A$, along with a corresponding eigenvector.
The main change to the above is that each iteration multiplies by $A^{-1}$ instead of by $A$: $y^{(k)} = A^{-1} x^{(k-1)}$.
However, as usual one can (and should) avoid actually computing the inverse. Instead, express the step above as solving the system of equations $A y^{(k)} = x^{(k-1)}$.
Here is an important case where the LU factorization method can speed things up greatly: a single LU factorization is needed, after which each iteration requires only the forward and backward substitutions to solve for $y^{(k)}$, at a cost of $O(n^2)$ per iteration rather than $O(n^3)$.
Algorithm 2.18 (A basic version of the inverse power method)
Choose an initial vector $x^{(0)}$, for example with random entries.
Normalize to unit length: $x^{(0)} \leftarrow x^{(0)} / \|x^{(0)}\|$.
Compute an LU factorization $A = LU$.
for $k = 1, 2, \ldots, K$:
$\quad$ Solve $L U y^{(k)} = x^{(k-1)}$ by forward and backward substitution.
$\quad \mu^{(k)} = x^{(k-1)} \cdot y^{(k)}$, the current estimate of the largest-magnitude eigenvalue $1/\lambda_n$ of $A^{-1}$
$\quad \lambda^{(k)} = 1/\mu^{(k)}$
$\quad x^{(k)} = y^{(k)} / \|y^{(k)}\|$
end
(If all goes well) the final values of $\lambda^{(K)}$ and $x^{(K)}$ are approximations of the smallest-magnitude eigenvalue of $A$ and of a corresponding unit-length eigenvector.
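Here is a minimal sketch of Algorithm 2.18, using scipy.linalg.lu_factor and lu_solve for the single factorization and the repeated triangular-system solves; as before, the function name and arguments are my own illustrative choices.

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_power_method(A, K=100):
    """Approximate the smallest-magnitude eigenvalue of A and a corresponding
    unit-length eigenvector, by applying the power method to A^(-1)."""
    n = A.shape[0]
    x = np.random.randn(n)
    x /= np.linalg.norm(x)
    lu_and_pivots = lu_factor(A)          # one O(n^3) factorization
    for k in range(K):
        y = lu_solve(lu_and_pivots, x)    # solve A y = x in O(n^2) operations
        mu = x.dot(y)                     # estimate of 1/lambda_n
        eigenvalue = 1.0 / mu
        x = y / np.linalg.norm(y)
    return eigenvalue, x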
Exercise 3#
Implement this basic algorithm (with a fixed iteration count, as in Exercise 1), and then create a second version that imposes an accuracy target (as in Exercise 2).
2.9.3. Getting other eigenvalues with the Shifted Inverse Power Method#
The inverse power method computes the eigenvalue closest to zero; by shifting, we can compute the eigenvalue closest to any chosen number $s$.
With a symmetric (or Hermitian) matrix, once the eigenvalue of largest magnitude, $\lambda_1$, has been computed, all the other eigenvalues are known to be real and to lie in the interval $[-|\lambda_1|, |\lambda_1|]$, so one can search for them by choosing shifts within that interval.
The main idea here is that for any number $s$, the matrix $A - sI$ has eigenvalues $\lambda_i - s$, with the same eigenvectors as $A$.
Thus, applying the inverse power method to $A - sI$ computes its eigenvalue of smallest magnitude, which is $\lambda_i - s$ for the eigenvalue $\lambda_i$ of $A$ closest to $s$, along with a corresponding eigenvector; adding back the shift $s$ recovers that eigenvalue of $A$.
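A corresponding sketch (again with illustrative names of my own, reusing the LU-based approach from the previous sketch) simply factorizes the shifted matrix:

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shifted_inverse_power_method(A, shift, K=100):
    """Approximate the eigenvalue of A closest to `shift`, and a corresponding
    unit-length eigenvector, via the inverse power method applied to A - shift*I."""
    n = A.shape[0]
    x = np.random.randn(n)
    x /= np.linalg.norm(x)
    lu_and_pivots = lu_factor(A - shift * np.eye(n))   # factorize the shifted matrix once
    for k in range(K):
        y = lu_solve(lu_and_pivots, x)                 # solve (A - shift I) y = x
        mu = x.dot(y)                                  # estimate of 1/(lambda_i - shift)
        x = y / np.linalg.norm(y)
    return shift + 1.0 / mu, x                         # undo the shift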
Exercise 4#
As above, implement this, probably starting with a fixed iteration count version.
For the test case above, some plausible initial choices for the shifts are each of the entries on the main diagonal, $s = 3$, $8$ and $4$;
and as above, testing with small off-diagonal values $\delta$ makes it easy to predict roughly what the results should be.
2.9.4. Further topics: getting all the eigenvalues with the QR Method, etc.#
The above methods are not ideal when many or all of the eigenvalues of a matrix are wanted; then a variety of more advanced methods have been developed, starting with the QR (factorization) Method.
We will not address the details of that method in this course, but one way to think about it for a symmetric matrix is that:
The eigenvectors are orthogonal.
Thus, if after computing $\lambda_1$ and $v_1$, one uses the power iteration starting with a vector orthogonal to $v_1$, then all the new iterates will stay orthogonal to $v_1$, and one will get the eigenvector corresponding to the eigenvalue of largest remaining magnitude: you get $\lambda_2$ and $v_2$. Continuing likewise, one can get the eigenvalues in descending order of magnitude.
As a modification, one can do all of these almost in parallel: at iteration $k$, have an approximation $x_i^{(k)}$ of the eigenvector for each eigenvalue $\lambda_i$, got at each stage by adjusting these new approximations so that each $x_i^{(k)}$ is orthogonal to all the approximations $x_j^{(k)}$, $j < i$, for the previous (larger magnitude) eigenvalues. This uses a variant of the Gram-Schmidt method for orthogonalizing a set of vectors, as sketched below.
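As a small illustration of that orthogonalization step (a sketch only, with my own names; this is not a full QR method implementation), a classical Gram-Schmidt sweep over a collection of approximate eigenvectors stored as the columns of an array could look like:

import numpy as np

def orthonormalize_columns(X):
    """Adjust the columns of X, in order, so that each is unit length and
    orthogonal to all earlier columns (classical Gram-Schmidt)."""
    Q = X.astype(float).copy()
    for i in range(Q.shape[1]):
        for j in range(i):
            Q[:, i] -= Q[:, j].dot(Q[:, i]) * Q[:, j]   # remove the component along column j
        Q[:, i] /= np.linalg.norm(Q[:, i])              # normalize column i
    return Q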