2.9. Computing Eigenvalues and Eigenvectors: the Power Method, and a bit beyond
References:
Section 12.1 Power Iteration Methods of [Sauer, 2022].
Section 7.2 Eigenvalues and Eigenvectors of [Burden et al., 2016].
Chapter 8, More on Linear Equations of [Chenney and Kincaid, 2012], in particular section 3 Power Method, and also section 2 Eigenvalues and Eigenvectors as background reading.
The eigenproblem for a square \(n \times n\) matrix \(A\) is to compute some or all non-trivial solutions of

\[
A \vec{v} = \lambda \vec{v}.
\]
(By non-trivial, I mean to exclude \(\vec{v} = 0\), which gives a solution for any \(\lambda\).) That is, to compute the eigenvalues \(\lambda\) (of which generically there are \(n\), but sometimes fewer) and the eigenvectors \(\vec{v}\) corresponding to each.
With eigenproblems, and particularly those arising from differential equations, one often needs only the few smallest and/or largest eigenvalues. For these, the power method described next can be adapted, leading to the shifted inverse power method.
Here we often restrict our attention to the case of a real symmetric matrix (\(A^T = A\), or \(A_{ij} = A_{ji}\)), or a Hermitian matrix (\(A^T = A^*\)), for which many things are a bit simpler:
all eigenvalues are real,
for symmetric matrices, all eigenvectors are also real,
there is a complete set of orthogonal eigenvectors \(\vec{v}_k\), \(1 \leq k \leq n\), that form a basis for all vectors, and so on.
However, the methods described here can be used more generally, or can be made to work with minor adjustments.
The eigenvalues are the roots of the characteristic polynomial \(\det(A - \lambda I)\); repeated roots are possible, and counting them with multiplicity there are always \(n\) values \(\lambda_i\), \(1 \leq i \leq n\). Here, these eigenvalues will be enumerated in decreasing order of magnitude:

\[
|\lambda_1| \geq |\lambda_2| \geq \cdots \geq |\lambda_n|.
\]
Generically, all the magnitudes are different, which makes things work more easily, so that will sometimes be assumed while developing the intuition of the method.
2.9.1. The Power Method
The basic tool is the Power Method, which will usually but not always succeed in computing the eigenvalue of largest magnitude, \(\lambda_1\), and a corresponding eigenvector \(\vec{v}_1\). Its success mainly requires that there is a unique eigenvalue of largest magnitude: \(|\lambda_1| > |\lambda_i|\) for \(i > 1\).
In its simplest form, one starts with a unit-length vector \(\vec{x}^{\,0}\), so \(\|\vec{x}^{\,0}\| = 1\), constructs the successive multiples \(\vec{y}^{\,k} = A^k \vec{x}^{\,0}\) by successive multiplications, and rescales at each stage to the unit vectors \(\vec{x}^{\,k} = \vec{y}^{\,k}/\|\vec{y}^{\,k}\|\).
Note that \(\vec{y}^{\,k+1} = A \vec{x}^{\,k}\), so that once \(\vec{x}^{\,k}\) is approximately an eigenvector for eigenvalue \(\lambda\), \(\vec{y}^{\,k+1} \approx \lambda \vec{x}^{\,k}\), leading to the eigenvalue approximation

\[
\lambda \approx r^{(k)} = \langle \vec{y}^{\,k+1}, \vec{x}^{\,k} \rangle,
\]

using the fact that \(\vec{x}^{\,k}\) is a unit vector.
(dot products in Python)
Here and below, I use \(\langle \vec{a}, \vec{b}\rangle\) to denote the inner product (a.k.a. dot product) of two vectors; with Numpy arrays this is given by a.dot(b).
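For example, a minimal illustration of this notation in Numpy, with made-up vectors just to show the usage:

import numpy as np
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])
print(a.dot(b))      # the inner product <a, b> = 1*4 + 2*5 + 3*6 = 32.0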
(A basic version of the power method)
Choose initial vector \(\vec{y}^{\,0}\), maybe with a random number generator.
Normalize to \(\vec{x}^{\,0} = \vec{y}^{\,0}/\|\vec{y}^{\,0}\|\).
for \(k\) from 0 to \(k_{max}\)
\(\quad\) \(\vec{y}^{\,k+1} = A \vec{x}^{\,k}\)
\(\quad\) \(r^{(k)} = \langle \vec{y}^{\,k+1}, \vec{x}^{\,k} \rangle\)
\(\quad\) \(\vec{x}^{\,k+1} = \vec{y}^{\,k+1}/\|\vec{y}^{\,k+1}\|\)
end
The final values of \(r^{(k)}\) and \(\vec{x}^{\,k}\) approximate \(\lambda_1\) and \(\vec{v}_1\) respectively.
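For illustration, here is a minimal sketch of how this algorithm might look with Numpy; the function name power_method and the default iteration count are my own choices for illustration, and no convergence checking is done yet.

import numpy as np

def power_method(A, k_max=100):
    # A minimal sketch of the basic power method: fixed iteration count, no stopping test.
    n = A.shape[0]
    y = np.random.rand(n)              # initial vector from a random number generator
    x = y / np.linalg.norm(y)          # normalize to the unit vector x^0
    r = 0.0
    for k in range(k_max):
        y = A @ x                      # y^{k+1} = A x^k
        r = np.dot(y, x)               # r^{(k)} = <y^{k+1}, x^k>, the eigenvalue estimate
        x = y / np.linalg.norm(y)      # x^{k+1} = y^{k+1} / ||y^{k+1}||
    return r, x

# For example, with the test matrix of Exercise 1 below:
# A = np.array([[3., 1., 1.], [1., 8., 1.], [1., 1., 4.]])
# lambda_1, v_1 = power_method(A)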
Exercise 1
Implement this algorithm and test it on the real, symmetric matrix

\[
A = \begin{bmatrix} 3 & 1 & 1 \\ 1 & 8 & 1 \\ 1 & 1 & 4 \end{bmatrix}
\]

This has all real eigenvalues, all within \(2\) of the diagonal elements (this claim should be explained as part of the project write-up), so start with it.
As a debugging strategy, you could replace all those off-diagonal ones by a small value \(\delta\):

\[
A_\delta = \begin{bmatrix} 3 & \delta & \delta \\ \delta & 8 & \delta \\ \delta & \delta & 4 \end{bmatrix}
\]
Then the Gershgorin circle theorem ensures that each eigenvalue is within \(2\delta\) of an entry on the main diagonal. Furthermore, if \(\delta\) is small enough that the circles of radius \(2\delta\) centered on the diagonal elements do not overlap, then there is one eigenvalue in each circle.
You could even start with \(\delta=0\), for which you know the eigenvalues exactly: they are the diagonal elements.
Here and below you could check your work with Numpy, using the function numpy.linalg.eig(A).

However, that is almost cheating, so note that there is also a backward error check: see how small \(\|A \vec{v} - \lambda \vec{v}\|/\|\vec{v}\|\) is.
import numpy as np
import numpy.linalg as la
help(la.eig)
delta = 0.01
A = np.array([[3, delta, delta],[delta, 8, delta],[delta, delta, 4]])
[eigenvalues, eigenvectors] = la.eig(A)
print(f"With {delta=}, so that A is")
print(A)
lambda_0 = eigenvalues[0]
lambda_1 = eigenvalues[1]
lambda_2 = eigenvalues[2]
# The eigenvectors are the columns of the output array `eigenvectors`, so:
v_0 = list(eigenvectors[:,0]) # Lists print more nicely than arrays!
v_1 = list(eigenvectors[:,1])
v_2 = list(eigenvectors[:,2])
print(f"The eigenvalues are {eigenvalues}")
print("and the eigenvalue-eigenvector pairs are")
print(f"{lambda_0=}, \t {v_0=}")
print(f"{lambda_1=}, \t {v_1=}")
print(f"{lambda_2=}, \t {v_2=}")
With delta=0.01, so that A is
[[3. 0.01 0.01]
[0.01 8. 0.01]
[0.01 0.01 4. ]]
The eigenvalues are [2.99988041 8.0000451 4.00007449]
and the eigenvalue-eigenvector pairs are
lambda_0=2.9998804099870666, v_0=[-0.9999482535404538, 0.0019798921714429545, 0.009978490285907787]
lambda_1=8.00004509976119, v_1=[0.002004981562986897, 0.999994852570506, 0.0025049713419308117]
lambda_2=4.000074490251736, v_2=[0.009973479349182978, -0.0025248484075824345, 0.9999470760246215]
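As an illustration of that backward error check (a sketch reusing the variables A, lambda_0, v_0, and so on from the cell above, rather than just trusting numpy.linalg.eig):

for (lam, v) in [(lambda_0, np.array(v_0)), (lambda_1, np.array(v_1)), (lambda_2, np.array(v_2))]:
    backward_error = np.linalg.norm(A @ v - lam * v) / np.linalg.norm(v)
    print(f"For lambda={lam}, the relative backward error ||A v - lambda v||/||v|| is {backward_error:.2e}")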
Refinement: deciding the iteration count
Some details are omitted above, most importantly how to decide the number of iterations.
One approach is to use the fact that an eigenvector-eigenvalue pair satisfies \(A \vec{v} - \lambda \vec{v} = 0\), so the “residual norm”

\[
\|A \vec{x}^{\,k} - r^{(k)} \vec{x}^{\,k}\| = \|\vec{y}^{\,k+1} - r^{(k)} \vec{x}^{\,k}\|
\]

is a measure of “relative backward error” (recall that \(\vec{x}^{\,k}\) is a unit vector).
Thus one could replace the above for loop by a while loop, based on a condition like stopping when

\[
\|\vec{y}^{\,k+1} - r^{(k)} \vec{x}^{\,k}\| \leq \epsilon
\]

for some chosen error tolerance \(\epsilon\).

Alternatively, keep the for loop, but exit early (with break) if this condition is met.

I generally recommend this for-if-break form for implementing iterative methods, because it makes avoidance of infinite loops simpler, and avoids the common while loop issue that you do not yet have an error estimate when the loop starts.
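For example, the power method sketch given earlier might be modified along the following lines; the error tolerance parameter and its default value are just illustrative choices.

import numpy as np

def power_method_with_error_control(A, error_tolerance=1e-8, k_max=1000):
    # Sketch of the for-if-break form: stop once the residual norm is small enough.
    n = A.shape[0]
    y = np.random.rand(n)
    x = y / np.linalg.norm(y)
    r = 0.0
    for k in range(k_max):
        y = A @ x
        r = np.dot(y, x)
        # Residual ||A x^k - r^{(k)} x^k||; since x^k is a unit vector, this measures relative backward error.
        if np.linalg.norm(y - r * x) <= error_tolerance:
            break
        x = y / np.linalg.norm(y)
    return r, x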
Exercise 2
Modify your code from Exercise 1 to implement this accuracy control.
2.9.2. The Inverse Power Method
The next step is to note that if \(A\) is nonsingular, its inverse \(B = A^{-1}\) has the same eigenvectors, but with eigenvalues \(\mu_i = 1/\lambda_i\).
Thus we can apply the power method to \(B\) in order to compute its largest eigenvalue, which is \(\mu_n = 1/\lambda_n\), along with the corresponding eigenvector \(\vec{v}_n\).
The main change to the above is that

\[
\vec{y}^{\,k+1} = B \vec{x}^{\,k} = A^{-1} \vec{x}^{\,k}.
\]
However, as usual one can (and should) avoid actually computing the inverse. Instead, express the above as the system of equations

\[
A \vec{y}^{\,k+1} = \vec{x}^{\,k}.
\]
Here is an important case where the LU factorization method can speed things up greatly: a single LU factorization is needed, after which each iteration only requires the far quicker forward and backward substitution steps, at a cost of about \(n^2\) operations instead of about \(n^3/3\).
(A basic version of the inverse power method)
Choose initial vector \(\vec{y}^{\,0}\), maybe with a random number generator.
Normalize to \(\vec{x}^{\,0} = \vec{y}^{\,0}/\|\vec{y}^{\,0}\|\).
Compute an LU factorization \(L U = A\).
for \(k\) from 0 to \(k_{max}\)
\(\quad\) Solve \(L \vec{z}^{\,k+1} = \vec{x}^{\,k}\)
\(\quad\) Solve \(U \vec{y}^{\,k+1} = \vec{z}^{\,k+1}\)
\(\quad\) \(r^{(k)} = \langle \vec{y}^{\,k+1}, \vec{x}^{\,k} \rangle\)
\(\quad\) \(\vec{x}^{\,k+1} = \vec{y}^{\,k+1}/\| \vec{y}^{\,k+1} \|\)
end
(If all goes well) \(r^{(k)}\) approximates \(1/\lambda_n\), the largest eigenvalue of \(A^{-1}\), so the final values of \(1/r^{(k)}\) and \(\vec{x}^{\,k}\) approximate \(\lambda_n\) and \(\vec{v}_n\) respectively.
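As a sketch of how this might look, here using scipy.linalg.lu_factor and lu_solve for the factorization and the two triangular solves (these SciPy helpers use partial pivoting, a detail omitted in the algorithm statement above; one could instead reuse LU factorization code developed earlier, and the function name and iteration count are again just illustrative):

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def inverse_power_method(A, k_max=100):
    # Sketch of the inverse power method: one LU factorization, then O(n^2) work per iteration.
    n = A.shape[0]
    y = np.random.rand(n)
    x = y / np.linalg.norm(y)
    lu, piv = lu_factor(A)             # factor A once, before the loop
    r = 0.0
    for k in range(k_max):
        y = lu_solve((lu, piv), x)     # solve A y^{k+1} = x^k using the stored factors
        r = np.dot(y, x)               # r^{(k)} approximates 1/lambda_n
        x = y / np.linalg.norm(y)
    return 1.0 / r, x                  # lambda_n is approximately 1/r^{(k)}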
Exercise 3
Implement this basic algorithm (with a fixed iteration count, as in Exercise 1), and then create a second version that imposes an accuracy target (as in Exercise 2).
2.9.3. Getting other eigenvalues with the Shifted Inverse Power Method
The inverse power method computes the eigenvalue closest to 0; by shifting, we can compute the eigenvalue closest to any chosen value \(s\). Then by searching various values of \(s\), we can hope to find all the eigenvalues and their eigenvectors. As a variant, once we have \(\lambda_1\) and \(\lambda_n\), we can search nearby for other large or small eigenvalues: often the few largest and/or the few smallest are the most important.
With a symmetric (or Hermitian) matrix, once the eigenvalue of largest magnitude, \(\lambda_1\) is known, the rest are known to be real values in the interval \([-|\lambda_1|,|\lambda_1|]\), so we know roughly where to seek them.
The main idea here is that for any number \(s\), the matrix \(A - s I\) has eigenvalues \(\lambda_i - s\), with the same eigenvectors as \(A\):

\[
(A - s I) \vec{v}_i = A \vec{v}_i - s \vec{v}_i = (\lambda_i - s) \vec{v}_i.
\]
Thus, applying the inverse power method to \(A - s I\) computes the largest eigenvalue \(\gamma\) of \((A - s I)^{-1}\), which is \(\gamma = 1/(\lambda - s)\) where \(\lambda\) is the eigenvalue of \(A\) closest to \(s\); then \(\lambda = s + 1/\gamma\).
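A sketch of the corresponding shifted version, reusing the same structure as the inverse power method sketch above (again with illustrative names and iteration count):

import numpy as np
from scipy.linalg import lu_factor, lu_solve

def shifted_inverse_power_method(A, s, k_max=100):
    # Sketch: estimate the eigenvalue of A closest to the shift s, and a corresponding eigenvector.
    n = A.shape[0]
    y = np.random.rand(n)
    x = y / np.linalg.norm(y)
    lu, piv = lu_factor(A - s * np.eye(n))   # factor the shifted matrix A - sI once
    r = 0.0
    for k in range(k_max):
        y = lu_solve((lu, piv), x)           # solve (A - sI) y^{k+1} = x^k
        r = np.dot(y, x)                     # approximates gamma = 1/(lambda - s)
        x = y / np.linalg.norm(y)
    return s + 1.0 / r, x                    # lambda is approximately s + 1/gamma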
Exercise 4
As above, implement this, probably starting with a fixed iteration count version.
For the test case above, some plausible initial choices for the shifts are each of the entries on the main diagonal; and as above, it could help to test first with the matrix \(A_\delta\).
2.9.4. Further topics: getting all the eigenvalues with the QR Method, etc.
The above methods are not ideal when many or all of the eigenvalues of a matrix are wanted; then a variety of more advanced methods have been developed, starting with the QR (factorization) Method.
We will not address the details of that method in this course, but one way to think about it for a symmetric matrix is that:
The eigenvectors are orthogonal.
Thus, if after computing \(\lambda_1\) and \(\vec{v}_1\), one uses the power iteration starting with \(\vec{x}^{\,0,2}\) orthogonal to \(\vec{v}_1\), then all the new iterates \(\vec{x}^{\,k,2}\) will stay orthogonal to \(\vec{v}_1\), and one will get the eigenvector corresponding to the largest remaining eigenvalue: you get \(\vec{v}_2\) and \(\lambda_2\).
Continuing likewise, one can get the eigenvalues in descending order of magnitude.
As a modification, one can do all of these almost in parallel: at iteration \(k\), keep an approximation \(\vec{x}^{\,k,i}\) of the eigenvector for each \(\lambda_i\), adjusting each new approximation so that \(\vec{x}^{\,k,i}\) is orthogonal to all the approximations \(\vec{x}^{\,k,j}\), \(j < i\), for the previous (larger) eigenvalues. This uses a variant of the Gram-Schmidt method for orthogonalizing a set of vectors.
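As a rough sketch of the first of these ideas for a symmetric matrix (this is only a simple Gram-Schmidt-based deflation, not the actual QR method; the function name and the assumption that the previously found eigenvectors are unit vectors are mine):

import numpy as np

def next_eigenpair(A, previous_eigenvectors, k_max=100):
    # Sketch: power iteration for symmetric A, kept orthogonal to eigenvectors already found.
    n = A.shape[0]
    y = np.random.rand(n)
    for v in previous_eigenvectors:          # start orthogonal to the known (unit) eigenvectors
        y -= np.dot(y, v) * v
    x = y / np.linalg.norm(y)
    r = 0.0
    for k in range(k_max):
        y = A @ x
        for v in previous_eigenvectors:      # Gram-Schmidt step: remove any drift back towards them
            y -= np.dot(y, v) * v
        r = np.dot(y, x)
        x = y / np.linalg.norm(y)
    return r, x

# For example, once lambda_1 and a unit eigenvector v_1 are known:
# lambda_2, v_2 = next_eigenpair(A, [v_1])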