# 2.10. Solving Nonlinear Systems of Equations by generalizations of Newton’s Method — a brief introduction

**References:**

- Section 2.7 *Nonlinear Systems of Equations* of [Sauer, 2022]; in particular Sub-section 2.7.1 *Multivariate Newton’s Method*.
- Chapter 10 *Numerical Solution of Nonlinear Systems of Equations* of [Burden *et al.*, 2016]; in particular Sections 10.1 and 10.2.

## 2.10.1. Background

A system of \(n\) simultaneous nonlinear equations in \(n\) unknowns, written in vector form as

$$F(x) = 0, \quad F: \mathbb{R}^n \to \mathbb{R}^n,$$

can be solved by a variant of Newton’s method.

(Some notation)

To emphasize analogies between results for vector-valued quantities and those we have already seen for real- or complex-valued quantities, several notational choices are made in these notes:

- Using the same notation for vectors as for real or complex numbers (no over-arrows or bold face).

- Often denoting the derivative of a function \(f\) as \(Df\) rather than \(f'\), and higher derivatives as \(D^p f\) rather than \(f^{(p)}\). Partial derivatives of \(f(x, y, \dots)\) are \(D_x f\), \(D_y f\), etc., and for vector arguments \(x = (x_1, \dots, x_j, \dots, x_n)\), they are \(D_{x_1} f, \dots, D_{x_j} f, \dots, D_{x_n} f\), or more concisely, \(D_1 f, \dots, D_j f, \dots, D_n f\). (This also fits better with Julia code notation, even for partial derivatives.)

- Mostly reserving subscripts for the components of vectors, and labelling the terms of a sequence with superscripts, as in \(x^{(k)}\).

- Avoiding explicit division.

- However, capital letters are used for vector-valued functions, by analogy with the use of capital letters for matrices.

Rewriting Newton’s method according to this new style,

$$x^{(k+1)} = x^{(k)} - \frac{f(x^{(k)})}{Df(x^{(k)})}$$

or, to avoid explicit division, introducing the useful increment \(\delta^{(k)} := x^{(k+1)} - x^{(k)}\) and instead solving

$$Df(x^{(k)})\, \delta^{(k)} = -f(x^{(k)}), \quad x^{(k+1)} = x^{(k)} + \delta^{(k)}.$$
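This increment form translates directly into code; here is a minimal Python sketch (the function names and the test equation \(f(x) = x^2 - 2\) are my own illustrative choices, not from the text):

```python
def newton_scalar(f, Df, x0, tol=1e-12, maxiter=50):
    """Scalar Newton's method in increment form: at each step,
    'solve' Df(x) * delta = -f(x) and update x <- x + delta."""
    x = x0
    for _ in range(maxiter):
        delta = -f(x) / Df(x)
        x += delta
        if abs(delta) < tol:   # stop when the increment is tiny
            break
    return x

# Illustrative use: f(x) = x^2 - 2, whose positive root is sqrt(2).
root = newton_scalar(lambda x: x**2 - 2.0, lambda x: 2.0 * x, 1.0)
```

Writing the update in terms of \(\delta^{(k)}\) changes nothing for scalars, but it is exactly the form that generalizes to systems below, where division by a matrix must become solving a linear system.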

## 2.10.2. Newton’s method iteration formula for systems

For vector-valued functions \(F: \mathbb{R}^n \to \mathbb{R}^n\), we will see in a while that an analogous result is true: get \(x^{(k+1)} = x^{(k)} + \delta^{(k)}\) by solving the linear system

$$DF(x^{(k)})\, \delta^{(k)} = -F(x^{(k)}),$$

where \(DF(x)\) is the \(n \times n\) matrix of all the partial derivatives \((D_{x_j} F_i)(x)\), or \((D_j F_i)(x)\), where \(x = (x_1, x_2, \dots, x_n)\).
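As a concrete illustration, here is a hedged Python sketch for the case \(n = 2\), solving the linear system for the increment by Cramer’s rule rather than forming any inverse; the example system (the unit circle meeting the line \(y = x\)) is my own choice, not from the references:

```python
import math

def newton_system_2x2(F, J, x0, y0, tol=1e-12, maxiter=20):
    """Newton's method for a system of 2 equations in 2 unknowns:
    at each step, solve J(x) @ delta = -F(x) (here via Cramer's rule,
    since n = 2), then update x <- x + delta."""
    x, y = x0, y0
    for _ in range(maxiter):
        f1, f2 = F(x, y)
        a, b, c, d = J(x, y)              # Jacobian [[a, b], [c, d]]
        det = a * d - b * c
        dx = (-f1 * d + b * f2) / det     # Cramer's rule for the 2x2 solve
        dy = (-a * f2 + f1 * c) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < tol:      # stop when the increment is tiny
            break
    return x, y

# Illustrative system (my choice): the unit circle meets the line y = x,
#   F_1(x, y) = x^2 + y^2 - 1 = 0,   F_2(x, y) = x - y = 0.
F = lambda x, y: (x**2 + y**2 - 1.0, x - y)
J = lambda x, y: (2 * x, 2 * y, 1.0, -1.0)   # the rows of DF, flattened
x, y = newton_system_2x2(F, J, 1.0, 0.5)
```

For larger \(n\) one would of course replace Cramer’s rule with a general linear solver (such as LU factorization), but the structure of the iteration is the same.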

### Justification: linearization for functions of several variables

To justify the above result, we need at least a case of Taylor’s Theorem for functions of several variables, for both \(f:\mathbb{R}^n \to \mathbb{R}\) and \(F:\mathbb{R}^n \to \mathbb{R}^n\); just for linear approximations. This material from multi-variable calculus will be reviewed when we need it.
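In brief, the key fact is the linear (first order Taylor) approximation at the current iterate; as a sketch of the argument, with details to come when this material is reviewed,

$$F(x) \approx F(x^{(k)}) + DF(x^{(k)})\,(x - x^{(k)}),$$

and setting this approximation to zero at \(x = x^{(k+1)}\), so that \(x - x^{(k)} = \delta^{(k)}\), gives the linear system \(DF(x^{(k)})\, \delta^{(k)} = -F(x^{(k)})\) for the increment.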

**Warning:** although mathematically this can be written with matrix inverses as

$$x^{(k+1)} = x^{(k)} - \left( DF(x^{(k)}) \right)^{-1} F(x^{(k)}),$$

evaluation of the inverse is in general about three times slower than solving the linear system, so it is best avoided. (We have seen a good compromise: solving \(Ax = b\) with the LU factorization \(A = LU\) of the matrix.)

Even avoiding matrix inversion, this involves repeatedly solving systems of \(n\) simultaneous linear equations in \(n\) unknowns, \(Ax = b\), where the matrix \(A\) is \(DF(x^{(k)})\), and that will be seen to require about \(n^3/3\) arithmetic operations per iteration.
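To make that \(n^3/3\) estimate concrete, here is a small Python sketch (my own illustration, not from the references) that simply counts the multiplications and divisions performed by the forward-elimination stage on an \(n \times n\) system:

```python
def elimination_op_count(n):
    """Count multiplications/divisions in the forward-elimination stage
    of Gaussian elimination on an n x n system (with one right-hand
    side); the leading term of the total is n**3 / 3."""
    ops = 0
    for k in range(n - 1):            # eliminate below pivot column k
        for i in range(k + 1, n):     # each row below the pivot
            ops += 1                  # the multiplier l[i][k] = a[i][k] / a[k][k]
            ops += n - k              # update entries a[i][k+1:] plus b[i]
    return ops

# The ratio ops / (n^3 / 3) tends to 1 as n grows:
ratio = elimination_op_count(100) / (100**3 / 3)
```

Running this for increasing \(n\) shows the ratio approaching 1, confirming that the lower-order terms become negligible for the large \(n\) of interest here.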

It also requires computing new values of all \(n^2\) partial derivatives at each iteration, again potentially at a cost proportional to \(n^3\).

When \(n\) is large, as is common with problems arising from differential equations, this factor of \(n^3\) indicates a potentially very large cost per iteration, so various modifications have been developed to reduce the computational cost of each iteration (with the trade-off that more iterations are typically needed): so-called *quasi-Newton methods*.
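One of the simplest illustrations of this idea is the "chord" (frozen-Jacobian) variant: evaluate and factorize the Jacobian only once, at the starting point, and reuse it at every iteration. Each iteration then avoids the \(n^3\)-cost steps, at the price of dropping from quadratic to linear convergence. A minimal Python sketch for \(n = 2\) (with an illustrative system of my own choosing, not from the references):

```python
import math

def chord_method_2x2(F, J, x0, y0, tol=1e-10, maxiter=100):
    """'Chord' (frozen-Jacobian) variant of Newton's method for n = 2:
    the Jacobian is evaluated (and, for larger n, factorized) only once,
    at the starting point, and reused at every iteration.  Each step is
    cheaper, but convergence drops from quadratic to linear."""
    a, b, c, d = J(x0, y0)                # Jacobian computed once, then frozen
    det = a * d - b * c
    x, y = x0, y0
    for _ in range(maxiter):
        f1, f2 = F(x, y)
        dx = (-f1 * d + b * f2) / det     # reuse the frozen 2x2 "factorization"
        dy = (-a * f2 + f1 * c) / det
        x, y = x + dx, y + dy
        if math.hypot(dx, dy) < tol:
            break
    return x, y

# Illustrative system (my choice): the unit circle meets the line y = x.
F = lambda x, y: (x**2 + y**2 - 1.0, x - y)
J = lambda x, y: (2 * x, 2 * y, 1.0, -1.0)
x, y = chord_method_2x2(F, J, 1.0, 0.5)
```

More sophisticated quasi-Newton methods (such as Broyden’s method, covered in the Burden *et al.* chapter cited above) instead update an approximate Jacobian cheaply at each step, but the cost trade-off they exploit is the same one this sketch shows.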