17. Solving Nonlinear Systems of Equations by generalizations of Newton’s Method — a Brief Introduction
References:
Section 2.7 Nonlinear Systems of Equations of Sauer; in particular Sub-section 2.7.1 Multivariate Newton’s Method.
Chapter 10 Numerical Solution of Nonlinear Systems of Equations of Burden&Faires; in particular Sections 10.1 and 10.2.
17.1. Background
A system of \(n\) simultaneous nonlinear equations in \(n\) unknowns, written in vector form as \(F(x) = 0\) with \(F:\mathbb{R}^n \to \mathbb{R}^n\), can be solved by a variant of Newton’s method.
Aside on notation: To emphasize analogies between results for vector-valued quantities and what we have already seen for real- or complex-valued quantities, several notational choices are made in these notes:
Using the same notation for vectors as for real or complex numbers (no over-arrows or bold face).
Often denoting the derivative of function \(f\) as \(Df\) rather than \(f'\), and higher derivatives as \(D^p f\) rather than \(f^{(p)}\). Partial derivatives of \(f(x, y, \dots)\) are \(D_x f\), \(D_y f\), etc., and for vector arguments \(x = (x_1, \dots , x_j, \dots x_n)\), they are \(D_{x_1} f, \dots , D_{x_j} f , \dots , D_{x_n} f\), or more concisely, \(D_1 f \dots , D_j f , \dots ,D_n f\). (This also fits better with Python code notation — even for partial derivatives.)
Subscripts will mostly be reserved for components of vectors, with the terms of a sequence labelled by superscripts, \(x^{(k)}\) and such.
Explicit division is avoided.
However, I use capital letters for vector-valued functions, by analogy with the use of capital letters for matrices.
Rewriting Newton’s method according to this new style,

\[x^{(k+1)} = x^{(k)} - \frac{f(x^{(k)})}{Df(x^{(k)})}\]

or, to avoid explicit division and introduce the useful increment \(\delta^{(k)} := x^{(k+1)} - x^{(k)}\),

\[Df(x^{(k)})\, \delta^{(k)} = -f(x^{(k)}), \qquad x^{(k+1)} = x^{(k)} + \delta^{(k)}.\]
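As a concrete illustration, the scalar iteration in this increment form can be sketched as follows (a minimal sketch; the test equation \(x^2 - 2 = 0\) is my own example, not from the text):

```python
def newton_scalar(f, Df, x0, tol=1e-12, max_iter=50):
    """Scalar Newton's method in increment form: at each step,
    delta^(k) solves Df(x^(k)) * delta^(k) = -f(x^(k)),
    then x^(k+1) = x^(k) + delta^(k)."""
    x = x0
    for _ in range(max_iter):
        delta = -f(x) / Df(x)  # the increment delta^(k)
        x += delta
        if abs(delta) < tol:
            break
    return x

# Example: solve x^2 - 2 = 0, whose positive root is sqrt(2)
root = newton_scalar(lambda x: x**2 - 2, lambda x: 2 * x, 1.0)
```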
17.2. Newton’s method iteration formula for systems
For vector-valued functions, we will see in a while that an analogous result is true:

\[DF(x^{(k)})\, \delta^{(k)} = -F(x^{(k)}), \qquad x^{(k+1)} = x^{(k)} + \delta^{(k)},\]

where \(DF(x)\) is the \(n \times n\) matrix of all the partial derivatives \((D_{x_j} F_i)(x)\) or \((D_j F_i)(x)\), where \(x = (x_1, x_2, \dots, x_n).\)
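The iteration for systems can be sketched in Python with NumPy as follows; the two-equation test system (a circle intersected with a line) is an illustration of my own, not from the text:

```python
import numpy as np

def newton_system(F, DF, x0, tol=1e-12, max_iter=50):
    """Multivariate Newton's method: at each step, solve the linear system
    DF(x^(k)) delta^(k) = -F(x^(k)) and update x^(k+1) = x^(k) + delta^(k)."""
    x = np.array(x0, dtype=float)
    for _ in range(max_iter):
        # Solve the linear system; no matrix inverse is computed.
        delta = np.linalg.solve(DF(x), -F(x))
        x += delta
        if np.linalg.norm(delta, np.inf) < tol:
            break
    return x

# Example: x1^2 + x2^2 = 4 and x1 = x2, with root (sqrt(2), sqrt(2))
F = lambda x: np.array([x[0]**2 + x[1]**2 - 4, x[0] - x[1]])
DF = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])
root = newton_system(F, DF, [1.0, 2.0])
```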
17.2.1. Justification: linearization for functions of several variables
To justify the above result, we need at least a case of Taylor’s Theorem for functions of several variables, for both \(f:\mathbb{R}^n \to \mathbb{R}\) and \(F:\mathbb{R}^n \to \mathbb{R}^n\); just for linear approximations. This material from multi-variable calculus will be reviewed when we need it.
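For reference, the linear-approximation form that will be needed can be stated briefly (a sketch, with details deferred to that review): near \(x^{(k)}\),

\[F(x) \approx F(x^{(k)}) + DF(x^{(k)})\,(x - x^{(k)}),\]

and setting this approximation to zero at \(x = x^{(k)} + \delta^{(k)}\) gives exactly the linear system \(DF(x^{(k)})\,\delta^{(k)} = -F(x^{(k)})\) for the Newton step.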
Warning: although mathematically this can be written with a matrix inverse as

\[x^{(k+1)} = x^{(k)} - \left(DF(x^{(k)})\right)^{-1} F(x^{(k)}),\]

evaluating the inverse is in general about three times slower than solving the linear system, so it is best avoided. (We will soon see a “compromise”: the LU factorization of a matrix.)
Even avoiding matrix inversion, this involves repeatedly solving systems of \(n\) simultaneous linear equations in \(n\) unknowns, \(A x = b\), where the matrix \(A\) is \(DF(x^{(k)})\), and that will be seen to involve about \(n^3/3\) arithmetic operations.
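To illustrate the point about avoiding explicit inverses: both approaches give the same answer, but the single linear solve is the cheaper route (the random matrix here is purely for demonstration):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))
b = rng.standard_normal(100)

x_solve = np.linalg.solve(A, b)   # preferred: one linear solve
x_inv = np.linalg.inv(A) @ b      # avoid: forming A^{-1} is roughly 3x the work

same = np.allclose(x_solve, x_inv)  # both give the same solution
```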
It also requires computing the new values of these \(n^2\) partial derivatives at each iteration, also potentially with a cost proportional to \(n^3\).
When \(n\) is large, as is common with differential equations problems, this factor of \(n^3\) indicates a potentially very large cost per iteration, so various modifications have been developed to reduce the computational cost of each iteration (with the trade-off being that more iterations are typically needed): so-called quasi-Newton methods.
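One of the simplest such modifications (a sketch of my own, not a specific method from the text) is to freeze the Jacobian for several iterations, so the \(n^2\) partial derivatives are recomputed only occasionally, at the price of slower convergence:

```python
import numpy as np

def newton_chord(F, DF, x0, tol=1e-12, max_iter=200, refresh=5):
    """A simple quasi-Newton ("chord"-style) variant: re-evaluate the
    Jacobian DF only every `refresh` iterations and reuse it in between.
    This reduces the cost per iteration, but convergence is typically
    slower than for the full Newton iteration."""
    x = np.array(x0, dtype=float)
    for k in range(max_iter):
        if k % refresh == 0:
            J = DF(x)  # occasional Jacobian refresh
        delta = np.linalg.solve(J, -F(x))
        x += delta
        if np.linalg.norm(delta, np.inf) < tol:
            break
    return x

# Hypothetical test system: circle x1^2 + x2^2 = 4 meets line x1 = x2,
# so the positive root is (sqrt(2), sqrt(2)).
F = lambda x: np.array([x[0]**2 + x[1]**2 - 4, x[0] - x[1]])
DF = lambda x: np.array([[2 * x[0], 2 * x[1]], [1.0, -1.0]])
root = newton_chord(F, DF, [1.0, 2.0])
```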
This work is licensed under Creative Commons Attribution-ShareAlike 4.0 International