2.11. Solving Nonlinear Systems of Equations by generalizations of Newton’s Method — an introduction#

Last revised on November 18, 2025, adding brief notes on approaches beyond the basic Newton’s Method

References:

  • [Chasnov, 2012] Section 3.5, System of nonlinear equations.

  • [Sauer, 2022] Section 2.7, Nonlinear Systems of Equations — in particular, sub-section 2.7.1, Multivariate Newton’s Method.

  • [Burden et al., 2016] Chapter 10, Numerical Solution of Nonlinear Systems of Equations — in particular Sections 10.1 and 10.2.

  • [Dionne, 2023] Chapter 5, Iterative Methods to Solve Systems of Nonlinear Equations — in particular, Section 5.2, Newton’s Method.

  • [Dahlquist and Björck, 2009] Section 11.1, Systems of Nonlinear Equations — in particular, subsection 11.1.6.

2.11.1. Background#

A system of simultaneous nonlinear equations

\[ \mathbf{F}(\mathbf{x}) = 0, \; \mathbf{F}:\mathbb{R}^n \to \mathbb{R}^n \]

can be solved by a variant of Newton’s method.

Remark 2.30 (Some notation)

For the sake of simplifying notation, such as when subscripts or superscripts are used to denote a succession of vector quantities:

  • the over-arrow notation for vectors is avoided; instead, boldface is used where possible.

  • Vector-valued functions are denoted with uppercase letters, by analogy to standard matrix notation.

  • The derivative of function \(f\) is denoted as \(Df\) rather than \(f'\), and higher derivatives as \(D^p f\) rather than \(f^{(p)}\).

  • Partial derivatives of \(f(x, y, \dots)\) are denoted \(D_x f\), \(D_y f\), etc., and for vector arguments \(\mathbf{x} = (x_1, \dots , x_j, \dots , x_n)\), they are \(D_{x_1} f, \dots , D_{x_j} f , \dots , D_{x_n} f\), or more concisely, \(D_1 f, \dots , D_j f , \dots , D_n f\). (This also fits better with Python code notation — even for partial derivatives.)

  • Explicit division is avoided, in favor of multiplying by an inverse.

Rewriting Newton’s method according to this new style,

\[x^{(k+1)} = x^{(k)} - \frac{f(x^{(k)})}{Df(x^{(k)})}\]

and, avoiding explicit division by introducing the increment \(\delta x^{(k)} := x^{(k+1)} - x^{(k)}\),

\[Df(x^{(k)}) (\delta x^{(k)}) = -f(x^{(k)}), \quad x^{(k+1)} = x^{(k)} + \delta x^{(k)}.\]
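As a concrete illustration of this increment form, here is a minimal Python sketch of the scalar case; the example function \(f(x) = x^2 - 2\), the starting point, and the stopping tolerance are illustrative choices, not from the text.

```python
# A minimal sketch of scalar Newton's method in increment form:
# solve Df(x) * delta_x = -f(x), then update x := x + delta_x.

def newton(f, Df, x, tol=1e-12, max_iter=50):
    """Iterate until the increment delta_x is below tol (or max_iter is hit)."""
    for _ in range(max_iter):
        delta_x = -f(x) / Df(x)
        x = x + delta_x
        if abs(delta_x) <= tol:
            break
    return x

# Illustrative example: f(x) = x^2 - 2, with root sqrt(2)
f = lambda x: x**2 - 2.0
Df = lambda x: 2.0 * x
root = newton(f, Df, x=1.0)
```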

2.11.2. Newton’s method iteration formula for systems#

For vector-valued functions, an analogous result holds:

\[ D \mathbf{F}(\mathbf{x}^{(k)}) (\delta \mathbf{x}^{(k)}) = -\mathbf{F}(\mathbf{x}^{(k)}), \quad \mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} + \delta \mathbf{x}^{(k)} \]

where \(D \mathbf{F}(\mathbf{x})\) is the Jacobian of \(\mathbf{F}\): the \(n \times n\) matrix of all the partial derivatives \((D_{x_j} F_i)(\mathbf{x})\) or \((D_j F_i)(\mathbf{x})\), where \(\mathbf{x} = (x_1, x_2, \dots, x_n).\)
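The iteration above can be sketched in Python, solving the linear system for the increment at each step rather than forming any inverse; the example system \(x^2 + y^2 = 1\), \(x = y\) and its Jacobian are illustrative assumptions, not from the text.

```python
import numpy as np

def F(x):
    # Illustrative system: x^2 + y^2 = 1 and x = y; a root is (1/sqrt(2), 1/sqrt(2))
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])

def DF(x):
    # Jacobian: row i holds the partial derivatives of F_i
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0, -1.0]])

def newton_system(F, DF, x, tol=1e-12, max_iter=50):
    """Multivariate Newton: solve D F(x) delta_x = -F(x), then update x."""
    for _ in range(max_iter):
        delta_x = np.linalg.solve(DF(x), -F(x))
        x = x + delta_x
        if np.linalg.norm(delta_x) <= tol:
            break
    return x

x = newton_system(F, DF, np.array([1.0, 0.5]))
```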

2.11.2.1. Justification: linearization for functions of several variables#

The above result comes from the tangent plane or linear approximation of a function of several variables, applied to each component \(F_i\) of \(\mathbf F\); see for example Section 4.4: Tangent Planes and Linear Approximations of OpenStax Calculus, Volume 3.

Warning: although mathematically this can be written with matrix inverses as

\[ \mathbf{x}^{(k+1)} = \mathbf{x}^{(k)} - (D \mathbf{F}(\mathbf{x}^{(k)}))^{-1} \mathbf{F}(\mathbf{x}^{(k)}), \]

we have seen that evaluation of the inverse is in general about three times slower than solving the linear system, so is best avoided.
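The two approaches can be compared directly; in the sketch below, a random matrix and vector are stand-ins for the Jacobian \(D\mathbf{F}(\mathbf{x}^{(k)})\) and the residual \(-\mathbf{F}(\mathbf{x}^{(k)})\), not part of any particular problem.

```python
import numpy as np

# Two ways to compute the same Newton step delta_x = -(D F(x))^{-1} F(x).
rng = np.random.default_rng(0)
A = rng.standard_normal((100, 100))  # stand-in for the Jacobian D F(x^(k))
b = rng.standard_normal(100)         # stand-in for the residual -F(x^(k))

step_solve = np.linalg.solve(A, b)   # preferred: one linear solve, about n^3/3 operations
step_inv = np.linalg.inv(A) @ b      # avoid: roughly three times the cost, and less accurate

# The two agree to rounding error, but the solve is cheaper.
```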

Note: Since the matrix is different at each iteration, the LU factorization does not help here, except for its possible benefits in reducing rounding error.

Even avoiding matrix inversion, this involves repeatedly solving systems of \(n\) simultaneous linear equations in \(n\) unknowns, \(A \mathbf{x} = \mathbf{b}\), where the matrix \(A\) is \(D\textbf{F}(\mathbf{x}^{(k)})\), and that in general involves about \(n^3/3\) arithmetic operations.

It also requires computing the new values of these \(n^2\) partial derivatives at each iteration.

When \(n\) is large, as is common with differential equations problems, this factor of \(n^3\) indicates a potentially very large cost per iteration, so various modifications have been developed to reduce the computational cost of each iteration (with the trade-off being that more iterations are typically needed): so-called quasi-Newton methods.
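As one illustration of the quasi-Newton idea, here is a minimal sketch of Broyden's method, which avoids recomputing the \(n^2\) partial derivatives by applying a rank-one update to an approximation of the Jacobian after each step. The test system is the same illustrative example as above (not from the text), and the initial approximation is taken, as is common, to be the true Jacobian at the starting point.

```python
import numpy as np

def F(x):
    # Illustrative system: x^2 + y^2 = 1 and x = y; a root is (1/sqrt(2), 1/sqrt(2))
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])

def broyden(F, x, B, tol=1e-12, max_iter=100):
    """Broyden's method: solve B delta_x = -F(x), step, then update B by a
    rank-one correction instead of recomputing the Jacobian."""
    Fx = F(x)
    for _ in range(max_iter):
        delta_x = np.linalg.solve(B, -Fx)
        x = x + delta_x
        if np.linalg.norm(delta_x) <= tol:
            break
        F_new = F(x)
        delta_F = F_new - Fx
        # Rank-one ("good Broyden") update of the Jacobian approximation
        B = B + np.outer(delta_F - B @ delta_x, delta_x) / (delta_x @ delta_x)
        Fx = F_new
    return x

x0 = np.array([1.0, 0.5])
B0 = np.array([[2.0 * x0[0], 2.0 * x0[1]],  # true Jacobian at x0, computed once
               [1.0, -1.0]])
x = broyden(F, x0, B0)
```

The trade-off mentioned above is visible here: each iteration is cheaper (no new partial derivatives), but convergence is superlinear rather than quadratic, so more iterations are typically needed.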

2.11.3. Limitations and Other Strategies#

In addition to the potentially high cost when \(n\) is large, the weaknesses of the basic Newton’s method persist: one generally needs a good initial approximation in order to achieve convergence, and in some situations the partial derivatives are not available.

Improving on Newton’s method is a big subject, so here is just a sketch of one strategy: connecting the root-finding problem to minimization of the backward error \(\| \mathbf{F}(\mathbf{x}) \|\), and so using methods akin to the method of steepest descent, to be seen in Finding the Minimum of a Function of Several Variables.

One can modify Newton’s method by adjusting the step size: go in the same direction as above, but by a different amount:

\[ \delta \mathbf{x}^{(k)} = - \omega_k (D \mathbf{F}(\mathbf{x}^{(k)}))^{-1} \mathbf{F}(\mathbf{x}^{(k)}) \]

and choosing the factor \(\omega_k \) to minimize the backward error along that line. This guarantees that the backward error decreases, often avoiding divergence from the root.

There are details to be resolved to get a reliable algorithm out of this; for example, it is best to use the Euclidean norm, or more precisely its square,

\[ E(\mathbf{x}) = \| \mathbf{F}(\mathbf{x}) \|_2^2 = \mathbf{F}(\mathbf{x}) \cdot \mathbf{F}(\mathbf{x}) \]

as the function to minimize: this avoids computing either maxima or square roots.
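A minimal sketch of this damped strategy follows: take the Newton direction, then shrink \(\omega_k\) until \(E(\mathbf{x})\) decreases. The example system, its Jacobian, the deliberately poor starting point, and the simple step-halving strategy (rather than a true one-dimensional minimization of \(E\) along the line) are all illustrative assumptions.

```python
import numpy as np

def F(x):
    # Illustrative system: x^2 + y^2 = 1 and x = y
    return np.array([x[0]**2 + x[1]**2 - 1.0, x[0] - x[1]])

def DF(x):
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0, -1.0]])

def E(x):
    # Squared backward error: no square roots (or maxima) needed
    Fx = F(x)
    return Fx @ Fx

def damped_newton(F, DF, x, tol=1e-12, max_iter=100):
    """Newton direction plus backtracking: halve omega until E decreases."""
    for _ in range(max_iter):
        direction = np.linalg.solve(DF(x), -F(x))  # full Newton step
        omega = 1.0
        while E(x + omega * direction) >= E(x) and omega > 1e-10:
            omega /= 2.0
        x = x + omega * direction
        if np.linalg.norm(omega * direction) <= tol:
            break
    return x

# A starting point far from the roots, where an undamped step may overshoot
x = damped_newton(F, DF, np.array([5.0, -3.0]))
```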

This makes the strategy somewhat similar to the least squares method discussed in sections Least-squares Fitting to Data and Least-squares Fitting to Data: Appendix on The Geometrical Approach.