{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "(section:newtons-method-systems)=\n",
    "# Solving Nonlinear Systems of Equations by generalizations of Newton's Method — a brief introduction"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Revision of August 25, 2025**."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**References:**\n",
    "\n",
    "- {cite}`Sauer` Section 2.7, *Nonlinear Systems of Equations* —\n",
    "in particular, sub-section 2.7.1, *Multivariate Newton's Method*.\n",
    "\n",
    "- {cite}`Burden-Faires` Chapter 10, *Numerical Solution of Nonlinear Systems of Equations* —\n",
    "in particular Sections 10.1 and 10.2.\n",
    "\n",
    "- {cite}`Dionne` Chapter 5, *Iterative Methods to Solve Systems of Nonlinear Equations* — in particular, Section 5.2, *Newton’s Method*."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Background\n",
    "\n",
    "A system of simultaneous nonlinear equations\n",
    "\n",
    "$$\n",
    "F(x) = 0, \\; F:\\mathbb{R}^n \\to \\mathbb{R}^n\n",
    "$$\n",
    "\n",
    "can be solved by a variant of Newton's method."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```{prf:remark} Some notation\n",
    ":label: remark-vector-derivative notation\n",
    "\n",
    "For the sake of emphasise analogies between results for vector-valued quantities and what we have already seen for real- or compex-valued quantities, several notational choices are made in these notes:\n",
    "\n",
    "- Using the same notation for vectors as for real or complex numbers (no over arrows or bold face).\n",
    "\n",
    "- Often denoting the derivative of function $f$ as $Df$ rather than $f'$, and higher derivatives as $D^p f$ rather than $f^{(p)}$.\n",
    "Partial derivatives of $f(x, y, \\dots)$ are $D_x f$, $D_y f$, etc., and for vector arguments $x = (x_1, \\dots , x_j, \\dots x_n)$, they are $D_{x_1} f, \\dots , D_{x_j} f , \\dots , D_{x_n} f$, or more concisely,\n",
    "$D_1 f \\dots , D_j f , \\dots , f$.\n",
    "(This also fits better with Python code notation — even for partial derivatives.)\n",
    "\n",
    "- Subscripts will mostly be reserved for components of vectors, labelling the terms of a sequence with superscripts, $x^{(k)}$ and such.\n",
    "\n",
    "- Explicit division is avoided.\n",
    "\n",
    "However, I use capital letters for vector-valued functions, for analogy to the use of capital letter for matrices.\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Rewriting Newton's method according to this new style,\n",
    "\n",
    "$$x^{(k+1)} = x^{(k)} - \\frac{f(x^{(k)})}{Df(x^{(k)})}$$\n",
    "\n",
    "or to avoid explicit division and introducing the useful increment $\\delta^{(k)} := x^{(k+1)} - x^{(k)}$,\n",
    "\n",
    "$$Df(x^{(k)}) (\\delta^{(k)}) = f(x^{(k)}), \\quad x^{(k+1)} = x^{(k)} + \\delta^{(k)}.$$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Newton's method iteration formula for systems\n",
    "\n",
    "For vector valued functions, we will see in a while that an analogous result is true:\n",
    "\n",
    "$$\n",
    "(D \\textbf{F}(x^{(k)}) ({\\delta}^{(k)}) = \\textbf{F}(x^{(k)}),\n",
    "\\quad x^{(k+1)} = x^{(k)} + \\delta^{(k)}\n",
    "$$\n",
    "\n",
    "where $DF(x)$ is the $n \\times n$ matrix of all the partial derivatives\n",
    "$(D_{x_j} F_i)(x)$ or $(D_j F_i)(x)$, where $x = (x_1, x_2, \\dots, x_n).$"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Justification: linearization for function of several variables\n",
    "\n",
    "To justify the above result, we need at least a case of Taylor's Theorem for functions of several variables, for both\n",
    "$f:\\mathbb{R}^n \\to \\mathbb{R}$ and $F:\\mathbb{R}^n \\to \\mathbb{R}^n$; just for linear approximations.\n",
    "This material from multi-variable calculus will be reviewed when we need it."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**Warning:** although mathematically this can be written with matrix inverses as\n",
    "\n",
    "$$\n",
    "\\textbf{X}^{(k+1)} = \\textbf{X}^{(k)} - (D \\textbf{F}(\\textbf{X}^{(k)})^{-1}(\\textbf{F}(\\textbf{X}^{(k)}),\n",
    "$$\n",
    "evaluation of the inverse is in general about three times slower than solving the linear system, so is best avoided.\n",
    "(We have seen a good compromise; {ref}`section:linear-equations-lu-factorization` the LU factorization of a matrix.)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Even avoiding matrix inversion, this involves repeatedly solving systems of $n$ simultaneous linear equations in $n$ unknowns, $A x = b$, where the matrix $A$ is $D\\textbf{F}(x^{(k)})$, and that will be seen to involve about $n^3/3$ arithmetic operations.\n",
    "\n",
    "It also requires computing the new values of these $n^2$ partial derivatives at each iteration, also potentially with a cost proportional to $n^3$.\n",
    "\n",
    "When $n$ is large, as is common with differential equations problems, this factor of $n^3$ indicates a potentially very large cost per iteration, so various modifications have been developed to reduce the computational cost of each iteration (with the trade-off being that more iterations are typically needed): so-called *quasi-Newton methods.*"
   ]
  },
  {
   "cell_type": "raw",
   "metadata": {},
   "source": [
    "## Exercises"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Julia 1.11.4",
   "language": "julia",
   "name": "julia-1.11"
  },
  "language_info": {
   "file_extension": ".jl",
   "mimetype": "application/julia",
   "name": "julia",
   "version": "1.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
