Any clues about this Newton iteration formula with Jacobian matrix?

Posted on June 20, 2016

A while ago I wrote about using Boltzmann sampling to generate random instances of algebraic data types, and mentioned that I have some code I inherited for doing the core computations. There is one part of the code that I still don’t understand, having to do with a variant of Newton’s method for finding a fixed point of a mutually recursive system of equations. It seems to work, but I don’t like using code I don’t understand—for example, I’d like to be sure I understand the conditions under which it does work, to be sure I am not misusing it. I’m posting this in the hopes that someone reading this may have an idea.

Let $\Phi : \mathbb{R}^n \to \mathbb{R}^n$ be a vector function, defined elementwise in terms of functions $\Phi_1, \dots, \Phi_n : \mathbb{R}^n \to \mathbb{R}$:

$\displaystyle \Phi(\mathbf{X}) = (\Phi_1(\mathbf{X}), \dots, \Phi_n(\mathbf{X}))$

where $\mathbf{X} = (X_1, \dots, X_n)$ is a vector in $\mathbb{R}^n$. We want to find the fixed point $\mathbf{Y}$ such that $\Phi(\mathbf{Y}) = \mathbf{Y}$.
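As a concrete instance in the spirit of the Boltzmann-sampling application (an illustrative example of my own, not one taken from the inherited code): the generating function $B(z)$ for binary trees satisfies

$\displaystyle B(z) = 1 + z\,B(z)^2,$

so for a fixed value of $z$ we can take $n = 1$ and $\Phi(X) = 1 + zX^2$, whose fixed point $Y$ is the value $B(z)$. Mutually recursive algebraic data types give rise to genuinely multivariate systems of the same shape.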

The algorithm (you can see the code here) now works as follows. First, define $\mathbf{J}$ as the $n \times n$ Jacobian matrix of partial derivatives of the $\Phi_i$, that is,

$\displaystyle \mathbf{J} = \begin{bmatrix} \frac{\partial}{\partial X_1} \Phi_1 & \dots & \frac{\partial}{\partial X_n} \Phi_1 \\ \vdots & \ddots & \vdots \\ \frac{\partial}{\partial X_1} \Phi_n & \dots & \frac{\partial}{\partial X_n} \Phi_n \end{bmatrix}$

Now let $\mathbf{Y}_0 = (0, \dots, 0)$ and let $\mathbf{U}_0 = I_n$ be the $n \times n$ identity matrix. Then for each $i \geq 0$ define

$\displaystyle \mathbf{U}_{i+1} = \mathbf{U}_i + \mathbf{U}_i(\mathbf{J}(\mathbf{Y}_i)\mathbf{U}_i - (\mathbf{U}_i - I_n))$

and also

$\displaystyle \mathbf{Y}_{i+1} = \mathbf{Y}_i + \mathbf{U}_{i+1}(\Phi(\mathbf{Y}_i) - \mathbf{Y}_i).$
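To make the two updates concrete, here is a minimal, self-contained Haskell sketch of the iteration, using plain lists as vectors and matrices. The tiny example system `phi` and its hand-computed `jacobian` are my own hypothetical choices, picked only to exercise the formulas; they are not from the inherited code.

```haskell
import Data.List (transpose)

type Vec = [Double]
type Mat = [[Double]]

-- Basic linear algebra on lists.
mmul :: Mat -> Mat -> Mat
mmul a b = [[sum (zipWith (*) r c) | c <- transpose b] | r <- a]

mvmul :: Mat -> Vec -> Vec
mvmul a v = [sum (zipWith (*) r v) | r <- a]

madd, msub :: Mat -> Mat -> Mat
madd = zipWith (zipWith (+))
msub = zipWith (zipWith (-))

vadd, vsub :: Vec -> Vec -> Vec
vadd = zipWith (+)
vsub = zipWith (-)

identity :: Int -> Mat
identity n = [[if i == j then 1 else 0 | j <- [1..n]] | i <- [1..n]]

-- A hypothetical example system:
--   Phi (x1, x2) = (1/4 (1 + x2), 1/4 (x1 + x1 x2))
-- Its exact fixed point has x1 = 2 - sqrt 3, x2 = x1^2.
phi :: Vec -> Vec
phi [x1, x2] = [0.25 * (1 + x2), 0.25 * (x1 + x1 * x2)]

-- Jacobian of phi, computed by hand.
jacobian :: Vec -> Mat
jacobian [x1, x2] =
  [ [0,                0.25      ]
  , [0.25 * (1 + x2),  0.25 * x1 ] ]

-- One step of the iteration from the post:
--   U' = U + U (J(Y) U - (U - I))
--   Y' = Y + U' (Phi(Y) - Y)
step :: (Vec, Mat) -> (Vec, Mat)
step (y, u) = (y', u')
  where
    n  = length y
    u' = u `madd` (u `mmul` ((jacobian y `mmul` u) `msub` (u `msub` identity n)))
    y' = y `vadd` mvmul u' (phi y `vsub` y)

main :: IO ()
main = do
  let (y, _) = iterate step ([0, 0], identity 2) !! 20
  print y                 -- the approximate fixed point
  print (vsub (phi y) y)  -- the residual Phi(Y) - Y, which should be near zero
```

On this small system the iterates do converge rapidly to the fixed point, which at least lets one experiment with the update for $\mathbf{U}_{i+1}$ numerically.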

Somehow, magically (under appropriate conditions on $\Phi$, I presume), the sequence of $\mathbf{Y}_i$ converges to the fixed point $\mathbf{Y}$. But I don’t understand where this is coming from, especially the equation for $\mathbf{U}_{i+1}$. Most generalizations of Newton’s method that I can find seem to involve multiplying by the inverse of the Jacobian matrix. So what’s going on here? Any ideas/pointers to the literature/etc.?