Newton’s method works for convex real functions

Theorem 1.

Let $f\colon I\to\mathbb{R}$ be a convex differentiable function on an interval $I\subseteq\mathbb{R}$ , with at least one root. Then the following sequence $\{x_{n}\}$ obtained from Newton’s method,

$x_{n+1}=x_{n}-\frac{f(x_{n})}{f^{\prime}(x_{n})}\,,$

will converge to a root of $f$ , provided that $f^{\prime}(x_{0})\neq 0$ and $x_{1}\in I$ for the given starting point $x_{0}\in I$ .

Obviously, “convex” can be replaced by “concave” in Theorem 1.

The proof will proceed in several steps. First, a simple converse result:

Theorem 2.

Let $f\colon I\to\mathbb{R}$ be a differentiable convex function, and $\{x_{n}\}$ be the sequence in Theorem 1. If it is convergent to a number $a\in I$ , then $a$ is necessarily a root of $f$ .

Proof.

We have

$0=\lim_{n\to\infty}f^{\prime}(x_{n})\cdot(x_{n}-x_{n+1})=\lim_{n\to\infty}f(x_% {n})=f(a)\,.$

(The first limit is zero by local boundedness of $f^{\prime}$ at $a$ , which follows from $f^{\prime}$ being finite and monotone¹¹ Actually, a differentiable convex function must necessarily have a continuous derivative, for the derivative is increasing, and it cannot have any jump discontinuities by Darboux’s Theorem (http://planetmath.org/DarbouxsTheorem)..) ∎

Proof of Theorem 1

Case A: when $I=\mathbb{R}$ , $f^{\prime}>0$ , and $f(x_{0})\geq 0$

We claim that whenever $f(x_{n})\geq 0$ , then $f(x_{n+1})\geq 0$ too. Recall that the graph of a convex function $f$ always lies above any of its tangent lines. So in particular, the point $(x_{n+1},f(x_{n+1}))$ is higher than the point $(x_{n+1},0)$ , which is on the tangent line by definition. By induction, we have $f(x_{n})\geq 0$ for all $n$ .

We also have $x_{n+1}\leq x_{n}$ for all $n$ . By hypothesis the slope $f^{\prime}(x_{n})$ of the tangent line is positive, and $(x_{n},f(x_{n}))$ lies above the horizontal axis. Then the intersection point of the tangent line and the horizontal axis, which is $(x_{n+1},0)$ , must be to the left of $(x_{n},0)$ .

So the sequence $\{x_{n}\}$ is decreasing. It converges to a real number or diverges to $-\infty$ . If the limit is a real number, by Theorem 2 that number is a root of $f$ , and we are done.

If the limit is $-\infty$ , then we must have $f(x_{n})>0$ for all $x_{n}$ . Hence $f(x)>0$ for all $x$ , as $f$ is monotone and $x_{n}$ can be arbitrarily large negative. In other words, $f$ has no root to begin with.

Figure 1: Case A: $\{x_{n}\}$ is monotonically decreasing

Case B: when $I=\mathbb{R}$ , $f^{\prime}<0$ , and $f(x_{0})\geq 0$

This situation is the (left-right) mirror of Case A, except that because the slope of the curve is negative, the sequence $\{x_{n}\}$ is increasing this time. We still have $x_{n}\geq 0$ , so the sequence either converges to a root of $f$ , or diverges to $+\infty$ (in which case $f$ has no root).

Figure 2: Case B: $\{x_{n}\}$ is monotonically increasing

Case C: when $I=\mathbb{R}$ , $f^{\prime}(x_{0})>0$ , and $f(x_{0})\geq 0$

To begin, note that the first part of Case A, asserting that $f(x_{n})\geq 0$ for all $n$ , still goes through only assuming $f^{\prime}(x_{n})\neq 0$ — in which case the sequence would not be defined after the point $x_{n}$ . We show that, in fact, $f^{\prime}(x_{n})>0$ for all $n$ , so the rest of Case A goes through.

Suppose $n$ is the smallest integer for which $f^{\prime}(x_{n+1})\leq 0$ . Also assume $f(x_{n}),f(x_{n+1})\neq 0$ , otherwise we would be done anyway. The reasoning in Case A shows $x_{n+1}<x_{n}$ .

Figure 3: Case C: a tangent line separates $f$ from the horizontal axis

The function $f$ is strictly positive on $[x_{n+1},x_{n}]$ , for the graph of $f$ lies above the tangent line segment extending from $(x_{n+1},0)$ to $(x_{n},f(x_{n}))$ , which in turn lies strictly above the horizontal axis. Since $f$ is decreasing to the left of $[x_{n+1},x_{n}]$ and increasing to the right, $f$ never touches the horizontal axis anywhere. This contradicts our hypothesis that $f$ has roots.

Case D: when $I=\mathbb{R}$ , $f^{\prime}(x_{0})<0$ , and $f(x_{0})\geq 0$

This situation is the (left-right) mirror of Case C. The same argument shows that $f^{\prime}(x_{n})<0$ for all $n$ (or $f(x_{n})=0$ for some $n$ ), and the argument of Case B applies.

Case E: for general intervals $I$

The only concern is that the iteration $\{x_{n}\}$ might go outside the interval $I$ . We show that it does not.

Suppose $f(x_{0})\geq 0$ . Then as we have proved, $f(x_{n})\geq 0$ for all $n$ . Suppose $x_{n}\in I$ but $x_{n+1}\notin I$ . Then $f$ has no root, because the graph of $f$ lies above the tangent line from $(x_{n},f(x_{n}))$ to $(x_{n+1},0)$ which lies above the horizontal axis.

The limit of $x_{n}$ could conceivably go just outside the interval $I$ . But this case is similar to the case $x_{n}\to\pm\infty$ already considered in Case A: $f$ possesses no root then.

Finally, suppose $f(x_{0})<0$ , the case we have ignored all along. Our hypotheses assume that $f(x_{1})$ is defined. We immediately have $f(x_{1})\geq 0$ , since the graph of $f$ lies above the tangent line from $(x_{0},f(x_{0}))$ to $(x_{1},0)$ . But this just brings us back to the other cases.

Figure 4: Case E: $f(x_{1})$ is always $\geq 0$

Examples

Newton’s method for finding square roots of positive numbers $\alpha$ (which reduces to the ancient divide-and-average method) always converges, for the target function $f(x)=x^{2}-\alpha$ is convex.

References

1 Michael Spivak. Calculus, third edition. Publish or Perish, 1994.

Title	Newton’s method works for convex real functions
Canonical name	NewtonsMethodWorksForConvexRealFunctions
Date of creation	2013-03-22 15:41:28
Last modified on	2013-03-22 15:41:28
Owner	stevecheng (10074)
Last modified by	stevecheng (10074)
Numerical id	7
Author	stevecheng (10074)
Entry type	Theorem
Classification	msc 49M15
Classification	msc 65H05
Classification	msc 26A06
Synonym	Newton’s method works for concave real functions