tests for local extrema in Lagrange multiplier method

Let $U$ be open in $\mathbb{R}^{n}$, and let $f\colon U\to\mathbb{R}$ and $g\colon U\to\mathbb{R}^{m}$ be twice continuously differentiable functions. Assume that $p\in U$ is a stationary point for $f$ on $M=g^{-1}(\{0\})$, and that $\operatorname{D}g$ has full rank everywhere on $M$. (Actually, only $\operatorname{D}g(p)$ needs to have full rank, and the arguments presented here continue to hold in that case, although $M$ would not necessarily be a manifold then.) Then we know that $p$ is a solution to the Lagrange multiplier system

\operatorname{D}f(p)=\lambda\cdot\operatorname{D}g(p)\,,

(1)

for a Lagrange multiplier vector $\lambda=(\lambda_{1},\ldots,\lambda_{m})$.
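As a quick sanity check of condition (1), the following minimal sketch approximates the derivatives by central differences and evaluates the residual $\operatorname{D}f(p)-\lambda\cdot\operatorname{D}g(p)$. The problem ($f(x,y)=xy$ on the line $x+y=2$), the helper names `gradient` and `lagrange_residual`, and the step size are all assumptions chosen for illustration; they do not come from the text.

```python
def gradient(func, point, step=1e-6):
    """Central-difference approximation of the gradient of func at point."""
    grad = []
    for i in range(len(point)):
        forward = list(point)
        backward = list(point)
        forward[i] += step
        backward[i] -= step
        grad.append((func(forward) - func(backward)) / (2 * step))
    return grad


def lagrange_residual(f, constraints, multipliers, point):
    """Return Df(p) - sum_k lambda_k * Dg^k(p); it vanishes when (1) holds."""
    residual = gradient(f, point)
    for lam_k, g_k in zip(multipliers, constraints):
        grad_g = gradient(g_k, point)
        residual = [r - lam_k * d for r, d in zip(residual, grad_g)]
    return residual


# Illustrative problem (an assumption, not from the text):
# f(x, y) = x*y on the line g(x, y) = x + y - 2 = 0.
f = lambda v: v[0] * v[1]
g = lambda v: v[0] + v[1] - 2.0

p = [1.0, 1.0]   # candidate stationary point
lam = [1.0]      # candidate Lagrange multiplier

residual = lagrange_residual(f, [g], lam, p)
print(residual)  # both components should be zero up to rounding error
```

A residual that is not close to zero would mean that the pair $(p,\lambda)$ does not solve (1), in which case the second-order test developed below does not apply.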

Our aim is to develop an analogue of the second derivative test for the stationary point $p$.

The most straightforward way to proceed is to take a coordinate chart $\alpha\colon V\to M$ for the manifold $M$, with $V$ open in $\mathbb{R}^{n-m}$ and $\alpha(0)=p$, and to consider the Hessian of the function $f\circ\alpha\colon V\to\mathbb{R}$ at $0=\alpha^{-1}(p)$. This Hessian is in fact just the Hessian form of $f\colon M\to\mathbb{R}$ expressed in the coordinates of the chart $\alpha$. But the whole point of using Lagrange multipliers is to avoid calculating coordinate charts directly, so we look for an equivalent expression for $\operatorname{D}^{2}(f\circ\alpha)(0)$ in terms of $\operatorname{D}^{2}f(p)$ that does not mention derivatives of $\alpha$.

To do this, we differentiate $f\circ\alpha$ twice using the chain rule and product rule. (Note that the “product” operation involved in the second equality of (2) is the operation of composition of two linear mappings. Think hard about this if you are not sure; it took me several tries to get this formula right, since multi-variable iterated derivatives have a complicated structure.) To reduce clutter, from now on we use the prime notation for derivatives rather than $\operatorname{D}$.

\begin{split}(f\circ\alpha)^{\prime\prime}(0)&=((f^{\prime}\circ\alpha)\cdot\alpha^{\prime})^{\prime}(0)\\ &=\bigl((f^{\prime\prime}\circ\alpha)\cdot\alpha^{\prime}\cdot\alpha^{\prime}+(f^{\prime}\circ\alpha)\cdot\alpha^{\prime\prime}\bigr)(0)\\ &=f^{\prime\prime}(\alpha(0))\cdot\alpha^{\prime}(0)\cdot\alpha^{\prime}(0)+f^{\prime}(\alpha(0))\cdot\alpha^{\prime\prime}(0)\\ &=f^{\prime\prime}(p)\cdot\alpha^{\prime}(0)\cdot\alpha^{\prime}(0)+f^{\prime}(p)\cdot\alpha^{\prime\prime}(0)\,.\end{split}

(2)

If we interpret $(f\circ\alpha)^{\prime\prime}(0)$ as a bilinear mapping of vectors $u,v\in\mathbb{R}^{n-m}$, then formula (2) really means

(f\circ\alpha)^{\prime\prime}(0)\cdot(u,v)=f^{\prime\prime}(p)\cdot\bigl(\alpha^{\prime}(0)\cdot u,\alpha^{\prime}(0)\cdot v\bigr)+f^{\prime}(p)\cdot\bigl(\alpha^{\prime\prime}(0)\cdot(u,v)\bigr)\,.

(3)

To obtain the quadratic form, we set $v=u$; we also abbreviate the vector $\alpha^{\prime}(0)\cdot u$ by $h$, which belongs to the tangent space $\mathrm{T}_{p}M$ of $M$ at $p$. So,

(f\circ\alpha)^{\prime\prime}(0)\cdot u^{2}=f^{\prime\prime}(p)\cdot h^{2}+f^{\prime}(p)\cdot\bigl(\alpha^{\prime\prime}(0)\cdot u^{2}\bigr)\,.

(4)

Naïvely, we might think that $(f\circ\alpha)^{\prime\prime}(0)$ is simply $f^{\prime\prime}(p)$ restricted to the tangent space $\mathrm{T}_{p}M$. This happens to be the first term in (4), but there is also an additional contribution by the second term involving $\alpha^{\prime\prime}(0)$; intuitively, $\alpha^{\prime\prime}(0)$ is the curvature of the surface (manifold) $M$, “changing the geometry” of the graph of $f$.
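To see that the second term of (4) cannot be dropped, consider a small worked example (an assumption chosen for illustration, not part of the original argument): $f(x,y)=x+y$ on the unit circle $M=\{x^{2}+y^{2}=1\}$, with $p=(1/\sqrt{2},1/\sqrt{2})$ and the chart $\alpha(t)=(\cos(\pi/4+t),\sin(\pi/4+t))$. Since $f$ is linear, $f^{\prime\prime}(p)=0$, so the naïve restriction of $f^{\prime\prime}(p)$ to the tangent space vanishes identically, even though $p$ is evidently a maximum of $f$ on $M$. The curvature term supplies the missing information:

```latex
\begin{aligned}
\alpha'(0) &= \bigl(-\tfrac{1}{\sqrt{2}},\ \tfrac{1}{\sqrt{2}}\bigr), \qquad
\alpha''(0) = \bigl(-\tfrac{1}{\sqrt{2}},\ -\tfrac{1}{\sqrt{2}}\bigr) = -p, \\
(f\circ\alpha)''(0)\cdot u^{2}
  &= \underbrace{f''(p)\cdot h^{2}}_{=\,0}
   + f'(p)\cdot\bigl(\alpha''(0)\cdot u^{2}\bigr)
   = \bigl((1,1)\cdot(-p)\bigr)\,u^{2}
   = -\sqrt{2}\,u^{2}.
\end{aligned}
```

This agrees with differentiating $(f\circ\alpha)(t)=\sqrt{2}\cos t$ twice directly, and the negative sign reflects the maximum at $p$.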

But the second term of (4) still involves $\alpha$. To eliminate it, we differentiate twice the equation $g\circ\alpha=0$, which holds on all of $V$ because $\alpha$ takes values in $M=g^{-1}(\{0\})$.

0=(g\circ\alpha)^{\prime\prime}(0)=g^{\prime\prime}(p)\cdot\alpha^{\prime}(0)\cdot\alpha^{\prime}(0)+g^{\prime}(p)\cdot\alpha^{\prime\prime}(0)\,.

(5)

(Equation (5) is derived the same way as (2), but with $f$ replaced by $g$.) Now we can substitute (5) and (1) in (2) to eliminate the term $f^{\prime}(p)\cdot\alpha^{\prime\prime}(0)$:

\begin{split}(f\circ\alpha)^{\prime\prime}(0)&=f^{\prime\prime}(p)\cdot\alpha^{\prime}(0)\cdot\alpha^{\prime}(0)+\lambda\cdot g^{\prime}(p)\cdot\alpha^{\prime\prime}(0)\\ &=f^{\prime\prime}(p)\cdot\alpha^{\prime}(0)\cdot\alpha^{\prime}(0)-\lambda\cdot g^{\prime\prime}(p)\cdot\alpha^{\prime}(0)\cdot\alpha^{\prime}(0)\,,\end{split}

(6)

or expressed as a quadratic form,

(f\circ\alpha)^{\prime\prime}(0)\cdot u^{2}=f^{\prime\prime}(p)\cdot h^{2}-\lambda\cdot g^{\prime\prime}(p)\cdot h^{2}\,.

(7)
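Formula (7) can be checked numerically on a case where a chart is known explicitly. The sketch below is only an illustration under assumed data: $f(x,y)=x^{2}+2y^{2}$ on the unit circle $g(x,y)=x^{2}+y^{2}-1=0$, at $p=(1,0)$ with $\lambda=1$ (so that $\operatorname{D}f(p)=(2,0)=\lambda\cdot\operatorname{D}g(p)$), and the chart $\alpha(t)=(\cos t,\sin t)$. The helper names and the finite-difference step are likewise assumptions.

```python
import math

# Illustrative problem (an assumption, not from the text):
# f(x, y) = x^2 + 2 y^2 on the unit circle g(x, y) = x^2 + y^2 - 1 = 0.
# At p = (1, 0): Df(p) = (2, 0) = 1 * Dg(p), so lambda = 1.

def f(x, y):
    return x * x + 2.0 * y * y


def alpha(t):
    """Chart of the circle around p = (1, 0), with alpha(0) = p."""
    return math.cos(t), math.sin(t)


def second_derivative(func, t, step=1e-4):
    """Central second difference of a scalar function of one variable."""
    return (func(t + step) - 2.0 * func(t) + func(t - step)) / step ** 2


def quadratic_form(matrix, vec):
    """Evaluate sum_ij matrix[i][j] * vec[i] * vec[j]."""
    n = len(vec)
    return sum(matrix[i][j] * vec[i] * vec[j]
               for i in range(n) for j in range(n))


# Left-hand side of (7) with u = 1: second derivative of t -> f(alpha(t)) at 0.
lhs = second_derivative(lambda t: f(*alpha(t)), 0.0)

# Right-hand side of (7), using h = alpha'(0) * u = (0, 1).
lam = 1.0
hess_f = [[2.0, 0.0], [0.0, 4.0]]   # f''(p)
hess_g = [[2.0, 0.0], [0.0, 2.0]]   # g''(p)
h = [0.0, 1.0]

naive = quadratic_form(hess_f, h)               # f''(p) h^2 alone
rhs = naive - lam * quadratic_form(hess_g, h)   # f''(p) h^2 - lambda g''(p) h^2

print(lhs, rhs, naive)
```

Here `lhs` and `rhs` agree up to discretization error, while `naive` (the restriction of $f^{\prime\prime}(p)$ alone) is twice as large, which shows the effect of the correction term $-\lambda\cdot g^{\prime\prime}(p)\cdot h^{2}$.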

Thus, to understand the nature of the stationary point $p$, we can study the modified Hessian:

f^{\prime\prime}(p)-\lambda\cdot g^{\prime\prime}(p)\,,\quad\text{restricted to $\mathrm{T}_{p}M$.}

(8)

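For example, if this bilinear form is positive definite, then $p$ is a strict local minimum of $f$ on $M$; if it is negative definite, then $p$ is a strict local maximum; and if it is indefinite, then $p$ is neither. All the tests that apply to the usual Hessian in $\mathbb{R}^{n}$ apply to the modified Hessian (8), because by (7) it is the ordinary Hessian of $f\circ\alpha$ expressed in terms of $h=\alpha^{\prime}(0)\cdot u$.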

In coordinates of $\mathbb{R}^{n}$, the modified Hessian (8) takes the form

\sum_{i=1}^{n}\sum_{j=1}^{n}\left(\left.\frac{\partial^{2}f}{\partial x^{i}\partial x^{j}}\right|_{p}-\sum_{k=1}^{m}\lambda_{k}\,\left.\frac{\partial^{2}g^{k}}{\partial x^{i}\partial x^{j}}\right|_{p}\right)h^{i}h^{j}\,.

(9)
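The double sum (9) translates directly into code. The following sketch uses an assumed example with two constraints to show the inner sum over $k$: $f(x,y,z)=x$ on the unit circle in the plane $z=0$, that is, $g^{1}=x^{2}+y^{2}+z^{2}-1$ and $g^{2}=z$, at $p=(1,0,0)$ with $\lambda=(1/2,0)$. The function names `modified_hessian` and `quadratic_form` are hypothetical helpers, not part of any library.

```python
def modified_hessian(hess_f, hess_gs, multipliers):
    """Coefficients of (9): d2f/dxi dxj - sum_k lambda_k * d2g^k/dxi dxj."""
    n = len(hess_f)
    return [[hess_f[i][j]
             - sum(lam_k * hess_g[i][j]
                   for lam_k, hess_g in zip(multipliers, hess_gs))
             for j in range(n)]
            for i in range(n)]


def quadratic_form(matrix, h):
    """Double sum over i, j of matrix[i][j] * h[i] * h[j]."""
    n = len(h)
    return sum(matrix[i][j] * h[i] * h[j] for i in range(n) for j in range(n))


# Illustrative problem (an assumption, not from the text):
# f(x, y, z) = x, g1 = x^2 + y^2 + z^2 - 1, g2 = z, at p = (1, 0, 0).
# Df(p) = (1, 0, 0) = 0.5 * Dg1(p) + 0 * Dg2(p), so lambda = (0.5, 0).
zero = [[0.0] * 3 for _ in range(3)]
hess_f = zero                                   # f is linear
hess_g1 = [[2.0, 0.0, 0.0],
           [0.0, 2.0, 0.0],
           [0.0, 0.0, 2.0]]
hess_g2 = zero                                  # g2 is linear
multipliers = [0.5, 0.0]

B = modified_hessian(hess_f, [hess_g1, hess_g2], multipliers)

# A tangent vector at p: it is annihilated by both Dg1(p) and Dg2(p).
h = [0.0, 3.0, 0.0]
value = quadratic_form(B, h)
print(value)  # -9.0
```

The negative value is consistent with $p=(1,0,0)$ being the point of the circle where $x$ is largest.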

We emphasize that the vector $h$ can be restricted to lie in the tangent space $\mathrm{T}_{p}M$ when studying the stationary point $p$ of $f$ restricted to $M$.

In matrix form, (9) can be written as $h^{\mathrm{t}}Bh$, where (with all derivatives evaluated at $p$)

B=\begin{bmatrix}\dfrac{\partial^{2}f}{\partial x^{i}\partial x^{j}}-\sum\limits_{k=1}^{m}\lambda_{k}\,\dfrac{\partial^{2}g^{k}}{\partial x^{i}\partial x^{j}}\end{bmatrix}_{ij}\,.

(10)

But again, the test vector $h$ need only lie in $\mathrm{T}_{p}M$, so if we want to apply positive/negative definiteness tests for matrices, they should instead be applied to the projected or reduced Hessian:

Z^{\mathrm{t}}BZ

(11)

where the columns of the $n\times(n-m)$ matrix $Z$ form a basis for $\mathrm{T}_{p}M=\ker g^{\prime}(p)\subset\mathbb{R}^{n}$.
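The following sketch shows why the test must be applied to (11) rather than to $B$ itself. It assumes the illustrative problem $f(x,y)=xy$ on the line $g(x,y)=x+y-2=0$, with $p=(1,1)$ and $\lambda=1$; since $g$ is linear, $B=f^{\prime\prime}(p)$, and $Z$ has the single column $(1,-1)$ spanning $\ker g^{\prime}(p)$. The helpers `transpose`, `matmul`, `determinant` and `classify` are hypothetical pure-Python stand-ins for what a linear algebra library would normally provide, and `classify` uses Sylvester's criterion on leading principal minors, which is one possible definiteness test among several.

```python
def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def determinant(a):
    """Laplace expansion along the first row; adequate for small matrices."""
    if len(a) == 1:
        return a[0][0]
    total = 0.0
    for j, entry in enumerate(a[0]):
        minor = [row[:j] + row[j + 1:] for row in a[1:]]
        total += (-1) ** j * entry * determinant(minor)
    return total


def classify(sym):
    """Sylvester's criterion for a symmetric matrix.

    Returns "not definite" for both indefinite and semidefinite matrices.
    """
    minors = [determinant([row[:k] for row in sym[:k]])
              for k in range(1, len(sym) + 1)]
    if all(d > 0 for d in minors):
        return "positive definite"
    if all((d < 0 if k % 2 == 1 else d > 0)
           for k, d in enumerate(minors, start=1)):
        return "negative definite"
    return "not definite"


# Illustrative problem (an assumption, not from the text):
# f(x, y) = x*y on x + y - 2 = 0, at p = (1, 1) with lambda = 1.
B = [[0.0, 1.0],
     [1.0, 0.0]]          # modified Hessian (10); g'' = 0 here
Z = [[1.0],
     [-1.0]]              # column spans ker g'(p) = ker (1, 1)

reduced = matmul(matmul(transpose(Z), B), Z)   # Z^t B Z, formula (11)

print(classify(B))        # not definite
print(reduced)            # [[-2.0]]
print(classify(reduced))  # negative definite
```

The full matrix $B$ is indefinite, yet the reduced Hessian is negative definite, so $p$ is a local maximum of $f$ on $M$; this matches the direct computation $f=x(2-x)$ along the line.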

Title	tests for local extrema in Lagrange multiplier method
Canonical name	TestsForLocalExtremaInLagrangeMultiplierMethod
Date of creation	2013-03-22 15:28:52
Last modified on	2013-03-22 15:28:52
Owner	stevecheng (10074)
Last modified by	stevecheng (10074)
Numerical id	6
Author	stevecheng (10074)
Entry type	Result
Classification	msc 26B12
Classification	msc 49-00
Classification	msc 49K35
Related topic	HessianForm
Related topic	RelationsBetweenHessianMatrixAndLocalExtrema
Defines	projected Hessian
Defines	reduced Hessian