# Lagrange multipliers on manifolds

We discuss in this article the theoretical aspects of the Lagrange multiplier method.

To enhance understanding, proofs and intuitive explanations of the Lagrange multiplier method will be given from several different viewpoints, both elementary and advanced.

## 1 Statements of the theorem

Let $N$ be an $n$-dimensional differentiable manifold (without boundary), and let $f:N\to \mathbb{R}$ and ${g}_{i}:N\to \mathbb{R}$, for $i=1,\dots,k$, be continuously differentiable. Set $M={\bigcap}_{i=1}^{k}{g}_{i}^{-1}(\{0\})$.

### 1.1 Formulation with differential forms

###### Theorem 1.

Suppose $d{g}_{i}$ are linearly independent
at each point of $M$.
If $p\in M$ is a local minimum or maximum point of $f$ restricted to $M$,
then there exist Lagrange multipliers
${\lambda}_{1},\dots,{\lambda}_{k}\in \mathbb{R}$, depending on $p$,
such that

$$df(p)={\lambda}_{1}d{g}_{1}(p)+\cdots+{\lambda}_{k}d{g}_{k}(p).$$

Here, $d$ denotes the exterior derivative.

### 1.2 Formulation with gradients

The version of Lagrange multipliers typically
used in calculus is the special case
$N={\mathbb{R}}^{n}$ in Theorem 1.
In this case, the conclusion of the
theorem can also be written
in terms of gradients instead of differential forms:

###### Theorem 2.

Suppose $\nabla {g}_{i}$ are linearly independent at each point of $M$. If $p\in M$ is a local minimum or maximum point of $f$ restricted to $M$, then there exist Lagrange multipliers ${\lambda}_{1},\dots,{\lambda}_{k}\in \mathbb{R}$, depending on $p$, such that

$$\nabla f(p)={\lambda}_{1}\nabla {g}_{1}(p)+\cdots+{\lambda}_{k}\nabla {g}_{k}(p).$$

This formulation and the first one
are equivalent, since
the 1-form $df$ can be identified with the gradient
$\nabla f$ via the formula $\nabla f(p)\cdot v=df(p;v)=d{f}_{p}(v)$.

### 1.3 Formulation with tangent maps

The functions ${g}_{i}$ can also be coalesced into a vector-valued function
$g:N\to {\mathbb{R}}^{k}$. Then we have:

###### Theorem 3.

Let $g=({g}_{1},\dots,{g}_{k}):N\to {\mathbb{R}}^{k}$.
Suppose the tangent map $\mathrm{D}g$ is surjective
at each point of $M$.
If $p\in M$ is a local minimum or maximum point of $f$ restricted to $M$,
then
there exists a Lagrange multiplier vector $\lambda \in {({\mathbb{R}}^{k})}^{*}$,
depending on $p$, such that

$$\mathrm{D}f(p)=\mathrm{D}g{(p)}^{*}\lambda .$$

Here, $\mathrm{D}g{(p)}^{*}:{({\mathbb{R}}^{k})}^{*}\to {({\mathrm{T}}_{p}N)}^{*}$ denotes the pullback of the linear transformation (http://planetmath.org/DualHomomorphism) $\mathrm{D}g(p):{\mathrm{T}}_{p}N\to {\mathbb{R}}^{k}$.

If $\mathrm{D}g$ is represented by its Jacobian matrix, then the condition that it be surjective is equivalent to its Jacobian matrix having full rank.
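The full-rank condition can be checked numerically at a given point. The sketch below is only an illustration under stated assumptions: the constraints ${g}_{1}=x^{2}+y^{2}+z^{2}-1$ and ${g}_{2}=x+y+z$ on ${\mathbb{R}}^{3}$ are a made-up example, and `matrix_rank` and `jacobian_g` are hypothetical helper names. The rank is computed by plain Gaussian elimination with a tolerance.

```python
import math

def matrix_rank(rows, tol=1e-10):
    """Rank of a small dense matrix via Gaussian elimination with partial pivoting."""
    a = [list(map(float, r)) for r in rows]
    n_rows, n_cols = len(a), len(a[0])
    rank = 0
    for col in range(n_cols):
        # Pick the largest entry of this column among the rows not yet used.
        pivot = max(range(rank, n_rows), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < tol:
            continue  # no usable pivot in this column
        a[rank], a[pivot] = a[pivot], a[rank]
        for r in range(rank + 1, n_rows):
            factor = a[r][col] / a[rank][col]
            for c in range(col, n_cols):
                a[r][c] -= factor * a[rank][c]
        rank += 1
        if rank == n_rows:
            break
    return rank

def jacobian_g(x, y, z):
    # Rows are the gradients of g1 = x^2 + y^2 + z^2 - 1 and g2 = x + y + z.
    return [[2 * x, 2 * y, 2 * z], [1.0, 1.0, 1.0]]

# A point of M = {g1 = 0, g2 = 0}: the Jacobian has full rank k = 2 there.
s = 1 / math.sqrt(2)
rank_on_M = matrix_rank(jacobian_g(s, -s, 0.0))

# A point off M where the two gradients are parallel: the rank drops to 1.
t = 1 / math.sqrt(3)
rank_degenerate = matrix_rank(jacobian_g(t, t, t))
```

The second point does not lie on $M$; it is included only to show what a rank deficiency looks like.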

Note the deliberate use of the space ${({\mathbb{R}}^{k})}^{*}$,
rather than the isomorphic space ${\mathbb{R}}^{k}$,
for the Lagrange multiplier vector. It turns out that the
Lagrange multiplier vector naturally
lives in the dual space and not the original vector space ${\mathbb{R}}^{k}$.
This distinction is particularly important in the infinite-dimensional
generalizations of Lagrange multipliers.
But even in the finite-dimensional setting,
there are hints that the dual space
has to be involved, because a transpose appears
in the matrix expression for Lagrange multipliers.

If the expression $\mathrm{D}g{(p)}^{*}\lambda $ is written
out in coordinates, then it is apparent that the components ${\lambda}_{i}$
of the vector $\lambda $ are exactly those
Lagrange multipliers from Theorems 1 and 2.
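To make this explicit in the case $N={\mathbb{R}}^{n}$, here is a sketch of the coordinate computation. It assumes that the pullback $\mathrm{D}g{(p)}^{*}$ is represented by the transposed Jacobian matrix and that $\lambda $ is represented by the column of its components ${\lambda}_{1},\dots,{\lambda}_{k}$:

$$\mathrm{D}g{(p)}^{\mathrm{T}}\begin{bmatrix}{\lambda}_{1}\\ \vdots \\ {\lambda}_{k}\end{bmatrix}=\begin{bmatrix}\nabla {g}_{1}(p) & \cdots & \nabla {g}_{k}(p)\end{bmatrix}\begin{bmatrix}{\lambda}_{1}\\ \vdots \\ {\lambda}_{k}\end{bmatrix}={\lambda}_{1}\nabla {g}_{1}(p)+\cdots+{\lambda}_{k}\nabla {g}_{k}(p).$$

The columns of the transposed Jacobian are the gradients $\nabla {g}_{i}(p)$, so the equation $\mathrm{D}f(p)=\mathrm{D}g{(p)}^{*}\lambda $ becomes the gradient equation of Theorem 2.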

## 2 Proofs

The proof of the Lagrange multiplier theorem is surprisingly short and elegant,
when properly phrased in the language of abstract manifolds and
differential forms.

However, for the benefit of readers not versed in these topics,
we provide, in addition to the abstract proof, a concrete translation of the arguments
in the more familiar setting $N={\mathbb{R}}^{n}$.

### 2.1 Beautiful abstract proof

###### Proof.

Since $d{g}_{i}$ are linearly independent at each point of $M={\bigcap}_{i=1}^{k}{g}_{i}^{-1}(\{0\})$,
$M$ is an embedded submanifold of $N$,
of dimension $m=n-k$. Let $\alpha :U\to M$, with $U$ open in ${\mathbb{R}}^{m}$, be
a coordinate chart for $M$ such that $\alpha (0)=p$.
Then ${\alpha}^{*}f$ has a local minimum or maximum at $0$,
and therefore $0=d({\alpha}^{*}f)={\alpha}^{*}df$ at $0$.
But ${\alpha}^{*}$ at $p$ is an isomorphism ${\left({\mathrm{T}}_{p}M\right)}^{*}\to {\left({\mathrm{T}}_{0}{\mathbb{R}}^{m}\right)}^{*}$,
so the preceding equation says that $df$ vanishes on ${\mathrm{T}}_{p}M$.

Now, since each ${g}_{i}$ vanishes identically on $M$, we have ${\alpha}^{*}{g}_{i}=0$, so $0=d({\alpha}^{*}{g}_{i})={\alpha}^{*}d{g}_{i}$. So, like $df$, each $d{g}_{i}$ vanishes on ${\mathrm{T}}_{p}M$.

In other words, $d{g}_{i}(p)$ is in the annihilator (http://planetmath.org/AnnihilatorOfVectorSubspace)
${\left({\mathrm{T}}_{p}M\right)}^{0}$ of the subspace ${\mathrm{T}}_{p}M\subseteq {\mathrm{T}}_{p}N$.
Since ${\mathrm{T}}_{p}M$ has dimension $m=n-k$, and ${\mathrm{T}}_{p}N$ has dimension $n$,
the annihilator ${\left({\mathrm{T}}_{p}M\right)}^{0}$ has dimension $k$.
Now $d{g}_{i}(p)\in {\left({\mathrm{T}}_{p}M\right)}^{0}$ are linearly independent,
so they must in fact be a basis for ${\left({\mathrm{T}}_{p}M\right)}^{0}$.
But we had argued that $df(p)\in {\left({\mathrm{T}}_{p}M\right)}^{0}$.
Therefore $df(p)$ may be written uniquely as a linear combination
of the $d{g}_{i}(p)$:

$$df(p)={\lambda}_{1}d{g}_{1}(p)+\cdots+{\lambda}_{k}d{g}_{k}(p).$$

∎

The last paragraph of the previous proof can also be rephrased, based on the same underlying ideas, to make evident the fact that the Lagrange multiplier vector lives in the dual space ${({\mathbb{R}}^{k})}^{*}$.

###### Alternative argument.

A general theorem in linear algebra states that for any linear transformation $L$, the image of the pullback ${L}^{*}$ is the annihilator of the kernel of $L$. Since $\mathrm{ker}\mathrm{D}g(p)={\mathrm{T}}_{p}M$ and $df(p)\in {({\mathrm{T}}_{p}M)}^{0}$, it immediately follows that $\lambda \in {({\mathbb{R}}^{k})}^{*}$ exists such that $df(p)=\mathrm{D}g{(p)}^{*}\lambda $. ∎

Yet another proof could be devised by observing that the result is obvious if $N={\mathbb{R}}^{n}$ and the constraint functions are just coordinate projections on ${\mathbb{R}}^{n}$:

$${g}_{i}({y}_{1},\dots,{y}_{n})={y}_{i},\quad i=1,\dots,k.$$

We clearly must have $\partial f/\partial {y}_{k+1}=\cdots=\partial f/\partial {y}_{n}=0$ at a point $p$ that minimizes $f(y)$ over ${y}_{1}=\cdots={y}_{k}=0$. The general case can be reduced to this one by a coordinate change:

###### Alternative argument.

Since $d{g}_{i}$ are linearly independent, we can find a coordinate chart for $N$ about the point $p$, with coordinate functions ${y}_{1},\mathrm{\dots},{y}_{n}:N\to \mathbb{R}$ such that ${y}_{i}={g}_{i}$ for $i=1,\mathrm{\dots},k$. Then

$$\begin{aligned}
df&=\frac{\partial f}{\partial {y}_{1}}d{y}_{1}+\cdots+\frac{\partial f}{\partial {y}_{n}}d{y}_{n}\\
&=\frac{\partial f}{\partial {g}_{1}}d{g}_{1}+\cdots+\frac{\partial f}{\partial {g}_{k}}d{g}_{k}+\frac{\partial f}{\partial {y}_{k+1}}d{y}_{k+1}+\cdots+\frac{\partial f}{\partial {y}_{n}}d{y}_{n},
\end{aligned}$$

but $\partial f/\partial {y}_{k+1}=\mathrm{\cdots}=\partial f/\partial {y}_{n}=0$ at the point $p$. Set ${\lambda}_{i}=\partial f/\partial {g}_{i}$ at $p$. ∎

### 2.2 Clumsy, but down-to-earth proof

###### Proof.

We assume that $N={\mathbb{R}}^{n}$.
Consider the vector-valued function $g=({g}_{1},\dots,{g}_{k})$ discussed earlier,
and its Jacobian matrix $\mathrm{D}g$ in Euclidean coordinates.
The $i$th row of this matrix is

$$\left[\begin{array}{ccc}\frac{\partial {g}_{i}}{\partial {x}_{1}} & \dots & \frac{\partial {g}_{i}}{\partial {x}_{n}}\end{array}\right]={\left(\nabla {g}_{i}\right)}^{\mathrm{T}}.$$

So the matrix $\mathrm{D}g$ has full rank (i.e. $\mathrm{rank}\,\mathrm{D}g=k$) if and only if the $k$ gradients $\nabla {g}_{i}$ are linearly independent.

Consider each solution $q\in M$ of $g(q)=0$.
Since $\mathrm{D}g$ has full rank, we can apply the implicit function theorem,
which states that there exist smooth solution parameterizations
$\alpha :U\to M$ around each point $q\in M$. ($U$ is an open set in ${\mathbb{R}}^{m}$, $m=n-k$.)
These $\alpha $ are the coordinate charts which give $M={g}^{-1}(\{0\})$ a manifold structure.

We now consider specifically the point $q=p$; without loss of generality, assume $\alpha (0)=p$.
Then $f\circ \alpha $ is a function on an open subset of Euclidean space having a local minimum or maximum at $0$,
so its derivative vanishes at $0$.
Calculating by the chain rule, we have
$0=\mathrm{D}(f\circ \alpha )(0)=\mathrm{D}f(p)\cdot \mathrm{D}\alpha (0)$.
In other words, $\mathrm{ker}\,\mathrm{D}f(p)\supseteq \text{range of }\mathrm{D}\alpha (0)={\mathrm{T}}_{p}M$.
Intuitively, this says that the directional derivatives of $f$ at $p$
along vectors in the tangent space ${\mathrm{T}}_{p}M$ of the manifold $M$ vanish.

By the definition of $g$ and $\alpha $, we have $g\circ \alpha =0$. By the chain rule again, we derive $0=\mathrm{D}g(p)\cdot \mathrm{D}\alpha (0)$.

Let the columns of $\mathrm{D}\alpha (0)$ be the column vectors ${v}_{1},\mathrm{\dots},{v}_{m}$, which span the $m$-dimensional space ${\mathrm{T}}_{p}M$, and look at the matrix equation $0=\mathrm{D}f(p)\cdot \mathrm{D}\alpha (0)$ again. The equation for each entry of this matrix, which consists of only one row, is:

$$\nabla f(p)\cdot {v}_{j}=0,\quad j=1,\dots,m.$$

In other words, $\nabla f(p)$ is orthogonal to ${v}_{1},\dots,{v}_{m}$,
and hence it is orthogonal to the entire tangent space ${\mathrm{T}}_{p}M$.

Similarly, the matrix equation $0=\mathrm{D}g(p)\cdot \mathrm{D}\alpha (0)$ can be split into individual scalar equations:

$$\nabla {g}_{i}(p)\cdot {v}_{j}=0,\quad i=1,\dots,k,\quad j=1,\dots,m.$$

Thus each $\nabla {g}_{i}(p)$ is orthogonal to ${\mathrm{T}}_{p}M$.
But the $\nabla {g}_{i}(p)$ are, by hypothesis, linearly independent,
and there are $k$ of these gradients, so they must form a basis for
the orthogonal complement of ${\mathrm{T}}_{p}M$, which has dimension $n-m=k$.
Since $\nabla f(p)$ also lies in this orthogonal complement, it can be written uniquely as a linear combination of the $\nabla {g}_{i}(p)$:

$$\nabla f(p)={\lambda}_{1}\nabla {g}_{1}(p)+\cdots+{\lambda}_{k}\nabla {g}_{k}(p).$$

∎
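The steps of this proof can be traced numerically on a small example. The sketch below is illustrative only and rests on assumptions not found in the article: it takes $N={\mathbb{R}}^{3}$, the constraints ${g}_{1}=x^{2}+y^{2}+z^{2}-1$ and ${g}_{2}=x+y+z$, and the objective $f=z$, whose constrained maximum is at $p=(-1,-1,2)/\sqrt{6}$. Here $m=1$, so the cross product of the two constraint gradients spans ${\mathrm{T}}_{p}M$. The multipliers are then found by solving the $2\times 2$ normal equations.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

s = 1 / math.sqrt(6)
p = (-s, -s, 2 * s)                      # constrained maximizer of f = z (assumed example)

grad_f_p = (0.0, 0.0, 1.0)               # gradient of f = z
grad_g1_p = tuple(2 * c for c in p)      # gradient of x^2 + y^2 + z^2 - 1 at p
grad_g2_p = (1.0, 1.0, 1.0)              # gradient of x + y + z

# T_p M is one-dimensional here; this vector is orthogonal to both constraint gradients.
tangent = cross(grad_g1_p, grad_g2_p)
tangential_derivative = dot(grad_f_p, tangent)   # should vanish at an extremum

# Solve the normal equations G * lam = b with G[i][j] = grad g_i . grad g_j.
g11, g12, g22 = dot(grad_g1_p, grad_g1_p), dot(grad_g1_p, grad_g2_p), dot(grad_g2_p, grad_g2_p)
b1, b2 = dot(grad_g1_p, grad_f_p), dot(grad_g2_p, grad_f_p)
det = g11 * g22 - g12 * g12
lam1 = (b1 * g22 - g12 * b2) / det
lam2 = (g11 * b2 - g12 * b1) / det

# Residual of grad f(p) = lam1 * grad g1(p) + lam2 * grad g2(p).
residual = math.sqrt(sum(
    (f_c - lam1 * a - lam2 * b) ** 2
    for f_c, a, b in zip(grad_f_p, grad_g1_p, grad_g2_p)
))
```

In this example the multipliers come out as ${\lambda}_{1}=1/\sqrt{6}$ and ${\lambda}_{2}=1/3$, which can be confirmed by hand from the three component equations.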

## 3 Intuitive interpretations

We now discuss the intuitive and geometric
interpretations of Lagrange multipliers.

### 3.1 Normals to tangent hyperplanes

Each equation ${g}_{i}=0$ defines a hypersurface ${M}_{i}$ in ${\mathbb{R}}^{n}$, a manifold of dimension $n-1$.
For each of these hypersurfaces, the gradient $\nabla {g}_{i}(p)$
gives the normal vector to the tangent hyperplane ${\mathrm{T}}_{p}{M}_{i}$ at $p$.

The manifold $M$ is the intersection of the hypersurfaces ${M}_{i}$.
Presumably, the tangent space ${\mathrm{T}}_{p}M$ is the intersection of the ${\mathrm{T}}_{p}{M}_{i}$, and the subspace perpendicular to ${\mathrm{T}}_{p}M$ would be spanned by the normals $\nabla {g}_{i}(p)$.
Now, the directional derivatives at $p$ of $f$ with respect to each vector in ${\mathrm{T}}_{p}M$, as we have proved,
vanish. So the direction of $\nabla f(p)$, the direction
of the greatest change in $f$ at $p$, should be perpendicular
to ${\mathrm{T}}_{p}M$. Hence $\nabla f(p)$ can be written as a linear combination of the $\nabla {g}_{i}(p)$.

Note, however, that this geometric picture, and the manipulations with the gradients $\nabla f(p)$
and $\nabla {g}_{i}(p)$, do not carry over to abstract manifolds.
The notions of gradients and normals to surfaces depend on the
inner product structure of ${\mathbb{R}}^{n}$, which is
not present in an abstract manifold (without a Riemannian metric).

On the other hand, this explains the mysterious appearance of annihilators in the last paragraph of the abstract proof. Annihilators and dual space theory serve as the proper tools to formalize the manipulations we made with the matrix equations $0=\mathrm{D}f(p)\cdot \mathrm{D}\alpha (0)$ and $0=\mathrm{D}g(p)\cdot \mathrm{D}\alpha (0)$, without resorting to Euclidean coordinates, which, of course, are not even defined on an abstract manifold.

### 3.2 With infinitesimals

If we are willing to interpret the quantities $df$ and $d{g}_{i}$ as infinitesimals,
even the abstract version of the result has an intuitive explanation.
Suppose we are at the point $p$ of the manifold $M$,
and consider an infinitesimal movement $\mathrm{\Delta}p$ about this point.
The infinitesimal movement $\mathrm{\Delta}p$ is a vector in the tangent space
${\mathrm{T}}_{p}M$, because, near $p$, $M$ looks like the linear space ${\mathrm{T}}_{p}M$.
And as $p$ moves, the function $f$ changes by a corresponding infinitesimal amount $df$
that is approximately linear in $\mathrm{\Delta}p$.

Furthermore, the change $df$ may be decomposed
as the sum of a change as $p$ moves *along* the manifold $M$,
and a change as $p$ moves *out* of the manifold $M$.
But if $f$ has a local minimum at $p$, then there cannot be
any change of $f$ along $M$; thus $f$ only changes
when moving out of $M$.
Now $M$ is described by the equations ${g}_{i}=0$,
so a movement out of $M$ is described by the infinitesimal changes
$d{g}_{i}$.
As $df$ is linear in the change $\mathrm{\Delta}p$,
we ought to be able to write it as a weighted sum of the changes $d{g}_{i}$.
The weights are, of course, the Lagrange multipliers ${\lambda}_{i}$.

The linear algebra performed in the abstract proof can be regarded as the precise, rigorous translation of the preceding argument.

### 3.3 As rates of substitution

Observe that the formula for Lagrange multipliers is formally
very similar to the standard formula for expressing
a differential form in terms of a basis:

$$df(p)=\frac{\partial f}{\partial {y}_{1}}d{y}_{1}+\cdots+\frac{\partial f}{\partial {y}_{k}}d{y}_{k}.$$

In fact, if $d{g}_{i}(p)$ are linearly independent, then they do form a basis for ${({\mathrm{T}}_{p}M)}^{0}$, which can be extended to a basis for ${({\mathrm{T}}_{p}N)}^{*}$. By the uniqueness of the basis representation, we must have

$${\lambda}_{i}=\frac{\partial f}{\partial {g}_{i}}.$$

That is, ${\lambda}_{i}$ is the partial derivative
of $f$ with respect to changes in ${g}_{i}$.

In applications of Lagrange multipliers to economic
problems, the multipliers ${\lambda}_{i}$ are *rates of substitution* —
they give the rate of improvement in the objective function $f$
as the constraints ${g}_{i}$ are relaxed.
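This sensitivity interpretation can be checked numerically on a toy problem. The sketch below is an illustration under assumptions not made in the article: it uses $f(x,y)=x+y$ with the relaxed constraint $x^{2}+y^{2}-1=c$, for which the maximum value is known in closed form to be $\sqrt{2(1+c)}$ and the multiplier at $c=0$ is $\lambda =1/\sqrt{2}$. The function name `optimal_value` is hypothetical. A central finite difference of the optimal value with respect to $c$ should then approximate $\lambda $.

```python
import math

def optimal_value(c):
    """Maximum of f(x, y) = x + y subject to x^2 + y^2 - 1 = c (closed form)."""
    radius = math.sqrt(1.0 + c)
    # The maximizer is radius * (1, 1) / sqrt(2), giving f = sqrt(2) * radius.
    return math.sqrt(2.0) * radius

# Multiplier at c = 0, from grad f = lam * grad g at p = (1, 1) / sqrt(2).
lam = 1 / math.sqrt(2)

# Central finite difference of the optimal value as the constraint is relaxed.
h = 1e-6
sensitivity = (optimal_value(h) - optimal_value(-h)) / (2 * h)
```

The two numbers agree up to finite-difference error, which is the sense in which ${\lambda}_{i}$ measures the rate of improvement as ${g}_{i}$ is relaxed.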

## 4 Stationary points

In applications, sometimes we are interested in finding stationary points $p$ of $f$ — defined as points $p$ such that $df$ vanishes on ${\mathrm{T}}_{p}M$, or equivalently, that the Taylor expansion of $f$ at $p$, under any system of coordinates for $M$, has no terms of first order. Then the Lagrange multiplier method works for this situation too.

The following theorem incorporates the more general notion of stationary points.

###### Theorem 4.

Let $N$ be an $n$-dimensional differentiable manifold (without boundary), and $f:N\to \mathbb{R}$, ${g}_{i}:N\to \mathbb{R}$, for $i=1,\dots,k$, be continuously differentiable. Suppose $p\in M={\bigcap}_{i=1}^{k}{g}_{i}^{-1}(\{0\})$, and $d{g}_{i}(p)$ are linearly independent.

Then $p$ is a stationary point (e.g. a local extremum point) of $f$ restricted to $M$, if and only if there exist ${\lambda}_{1},\dots,{\lambda}_{k}\in \mathbb{R}$ such that

$$df(p)={\lambda}_{1}d{g}_{1}(p)+\cdots+{\lambda}_{k}d{g}_{k}(p).$$

The Lagrange multipliers ${\lambda}_{i}$, which depend on $p$, are unique when they exist.

In this formulation, $M$ is not necessarily a manifold, but it is one when intersected with a sufficiently small neighborhood about $p$. So it makes sense to talk about ${\mathrm{T}}_{p}M$, although we are abusing notation here. The subspace in question can be more accurately described as the annihilated subspace of $\mathrm{span}\{d{g}_{i}(p)\}$.

It is also enough that $d{g}_{i}$ be linearly independent
only at the point $p$.
For $d{g}_{i}$ are continuous, so they will be
linearly independent for points near
$p$ anyway,
and we may restrict our viewpoint to a sufficiently small neighborhood
around $p$, and the proofs carry through.

The proof involves only simple modifications to that of Theorem 1 — for instance, the converse implication follows because we have already proved that the $d{g}_{i}(p)$ form a basis for the annihilator of ${\mathrm{T}}_{p}M$, independently of whether or not $p$ is a stationary point of $f$ on $M$.
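To illustrate that the multiplier condition detects stationary points that are not extrema, here is a small sketch on an assumed example that does not come from the article: $f(x,y)=y^{3}$ on the unit circle $x^{2}+y^{2}-1=0$, at the point $p=(1,0)$. The name `f_restricted` is hypothetical. The condition holds there with $\lambda =0$, yet $f$ restricted to the circle changes sign on either side of $p$.

```python
import math

def f_restricted(theta):
    """f(x, y) = y**3 evaluated at the circle point (cos(theta), sin(theta))."""
    return math.sin(theta) ** 3

p = (1.0, 0.0)                         # the point theta = 0 on the unit circle
grad_f_p = (0.0, 3.0 * p[1] ** 2)      # gradient of y**3 at p, which is (0, 0)
grad_g_p = (2.0 * p[0], 2.0 * p[1])    # gradient of x**2 + y**2 - 1 at p, which is (2, 0)

# With a single constraint, the multiplier is the least-squares coefficient.
lam = sum(a * b for a, b in zip(grad_f_p, grad_g_p)) / sum(b * b for b in grad_g_p)
residual = math.sqrt(sum((a - lam * b) ** 2 for a, b in zip(grad_f_p, grad_g_p)))

# f increases through p along the circle, so p is neither a local minimum nor a maximum.
changes_sign = f_restricted(-0.1) < f_restricted(0.0) < f_restricted(0.1)
```

So the equation $df(p)=\lambda \,dg(p)$ is satisfied, in agreement with Theorem 4, even though $p$ is not a local extremum of $f$ on $M$.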


| Title | Lagrange multipliers on manifolds |
|---|---|
| Canonical name | LagrangeMultipliersOnManifolds |
| Date of creation | 2013-03-22 15:25:45 |
| Last modified on | 2013-03-22 15:25:45 |
| Owner | stevecheng (10074) |
| Last modified by | stevecheng (10074) |
| Numerical id | 24 |
| Author | stevecheng (10074) |
| Entry type | Topic |
| Classification | msc 58C05 |
| Classification | msc 49-00 |
| Related topic | Manifold |