# calculus of variations

Imagine a bead of mass $m$ on a wire whose endpoints are at $a=(0,0)$ and $b=({x}_{f},{y}_{f})$, with ${y}_{f}$ lower than the starting position. If gravity acts on the bead with force $F=mg$, what path (arrangement of the wire) minimizes the bead’s travel time from $a$ to $b$, assuming no friction?

This is the famed *brachistochrone problem*, and its solution was one of the first accomplishments of the calculus of variations. Many minimum problems can be solved using the techniques introduced here.

In its general form, the calculus of variations concerns quantities

$$S[q,\dot{q},t]=\int_{a}^{b}L(q(t),\dot{q}(t),t)\,dt \tag{1}$$

for which we wish to find a minimum or a maximum.

To make this concrete, let’s consider a much simpler problem than the brachistochrone: what’s the shortest path between two points $p=({x}_{1},{y}_{1})$ and $q=({x}_{2},{y}_{2})$? Let the variable $s$ represent distance along the path, so that $\int_{p}^{q}ds=S$. We wish to find the path such that $S$ is a minimum. Zooming in on a small portion of the path, we can see that

$$ds^{2}=dx^{2}+dy^{2} \tag{2}$$

$$ds=\sqrt{dx^{2}+dy^{2}}. \tag{3}$$

If we parameterize the path by $t$, then we have

$$ds=\sqrt{\left(\frac{dx}{dt}\right)^{2}+\left(\frac{dy}{dt}\right)^{2}}\,dt. \tag{4}$$

Let’s assume $y=f(x)$, so that we may simplify (4) to

$$ds=\sqrt{1+\left(\frac{dy}{dx}\right)^{2}}\,dx=\sqrt{1+{f}^{\prime}(x)^{2}}\,dx. \tag{5}$$

Now we have

$$S=\int_{p}^{q}L\,dx=\int_{{x}_{1}}^{{x}_{2}}\sqrt{1+{f}^{\prime}(x)^{2}}\,dx. \tag{6}$$

In this case, $L$ is particularly simple. Converting to $q$’s and $t$’s to make the comparison easier, we have $L=L[{f}^{\prime}(x)]=L[\dot{q}(t)]$, not the more general $L[q(t),\dot{q}(t),t]$ covered by the calculus of variations. We’ll see later how to use our $L$’s simplicity to our advantage. For now, let’s talk more generally.

We wish to find the path $q(t)$, passing through a point $q(a)$ at $t=a$ and through $q(b)$ at $t=b$, for which the quantity $S$ is a minimum, that is, for which small variations in the path produce no first-order change in $S$; we’ll call such a path a “stationary point.” This is directly analogous to the idea that for a function $f(t)$, the minimum can be found where small variations $\delta t$ produce no first-order change in $f(t)$. This is where $f(t+\delta t)\approx f(t)$; taking a Taylor series expansion of $f(t)$ at $t$, we find

$$f(t+\delta t)=f(t)+\delta t\,{f}^{\prime}(t)+O(\delta t^{2})=f(t), \tag{7}$$

with ${f}^{\prime}(t):=\frac{d}{dt}f(t)$. Of course, since the whole point is to consider $\delta t\ne 0$, once we neglect terms $O(\delta {t}^{2})$ this is just the point where ${f}^{\prime}(t)=0$. This point, call it $t={t}_{0}$, could be a minimum or a maximum, so in the usual calculus of a single variable we’d proceed by taking the second derivative, ${f}^{\prime \prime}({t}_{0})$, and seeing if it’s positive or negative to see whether the function has a minimum or a maximum at ${t}_{0}$, respectively.
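The single-variable statement can be checked numerically. The following is a minimal sketch, assuming the example function $f(t)=(t-1)^{2}+2$ (chosen here for illustration, not taken from the text above): at the stationary point $t_0=1$ the change $f(t+\delta t)-f(t)$ divided by $\delta t$ shrinks with $\delta t$, while at another point it tends to ${f}^{\prime}(t)$.

```python
def f(t):
    # Example function (an assumption for illustration): minimum at t = 1.
    return (t - 1.0) ** 2 + 2.0


def change(t, dt):
    """Return f(t + dt) - f(t), the change produced by a small step dt."""
    return f(t + dt) - f(t)


for dt in (1e-1, 1e-2, 1e-3):
    at_stationary = change(1.0, dt) / dt  # shrinks like dt: no first-order term
    elsewhere = change(2.0, dt) / dt      # tends to f'(2) = 2
    print(f"dt={dt:g}  at t=1: {at_stationary:.4f}  at t=2: {elsewhere:.4f}")
```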

In the calculus of variations, we’re not considering small variations in $t$; we’re considering small variations in the *integral* of the relatively complicated *function* $L(q,\dot{q},t)$, where $\dot{q}=\frac{d}{dt}q(t)$. Also, $S$ is a functional, and we can think of the minimization problem as the discovery of a minimum in $S$-space as we jiggle the parameters $q$ and $\dot{q}$.

For the shortest-distance problem, it’s clear the maximum length doesn’t exist, since for any finite path length ${S}_{0}$ we (intuitively) can always find a curve for which the path’s length is greater than ${S}_{0}$. This is often true, and we’ll assume for this discussion that finding a stationary point means we’ve found a minimum.
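The intuition that path length has no maximum can be illustrated with a short numerical sketch. It assumes endpoints $(0,0)$ and $(1,1)$ and a hypothetical family of paths $f(x)=x+A\sin(\pi x)$, none of which appear in the text above, and approximates each length by summing short chords.

```python
import math


def path_length(f, x1, x2, n=10000):
    """Approximate the length of y = f(x) on [x1, x2] by summing n chords."""
    h = (x2 - x1) / n
    total = 0.0
    for i in range(n):
        xa = x1 + i * h
        xb = xa + h
        total += math.hypot(h, f(xb) - f(xa))
    return total


def wiggly(amplitude):
    # Straight line from (0, 0) to (1, 1) plus a bump that vanishes at both ends.
    return lambda x: x + amplitude * math.sin(math.pi * x)


amplitudes = (0.0, 1.0, 10.0, 100.0)
lengths = [path_length(wiggly(a), 0.0, 1.0) for a in amplitudes]
for a, s in zip(amplitudes, lengths):
    print(f"A={a:g}  length={s:.4f}")
```

The length grows without bound as the amplitude increases, while the unperturbed line ($A=0$) gives the smallest value in this family.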

Formally, we write the condition that small variations produce no first-order change in $S$ as $\delta S=0$. To make this precise, we simply write

$$\begin{aligned}
\delta S &:= S[q+\delta q,\dot{q}+\delta\dot{q},t]-S[q,\dot{q},t]\\
&=\int_{a}^{b}L(q+\delta q,\dot{q}+\delta\dot{q})\,dt-S[q,\dot{q},t].
\end{aligned}$$

How are we to simplify this mess? We are considering small variations to the path, which suggests a Taylor series expansion of $L(q+\delta q,\dot{q}+\delta \dot{q})$ about $(q,\dot{q})$:

$$L(q+\delta q,\dot{q}+\delta\dot{q})=L(q,\dot{q})+\delta q\frac{\partial}{\partial q}L(q,\dot{q})+\delta\dot{q}\frac{\partial}{\partial\dot{q}}L(q,\dot{q})+O(\delta q^{2})+O(\delta\dot{q}^{2})$$

and since we make little error by discarding higher-order terms in $\delta q$ and $\delta \dot{q}$, we have

$$\int_{a}^{b}L(q+\delta q,\dot{q}+\delta\dot{q})\,dt=S[q,\dot{q},t]+\int_{a}^{b}\left[\delta q\frac{\partial}{\partial q}L(q,\dot{q})+\delta\dot{q}\frac{\partial}{\partial\dot{q}}L(q,\dot{q})\right]dt.$$

Keeping in mind that $\delta \dot{q}=\frac{d}{dt}\delta q$, a simple application of the product rule $\frac{d}{dt}(fg)=\dot{f}g+f\dot{g}$ gives

$$\frac{d}{dt}\left(\delta q\frac{\partial}{\partial\dot{q}}L(q,\dot{q})\right)=\delta q\frac{d}{dt}\frac{\partial}{\partial\dot{q}}L(q,\dot{q})+\delta\dot{q}\frac{\partial}{\partial\dot{q}}L(q,\dot{q}),$$

which allows us to substitute

$$\delta\dot{q}\frac{\partial}{\partial\dot{q}}L(q,\dot{q})=\frac{d}{dt}\left(\delta q\frac{\partial}{\partial\dot{q}}L(q,\dot{q})\right)-\delta q\frac{d}{dt}\frac{\partial}{\partial\dot{q}}L(q,\dot{q}).$$

Shortening $L(q,\dot{q})$ to $L$ for convenience, we can rewrite the integral as:

$$\begin{aligned}
\int_{a}^{b}\left[\delta q\frac{\partial}{\partial q}L+\delta\dot{q}\frac{\partial}{\partial\dot{q}}L\right]dt
&=\int_{a}^{b}\left[\delta q\frac{\partial}{\partial q}L-\delta q\frac{d}{dt}\frac{\partial}{\partial\dot{q}}L+\frac{d}{dt}\left(\delta q\frac{\partial}{\partial\dot{q}}L\right)\right]dt\\
&=\int_{a}^{b}\delta q\left[\frac{\partial}{\partial q}L-\frac{d}{dt}\frac{\partial}{\partial\dot{q}}L\right]dt+\left.\delta q\frac{\partial}{\partial\dot{q}}L\right|_{a}^{b}.
\end{aligned}$$

Substituting all of this progressively back into our original expression for $\delta S$, we obtain

$$\begin{aligned}
\delta S &=\int_{a}^{b}L(q+\delta q,\dot{q}+\delta\dot{q})\,dt-S[q,\dot{q},t]\\
&=S+\int_{a}^{b}\left[\delta q\frac{\partial}{\partial q}L+\delta\dot{q}\frac{\partial}{\partial\dot{q}}L\right]dt-S\\
&=\int_{a}^{b}\delta q\left[\frac{\partial}{\partial q}L-\frac{d}{dt}\frac{\partial}{\partial\dot{q}}L\right]dt+\left.\delta q\frac{\partial}{\partial\dot{q}}L\right|_{a}^{b}=0.
\end{aligned}$$

Two conditions come to our aid. First, we’re only interested in the neighboring paths that still begin at $q(a)$ and end at $q(b)$; this corresponds to the condition $\delta q=0$ at $t=a$ and $t=b$, which lets us cancel the final (boundary) term. Second, between those two points, we’re interested in the paths which *do* vary, for which $\delta q\ne 0$.
This leads us to the condition

$$\int_{a}^{b}\delta q\left[\frac{\partial}{\partial q}L-\frac{d}{dt}\frac{\partial}{\partial\dot{q}}L\right]dt=0. \tag{8}$$
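The vanishing of the first-order change can be observed numerically. The sketch below is an illustration under stated assumptions rather than part of the derivation: it uses the arc-length integrand $L=\sqrt{1+\dot{q}^{2}}$ on $[0,1]$, a hypothetical variation $\delta q=\epsilon\sin(\pi t)$ that vanishes at both endpoints, and a simple midpoint rule for $S$. It compares $\delta S/\epsilon$ for a straight line against a parabola with the same endpoints.

```python
import math


def action(L, q, a, b, n=2000):
    """Approximate S = integral of L(q, q_dot, t) dt with the midpoint rule."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        t0, t1 = a + i * h, a + (i + 1) * h
        q_mid = 0.5 * (q(t0) + q(t1))   # q at the cell midpoint (approximate)
        q_dot = (q(t1) - q(t0)) / h     # finite-difference derivative
        total += L(q_mid, q_dot, 0.5 * (t0 + t1)) * h
    return total


def arc_length_L(q, q_dot, t):
    return math.sqrt(1.0 + q_dot ** 2)


def eta(t):
    # Shape of the variation; vanishes at t = 0 and t = 1.
    return math.sin(math.pi * t)


def delta_S_over_eps(q, eps):
    """Return (S[q + eps*eta] - S[q]) / eps on the interval [0, 1]."""
    varied = lambda t: q(t) + eps * eta(t)
    s_varied = action(arc_length_L, varied, 0.0, 1.0)
    s_base = action(arc_length_L, q, 0.0, 1.0)
    return (s_varied - s_base) / eps


straight = lambda t: t       # stationary path from (0, 0) to (1, 1)
parabola = lambda t: t * t   # same endpoints, not stationary

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, delta_S_over_eps(straight, eps), delta_S_over_eps(parabola, eps))
```

For the straight line the ratio shrinks in proportion to $\epsilon$, so there is no first-order term; for the parabola it settles at a nonzero value.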

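The fundamental lemma of the calculus of variations states that if $f(t)$ is continuous on $[a,b]$ and the integral of $f(t)g(t)$ vanishes for *every* continuous function $g(t)$ with $g(a)=g(b)=0$, then $f$ vanishes identically:

$$\int_{a}^{b}f(t)g(t)\,dt=0\ \text{ for all such } g\quad\implies\quad f(t)=0\ \ \forall t\in (a,b). \tag{9}$$

Since (8) must hold for every admissible variation $\delta q$, applying this lemma with $g=\delta q$, we obtain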

$$\frac{\partial}{\partial q}L-\frac{d}{dt}\left(\frac{\partial}{\partial\dot{q}}L\right)=0. \tag{10}$$

This condition, one of the fundamental equations of the calculus of variations, is called the *Euler–Lagrange condition*. When presented with a problem in the calculus of variations, the first thing to try is usually to plug the problem’s $L$ into this equation and solve.
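As a rough numerical companion to the Euler–Lagrange condition, the sketch below estimates its left-hand side, $\frac{\partial}{\partial q}L-\frac{d}{dt}\frac{\partial}{\partial \dot{q}}L$, by central finite differences along a given path. The helper name `euler_lagrange_residual`, the step size `h`, and the two test paths are assumptions made for this illustration; it is a check on a candidate path, not a method for solving the equation.

```python
import math


def euler_lagrange_residual(L, q, t, h=1e-3):
    """Finite-difference estimate of dL/dq - d/dt(dL/dq_dot) along q at time t."""

    def q_dot(s):
        return (q(s + h) - q(s - h)) / (2 * h)

    def dL_dq(s):
        return (L(q(s) + h, q_dot(s), s) - L(q(s) - h, q_dot(s), s)) / (2 * h)

    def dL_dqdot(s):
        return (L(q(s), q_dot(s) + h, s) - L(q(s), q_dot(s) - h, s)) / (2 * h)

    d_dt = (dL_dqdot(t + h) - dL_dqdot(t - h)) / (2 * h)
    return dL_dq(t) - d_dt


def arc_length_L(q, q_dot, t):
    # Integrand of the shortest-path problem; depends only on q_dot.
    return math.sqrt(1.0 + q_dot ** 2)


line_residual = euler_lagrange_residual(arc_length_L, lambda t: 2.0 * t + 1.0, 0.5)
parabola_residual = euler_lagrange_residual(arc_length_L, lambda t: t * t, 0.5)
print(line_residual, parabola_residual)
```

The residual is zero (up to rounding) on the straight line and clearly nonzero on the parabola, where the exact value at $t=0.5$ is $-\ddot{q}/(1+\dot{q}^{2})^{3/2}=-2/2^{3/2}$.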

Recall our shortest-path problem, where we had arrived at

$$S=\int_{p}^{q}L\,dx=\int_{{x}_{1}}^{{x}_{2}}\sqrt{1+{f}^{\prime}(x)^{2}}\,dx. \tag{11}$$

Here, $x$ takes the place of $t$, $f$ takes the place of $q$, and (10) becomes

$$\frac{\partial}{\partial f}L-\frac{d}{dx}\frac{\partial}{\partial {f}^{\prime}}L=0. \tag{12}$$

Even with $\frac{\partial}{\partial f}L=0$, this is still ugly. However, because $L$ does not depend explicitly on $x$ (that is, $\frac{\partial}{\partial x}L=0$), we can use the Beltrami identity,

$$L-{q}^{\prime}\frac{\partial}{\partial {q}^{\prime}}L=C. \tag{13}$$

(For the derivation of this useful little trick, see the corresponding entry.) Now we must simply solve

$$\sqrt{1+{f}^{\prime}(x)^{2}}-{f}^{\prime}(x)\frac{\partial}{\partial {f}^{\prime}}L=C \tag{14}$$

which looks just as daunting, but quickly reduces to

$$\sqrt{1+{f}^{\prime}(x)^{2}}-{f}^{\prime}(x)\frac{\frac{1}{2}\,2{f}^{\prime}(x)}{\sqrt{1+{f}^{\prime}(x)^{2}}}=C \tag{15}$$

$$\frac{1+{f}^{\prime}(x)^{2}-{f}^{\prime}(x)^{2}}{\sqrt{1+{f}^{\prime}(x)^{2}}}=C \tag{16}$$

$$\frac{1}{\sqrt{1+{f}^{\prime}(x)^{2}}}=C \tag{17}$$

$${f}^{\prime}(x)=\sqrt{\frac{1}{{C}^{2}}-1}=m. \tag{18}$$

That is, the slope of the curve representing the shortest path between two points is a constant, which means the curve we seek, i.e. the extremal of this variational problem, must be a straight line. Through this lengthy process, we’ve proved that a straight line is the shortest path between two points.
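As a cross-check that does not rely on the Beltrami identity, one can also work directly from (12). This is a sketch rather than part of the original derivation, and the constant $K$ is a name introduced here. Because $\frac{\partial}{\partial f}L=0$ and $\frac{\partial}{\partial {f}^{\prime}}L={f}^{\prime}/\sqrt{1+{f}^{\prime 2}}$, equation (12) reduces to

$$\frac{d}{dx}\left(\frac{{f}^{\prime}(x)}{\sqrt{1+{f}^{\prime}(x)^{2}}}\right)=0\quad\implies\quad\frac{{f}^{\prime}(x)}{\sqrt{1+{f}^{\prime}(x)^{2}}}=K\quad\implies\quad{f}^{\prime}(x)=\frac{K}{\sqrt{1-{K}^{2}}},$$

with $|K|<1$, so the slope is again constant. Written this way, the sign of the slope follows the sign of $K$, which covers descending lines as well.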

To find the actual function $f(x)$ given endpoints $({x}_{1},{y}_{1})$ and $({x}_{2},{y}_{2})$, simply integrate with respect to $x$:

$$f(x)=\int {f}^{\prime}(x)\,dx=\int m\,dx=mx+d \tag{19}$$

and then apply the boundary conditions

$$f({x}_{1})={y}_{1}=m{x}_{1}+d \tag{20}$$

$$f({x}_{2})={y}_{2}=m{x}_{2}+d. \tag{21}$$

Subtracting the first condition from the second, we get $m=\frac{{y}_{2}-{y}_{1}}{{x}_{2}-{x}_{1}}$, the standard equation for the slope of a line. Solving for $d={y}_{1}-m{x}_{1}$, we get

$$f(x)=\frac{{y}_{2}-{y}_{1}}{{x}_{2}-{x}_{1}}(x-{x}_{1})+{y}_{1} \tag{22}$$

which is the basic equation for a line passing through $({x}_{1},{y}_{1})$ and $({x}_{2},{y}_{2})$.
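Equation (22) translates directly into code. The following minimal sketch uses a hypothetical helper name, `line_through`, and assumes ${x}_{1}\ne {x}_{2}$, as the derivation with $y=f(x)$ already requires.

```python
def line_through(x1, y1, x2, y2):
    """Return f(x) for the straight line through (x1, y1) and (x2, y2)."""
    if x1 == x2:
        # A vertical line cannot be written as y = f(x).
        raise ValueError("x1 and x2 must differ")
    m = (y2 - y1) / (x2 - x1)  # slope obtained from the boundary conditions
    return lambda x: m * (x - x1) + y1


f = line_through(1.0, 2.0, 3.0, 6.0)
print(f(1.0), f(2.0), f(3.0))
```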

The solution to the brachistochrone problem, while slightly more complicated, follows along exactly the same lines.
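To get a feel for the brachistochrone problem numerically, the sketch below compares travel times along two wires with the same endpoints. It relies on facts not derived in this entry: energy conservation gives the bead’s speed as $v=\sqrt{2g(-y)}$ when it starts at rest, and the known solution of the problem is a cycloid. The value of $g$, the radius $R$, the endpoint $(\pi R,-2R)$, and both parametrizations are assumptions chosen for illustration.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2 (assumed value)
R = 1.0   # cycloid radius in m (assumed); the endpoint is (pi*R, -2*R)


def travel_time(x, y, tau_end, n=20000):
    """Midpoint-rule estimate of T = integral of ds / v, with v = sqrt(2 g (-y)).

    x(tau) and y(tau) parametrize the wire for tau in [0, tau_end]; the bead
    starts at rest at the origin, and y is negative below the start.
    """
    h = tau_end / n
    total = 0.0
    for i in range(n):
        t0, t1 = i * h, (i + 1) * h
        ds = math.hypot(x(t1) - x(t0), y(t1) - y(t0))  # chord length of the cell
        y_mid = y(0.5 * (t0 + t1))
        total += ds / math.sqrt(2.0 * G * (-y_mid))
    return total


# Cycloid from (0, 0) to (pi*R, -2*R), parametrized by the rolling angle.
t_cycloid = travel_time(
    lambda th: R * (th - math.sin(th)),
    lambda th: -R * (1.0 - math.cos(th)),
    math.pi,
)

# Straight wire between the same endpoints; the tau**2 parametrization keeps
# the integrand bounded near the start, where the speed is zero.
t_line = travel_time(
    lambda tau: math.pi * R * tau ** 2,
    lambda tau: -2.0 * R * tau ** 2,
    1.0,
)

print(f"cycloid: {t_cycloid:.4f} s   straight line: {t_line:.4f} s")
```

With these assumptions the cycloid time is close to $\pi\sqrt{R/g}$, noticeably shorter than the straight-line time $\sqrt{{\pi}^{2}+4}\,\sqrt{R/g}$.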

| Field | Value |
|---|---|
| Title | calculus of variations |
| Canonical name | CalculusOfVariations |
| Date of creation | 2013-03-22 12:20:48 |
| Last modified on | 2013-03-22 12:20:48 |
| Owner | rspuzio (6075) |
| Last modified by | rspuzio (6075) |
| Numerical id | 16 |
| Author | rspuzio (6075) |
| Entry type | Topic |
| Classification | msc 49K05, msc 47A60 |
| Related topics | TaylorSeries, LinearFunctional, BeltramiIdentity, EulerLagrangeDifferentialEquation, TheoremForLocallyIntegrableFunctions, Extremal, EquationOfCatenaryViaCalculusOfVariations, LeastSurfaceOfRevolution, BrachistochroneCurve, SpeediestInclinedPlane |
| Defines | brachistochrone problem |