# Perspective drawing

## 1 Introduction

Perspective drawing refers to a particular technique of projecting a three-dimensional scene (mathematically, some subset of $\mathbb{R}^{3}$) onto a subset of $\mathbb{R}^{2}$. The aim is to make the scene look “natural”.

The most common model for perspective drawing consists of an ideal eye or focus at a point $\mathbf{e}\in\mathbb{R}^{3}$, located behind a planar screen that is perpendicular to the unit vector $\mathbf{g}\in\mathbb{R}^{3}$, representing the direction of gaze.

To project a point $\mathbf{p}\in\mathbb{R}^{3}$ in front of the screen onto a point $\mathbf{q}\in\mathbb{R}^{3}$ on the screen, we intersect the screen with the light ray emitted from the point $\mathbf{p}$ towards the eye $\mathbf{e}$.

## 2 Basic equation for the projection

Let $\mathbf{o}\in\mathbb{R}^{3}$ be a point on the screen. Without loss of generality, assume that $\mathbf{e}-\mathbf{o}$ is anti-parallel to $\mathbf{g}$, as in the previous picture. (We can always move the point $\mathbf{o}$ on the screen.) This means that $\mathbf{o}$ will be the origin of the perspective drawing.

To calculate the projected point $\mathbf{q}$ explicitly, we can solve these equations for $t\in[0,1]$:

 $\begin{aligned}\mathbf{q}&=(1-t)\mathbf{p}+t\mathbf{e}&&\text{($\mathbf{q}$ is on the ray from $\mathbf{p}$ to $\mathbf{e}$)}\\ 0&=\langle\mathbf{q}-\mathbf{o},\mathbf{g}\rangle&&\text{($\mathbf{q}$ is on the plane centered at $\mathbf{o}$ perpendicular to $\mathbf{g}$)}\end{aligned}$

(The notation $\langle,\rangle$ denotes the dot product in $\mathbb{R}^{3}$.) The solution is readily found to be:

 $t=\frac{\langle\mathbf{p}-\mathbf{o},\mathbf{g}\rangle}{\langle\mathbf{p}-\mathbf{e},\mathbf{g}\rangle}=\frac{\text{horiz. distance of $\mathbf{p}$ to the screen}}{\text{horiz. distance of $\mathbf{p}$ to the eye}}\,.$

This $t$ satisfies $0\leq t<1$, because the horizontal distance of the point $\mathbf{p}$ to the screen is less than its horizontal distance to the eye.

From the expression for $t$, we can determine $\mathbf{q}$:

 $\mathbf{q}=\frac{\langle\mathbf{p}-\mathbf{o},\mathbf{g}\rangle\mathbf{e}-\langle\mathbf{e}-\mathbf{o},\mathbf{g}\rangle\mathbf{p}}{\langle\mathbf{p}-\mathbf{e},\mathbf{g}\rangle}\,.$ (1)
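As a minimal sketch, the computation of $t$ and $\mathbf{q}$ can be checked numerically. The names `dot` and `project_to_screen` and the sample coordinates are illustrative assumptions, not part of the original text; points are plain 3-tuples.

```python
def dot(a, b):
    """Dot product of two 3-vectors given as sequences."""
    return sum(x * y for x, y in zip(a, b))


def project_to_screen(p, e, o, g):
    """Project p onto the screen through o with unit normal g, seen from the eye e.

    Computes t = <p - o, g> / <p - e, g> and then q = (1 - t) p + t e.
    """
    p_minus_o = [pi - oi for pi, oi in zip(p, o)]
    p_minus_e = [pi - ei for pi, ei in zip(p, e)]
    denom = dot(p_minus_e, g)
    if denom == 0:
        # The ray from p to e never meets the screen in this case.
        raise ValueError("p lies in the plane through the eye parallel to the screen")
    t = dot(p_minus_o, g) / denom
    q = tuple((1 - t) * pi + t * ei for pi, ei in zip(p, e))
    return t, q


# Hypothetical setup: the screen is the plane z = 0, the eye sits 2 units
# behind it on the z-axis, and the gaze points along -z.
e = (0.0, 0.0, 2.0)
o = (0.0, 0.0, 0.0)
g = (0.0, 0.0, -1.0)
t, q = project_to_screen((1.0, 1.0, -2.0), e, o, g)
```

Here the point is 2 units from the screen and 4 units from the eye, so $t=1/2$ and $\mathbf{q}$ is the midpoint of the segment from $\mathbf{p}$ to $\mathbf{e}$.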

## 3 Isomorphism of the planar screen to $\mathbb{R}^{2}$

Of course, we are interested in displaying or drawing the projected point $\mathbf{q}$ on a flat screen, so we need to assign coordinates to the screen. In mathematical terms, we seek an (affine) isomorphism from the plane centered at $\mathbf{o}$ and perpendicular to $\mathbf{g}$ onto $\mathbb{R}^{2}$.

The isomorphism is, obviously, not unique in general. But it is uniquely determined if we impose these reasonable conditions:

• Firstly, Euclidean distances on the screen should be preserved when it is transformed into $\mathbb{R}^{2}$.

• And secondly, if we assign an “up direction” $\mathbf{v}\in\mathbb{R}^{3}$, then this direction should correspond with the unit vector $(0,1)$ on $\mathbb{R}^{2}$.

• Similarly, if $\mathbf{u}\in\mathbb{R}^{3}$ is a “right”-pointing vector, satisfying $\mathbf{u}\times\mathbf{v}=-\mathbf{g}$, then $\mathbf{u}$ should map to $(1,0)$ on $\mathbb{R}^{2}$. (The minus sign is there to preserve the right-hand rule: the vector $\mathbf{u}\times\mathbf{v}$ should point towards the viewer, which is exactly the opposite of the viewer’s gaze direction.)

Let $\gamma$ be the right-handed orthonormal basis for $\mathbb{R}^{3}$ consisting of the vectors $\mathbf{u}$, $\mathbf{v}$ and $-\mathbf{g}$, where $\mathbf{g}$ is the gaze direction as before and $\mathbf{v}$ is a given “up” vector. The vector $\mathbf{u}$ can be determined by

 $\mathbf{u}=\mathbf{v}\times(-\mathbf{g})=\mathbf{g}\times\mathbf{v}\,.$

Then any point $\mathbf{q}$ on the planar screen is to be mapped to the point $\mathbf{q}^{\prime}\in\mathbb{R}^{2}$, where

 $(\mathbf{q}^{\prime},0)=[\mathbf{q}-\mathbf{o}]_{\gamma}\,,$ (2)

and $[\mathbf{q}-\mathbf{o}]_{\gamma}$ is the representation of $(\mathbf{q}-\mathbf{o})$ in the basis $\gamma$.
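Because $\gamma$ is orthonormal, the coordinates in equation (2) are just dot products with $\mathbf{u}$ and $\mathbf{v}$. The following sketch illustrates this; the function names and sample vectors are hypothetical, and it assumes that $\mathbf{g}$ and $\mathbf{v}$ are perpendicular unit vectors.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def screen_coordinates(q, o, g, v):
    """Return q' in R^2 for a point q on the screen, following equation (2).

    Assumes g and v are unit vectors with <g, v> = 0.
    """
    u = cross(g, v)  # "right" vector, u = g x v
    d = tuple(qi - oi for qi, oi in zip(q, o))
    # The third gamma-coordinate, <d, -g>, vanishes for points on the screen.
    return (dot(d, u), dot(d, v))


# Hypothetical setup: gaze along -z, "up" along +y, so "right" is +x.
g = (0.0, 0.0, -1.0)
v = (0.0, 1.0, 0.0)
o = (1.0, 2.0, 3.0)
q_prime = screen_coordinates((1.5, 2.25, 3.0), o, g, v)
```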

## 4 The perspective transform in eye coordinates

The perspective projection (1) takes a simple form in eye coordinates (coordinates with respect to the basis $\gamma$). We derive it now.

Let $\mathbf{p}_{o}=\mathbf{p}-\mathbf{o}$, and $\mathbf{e}_{o}=\mathbf{e}-\mathbf{o}$. Then equation (1) can be rewritten:

 $\begin{aligned}\mathbf{q}&=\frac{-\langle\mathbf{p}_{o},-\mathbf{g}\rangle\mathbf{e}+\langle\mathbf{e}_{o},-\mathbf{g}\rangle\mathbf{p}}{\langle\mathbf{e}_{o}-\mathbf{p}_{o},-\mathbf{g}\rangle}\,,\\ \mathbf{q}-\mathbf{o}&=\frac{-\langle\mathbf{p}_{o},-\mathbf{g}\rangle\mathbf{e}_{o}+\langle\mathbf{e}_{o},-\mathbf{g}\rangle\mathbf{p}_{o}}{\langle\mathbf{e}_{o}-\mathbf{p}_{o},-\mathbf{g}\rangle}\,.\end{aligned}$

We can write out the last vector equation in $\gamma$ coordinates. Let $[\mathbf{p}_{o}]_{\gamma}=(x,y,z)$ and $[\mathbf{e}_{o}]_{\gamma}=(0,0,a)$ for some $a>0$ (the distance of the eye from the screen). Then

 $[\mathbf{q}-\mathbf{o}]_{\gamma}=\frac{-z(0,0,a)+a(x,y,z)}{a-z}=\frac{(ax,ay,0)}{a-z}=\left(\frac{x}{1-z/a},\,\frac{y}{1-z/a},\,0\right)\,.$

Thus, according to equation (2), the perspective transform of $\mathbf{p}$ onto $\mathbb{R}^{2}$ is given by

 $\mathbf{q}^{\prime}=\left(\frac{x}{1-z/a},\,\frac{y}{1-z/a}\right)\,.$ (3)
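Formula (3) is short enough to state directly in code. The sketch below is one possible rendering; the name `perspective_eye` and the choice to reject points at or behind the eye are assumptions made for illustration.

```python
def perspective_eye(x, y, z, a):
    """Formula (3): map eye coordinates (x, y, z) to screen coordinates.

    a > 0 is the distance of the eye from the screen. Points in front of
    the screen have z <= 0, so the divisor 1 - z/a is at least 1 for them.
    """
    if a <= 0:
        raise ValueError("a must be positive")
    w = 1 - z / a
    if w <= 0:
        # z >= a: the point is level with or behind the eye (illustrative choice).
        raise ValueError("point is at or behind the eye")
    return (x / w, y / w)


# The same offset (1, 1) drawn at increasing depth behind the screen, with a = 2.
on_screen = perspective_eye(1.0, 1.0, 0.0, 2.0)
near = perspective_eye(1.0, 1.0, -2.0, 2.0)
far = perspective_eye(1.0, 1.0, -6.0, 2.0)
```

The three results shrink towards the origin as $z$ decreases, which is the familiar effect of distant objects appearing smaller.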

## 5 Computing the perspective transform in arbitrary coordinates

Finally, we want explicit expressions for the perspective transform in terms of coordinates of an arbitrarily given orthonormal basis $\beta$ for $\mathbb{R}^{3}$. (For example, $\beta$ might be the standard basis.)

The matrix that changes from $\gamma$ coordinates to $\beta$ coordinates is the matrix composed of three column vectors:

 $[I]^{\beta}_{\gamma}=\begin{pmatrix}[\mathbf{u}]_{\beta}&[\mathbf{v}]_{\beta}&[-\mathbf{g}]_{\beta}\end{pmatrix}\,.$

The matrix $[I]^{\gamma}_{\beta}$ that changes from $\beta$ coordinates to $\gamma$ coordinates is the inverse to $[I]_{\gamma}^{\beta}$; but both matrices are orthogonal, so the inverse simplifies to the transpose:

 $[I]_{\beta}^{\gamma}=\Bigl([I]^{\beta}_{\gamma}\Bigr)^{-1}=\Bigl([I]^{\beta}_{\gamma}\Bigr)^{\mathrm{t}}=\begin{pmatrix}[\mathbf{u}]_{\beta}^{\mathrm{t}}\\ [\mathbf{v}]_{\beta}^{\mathrm{t}}\\ [-\mathbf{g}]_{\beta}^{\mathrm{t}}\end{pmatrix}\,.$

Therefore, to obtain the perspective transform of a given point $\mathbf{p}$ given in $\beta$ coordinates, we are to compute the matrix product

 $\begin{pmatrix}x\\ y\\ z\end{pmatrix}=[I]^{\gamma}_{\beta}\cdot[\mathbf{p}-\mathbf{o}]_{\beta}\,,$ (4)

and apply formula (3) right afterwards.
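Putting formulas (4) and (3) together gives the whole transform. The following is a sketch under stated assumptions: all vectors are given in $\beta$ coordinates, $\mathbf{g}$ and $\mathbf{v}$ are perpendicular unit vectors, and the function name and sample values are hypothetical.

```python
def dot(a, b):
    """Dot product of two 3-vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    """Cross product a x b of two 3-vectors."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def perspective_transform(p, o, g, v, a):
    """Perspective transform of p, with everything given in beta coordinates.

    The rows of the change-of-basis matrix are u, v and -g, so the product
    in formula (4) reduces to three dot products. Assumes p does not lie in
    the plane through the eye parallel to the screen (divisor would be 0).
    """
    u = cross(g, v)
    neg_g = tuple(-gi for gi in g)
    d = tuple(pi - oi for pi, oi in zip(p, o))
    x, y, z = (dot(row, d) for row in (u, v, neg_g))  # formula (4)
    w = 1 - z / a
    return (x / w, y / w)  # formula (3)


# Hypothetical setup: gaze along +x, "up" along +z, eye 2 units behind the origin.
g = (1.0, 0.0, 0.0)
v = (0.0, 0.0, 1.0)
o = (0.0, 0.0, 0.0)
q_prime = perspective_transform((2.0, -1.0, 1.0), o, g, v, 2.0)
```

With this orientation the "right" vector works out to $\mathbf{u}=(0,-1,0)$, so the sample point has eye coordinates $(1,1,-2)$.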

## 6 Example

This figure was created using the MetaPost programming language. MetaPost by itself has no facilities to produce three-dimensional graphics, so the source code for this drawing (cube.mp) implements formulae (3) and (4) directly.

## 7 The perspective transform in homogeneous coordinates

The operation described by formula (3) is not linear in $\mathbf{p}$ or $\mathbf{p}-\mathbf{o}$, so it cannot be represented by a $3\times 3$ matrix. But using homogeneous coordinates, the perspective transform can be represented by the following $4\times 4$ matrix (with respect to eye coordinates):

 $\displaystyle\begin{pmatrix}1&0&0&0\\ 0&1&0&0\\ 0&0&0&0\\ 0&0&-a^{-1}&1\end{pmatrix}\text{ or }\begin{pmatrix}a&0&0&0\\ 0&a&0&0\\ 0&0&0&0\\ 0&0&-1&a\end{pmatrix}\,.$
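To see that both matrices reproduce formula (3), one can multiply them by $(x,y,z,1)$ and divide by the last component. The sketch below does this in plain Python; the helper names are illustrative, and the division step is the usual convention for homogeneous coordinates rather than something spelled out in the text above.

```python
def perspective_matrix(a):
    """First matrix above, in eye coordinates, for eye distance a."""
    return [
        [1.0, 0.0, 0.0, 0.0],
        [0.0, 1.0, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, -1.0 / a, 1.0],
    ]


def scaled_perspective_matrix(a):
    """Second matrix above: the first one multiplied through by a."""
    return [
        [a, 0.0, 0.0, 0.0],
        [0.0, a, 0.0, 0.0],
        [0.0, 0.0, 0.0, 0.0],
        [0.0, 0.0, -1.0, a],
    ]


def apply_homogeneous(m, point):
    """Apply a 4x4 matrix to (x, y, z, 1) and divide by the last component."""
    h = (point[0], point[1], point[2], 1.0)
    X, Y, Z, W = (sum(mij * hj for mij, hj in zip(row, h)) for row in m)
    return (X / W, Y / W, Z / W)


# Eye coordinates (1, 1, -2) with a = 2; formula (3) gives (0.5, 0.5).
r1 = apply_homogeneous(perspective_matrix(2.0), (1.0, 1.0, -2.0))
r2 = apply_homogeneous(scaled_perspective_matrix(2.0), (1.0, 1.0, -2.0))
```

The two matrices differ only by the overall factor $a$, which cancels in the division, so they describe the same transform.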

## 8 Properties of the perspective transform

(To be written. Talk about the fact that lines are mapped to lines by the perspective transform, and vanishing points.)

## 9 Other models for three-dimensional drawing

(To be written. Talk about orthographic projection (and relate it to the special case where $a\to\infty$). Could also mention the pin-hole camera model, or even more complicated models with non-infinitesimal lens.)

Title perspective drawing PerspectiveDrawing 2013-03-22 15:41:10 2013-03-22 15:41:10 stevecheng (10074) stevecheng (10074) 12 stevecheng (10074) Topic msc 51N20 msc 15A90