# conditional distribution of multi-variate normal variable

###### Theorem.

Let $X$ be a random variable, taking values in $\mathbb{R}^{n}$, normally distributed with a non-singular covariance matrix $\Sigma$ and a mean of zero.

Suppose $Y$ is defined by $Y=B^{*}X$ for some linear transformation $B\colon\mathbb{R}^{k}\to\mathbb{R}^{n}$ of maximum rank. (${}^{\ast}$ denotes the transpose operator.)

Then the distribution of $X$ conditioned on $Y$ is multi-variate normal, with conditional means and covariances of:

 $\mathbb{E}[X\mid Y]=\Sigma B(B^{\ast}\Sigma B)^{-1}Y\,,\quad\operatorname{Var}[X\mid Y]=\Sigma-\Sigma B(B^{\ast}\Sigma B)^{-1}(\Sigma B)^{\ast}\,.$

If $k=1$, so that $B$ is simply a vector in $\mathbb{R}^{n}$, these formulas reduce to:

 $\mathbb{E}[X\mid Y]=\frac{\Sigma B\,Y}{\operatorname{Var}[Y]}\,,\quad\operatorname{Var}[X\mid Y]=\Sigma-\frac{\Sigma BB^{\ast}\Sigma}{\operatorname{Var}[Y]}\,.$

If $X$ does not have zero mean, then the formula for $\mathbb{E}[X\mid Y]$ is modified by adding $\mathbb{E}[X]$ and replacing $Y$ by $Y-\mathbb{E}[Y]$, and the formula for $\operatorname{Var}[X\mid Y]$ is unchanged.
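Before turning to the proof, the formulas can be sanity-checked numerically. The sketch below (using NumPy, with an arbitrary illustrative $2\times 2$ covariance matrix) takes $B$ to be the second standard basis vector, so that $Y=B^{\ast}X=X_{2}$; the theorem's general formula should then reduce to the classical bivariate conditional mean and variance.

```python
import numpy as np

# Illustrative choices (not from the text): a 2x2 covariance matrix and
# B equal to the second standard basis vector, so Y = B^T X = X_2.
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
B = np.array([[0.0],
              [1.0]])  # B: R^1 -> R^2

# General formulas from the theorem (B^* is the transpose).
G = Sigma @ B @ np.linalg.inv(B.T @ Sigma @ B)   # gain in E[X|Y] = G Y
cond_var = Sigma - G @ (Sigma @ B).T             # Var[X|Y]

# Classical bivariate result: E[X_1 | X_2 = y] = (Sigma_12/Sigma_22) y,
# Var[X_1 | X_2] = Sigma_11 - Sigma_12^2 / Sigma_22.
assert np.isclose(G[0, 0], Sigma[0, 1] / Sigma[1, 1])
assert np.isclose(cond_var[0, 0], Sigma[0, 0] - Sigma[0, 1]**2 / Sigma[1, 1])
# Conditioning on X_2 leaves it with no remaining variance.
assert np.isclose(cond_var[1, 1], 0.0)
```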

###### Proof.

We split up $X$ into two stochastically independent parts, the first part containing exactly the information embodied in $Y$. Then the conditional distribution of $X$ given $Y$ is simply the unconditional distribution of the second part that is independent of $Y$.

To this end, we first change variables to express everything in terms of a standard multi-variate normal $Z$. Let $A\colon\mathbb{R}^{n}\to\mathbb{R}^{n}$ be a “square root” factorization of the covariance matrix $\Sigma$, so that:

 $AA^{\ast}=\Sigma\,,\quad Z={A}^{-1}X\,,\quad X=AZ\,,\quad Y=B^{\ast}AZ\,.$
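One concrete choice of the square root $A$ is the Cholesky factor of $\Sigma$ (any $A$ with $AA^{\ast}=\Sigma$ works equally well). A minimal sketch, with an arbitrary illustrative $3\times 3$ covariance matrix:

```python
import numpy as np

# Illustrative covariance matrix (any symmetric positive-definite one works).
Sigma = np.array([[2.0, 0.6, 0.1],
                  [0.6, 1.0, 0.3],
                  [0.1, 0.3, 1.5]])
A = np.linalg.cholesky(Sigma)      # lower-triangular square root
assert np.allclose(A @ A.T, Sigma)  # A A^* = Sigma

# If Z is standard normal, then X = A Z has covariance A A^* = Sigma.
rng = np.random.default_rng(0)
Z = rng.standard_normal((3, 100_000))
X = A @ Z
assert np.allclose(np.cov(X), Sigma, atol=0.1)  # empirical check
```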

We let $H\colon\mathbb{R}^{n}\to\mathbb{R}^{n}$ be the orthogonal projection onto the range of $A^{\ast}B:\mathbb{R}^{k}\to\mathbb{R}^{n}$, and decompose $Z$ into orthogonal components:

 $Z=HZ+(I-H)Z\,.$

It is intuitively obvious that orthogonality of the two random normal vectors implies their stochastic independence. To show this formally, observe that the Gaussian density function for $Z$ factors into a product:

 $(2\pi)^{-n/2}\,\exp\bigl(-\tfrac{1}{2}\lVert z\rVert^{2}\bigr)=(2\pi)^{-n/2}\,\exp\bigl(-\tfrac{1}{2}\lVert Hz\rVert^{2}\bigr)\,\exp\bigl(-\tfrac{1}{2}\lVert(I-H)z\rVert^{2}\bigr)\,.$

We can construct an orthonormal system of coordinates on $\mathbb{R}^{n}$ in which the components of $Hz$ are completely disjoint from those of $(I-H)z$. On the other hand, the densities for $Z$, $HZ$, and $(I-H)Z$ remain invariant under such a change of coordinates, because they are radially symmetric. Hence the variables $HZ$ and $(I-H)Z$ are separable in their joint density, and they are independent.
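For jointly Gaussian vectors, independence is equivalent to zero cross-covariance, which here reads $H\operatorname{Cov}(Z)(I-H)^{\ast}=H(I-H)=0$. A numerical sketch, reusing the illustrative $\Sigma$ from the snippet above and an arbitrary vector $B$:

```python
import numpy as np

# Illustrative Sigma and B (same arbitrary choices as the earlier sketch).
Sigma = np.array([[2.0, 0.6, 0.1],
                  [0.6, 1.0, 0.3],
                  [0.1, 0.3, 1.5]])
A = np.linalg.cholesky(Sigma)
B = np.array([[1.0], [0.0], [2.0]])

T = A.T @ B                                # T = A^* B
H = T @ np.linalg.inv(T.T @ T) @ T.T       # orthogonal projection onto range(T)

# Cross-covariance of H Z and (I - H) Z is H (I - H)^* = 0,
# so the two Gaussian vectors are independent.
I = np.eye(3)
assert np.allclose(H @ (I - H).T, 0.0)
# H is a genuine orthogonal projection: idempotent and symmetric.
assert np.allclose(H @ H, H) and np.allclose(H, H.T)
```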

$HZ$ embodies the information in the linear combination $Y=B^{\ast}AZ$. For we have the identity:

 $Y=\bigl(B^{\ast}A\bigr)Z=\bigl(B^{\ast}A\bigr)\bigl(HZ+(I-H)Z\bigr)=\bigl(B^{\ast}A\bigr)HZ+0\,.$

The last term is null because $(I-H)Z$ is orthogonal to the range of $A^{\ast}B$ by definition. (Equivalently, $(I-H)Z$ lies in the kernel of $(A^{\ast}B)^{\ast}=B^{\ast}A$.) Thus $Y$ can always be recovered by a linear transformation on $HZ$.

Conversely, $Y$ completely determines $HZ$, from the analytical expression for $H$ that we now give. In general, the orthogonal projection onto the range of an injective transformation $T$ is $T{(T^{\ast}T)}^{-1}T^{\ast}$. Applying this to $T=A^{\ast}B$, we have

 $H=A^{\ast}B\bigl(B^{\ast}AA^{\ast}B\bigr)^{-1}B^{\ast}A=A^{\ast}B(B^{\ast}\Sigma B)^{-1}B^{\ast}A\,.$

We see that $HZ=A^{\ast}B(B^{\ast}\Sigma B)^{-1}Y$.
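This closed form for $H$, and the fact that $HZ$ is a function of $Y$ alone, can be checked numerically. A sketch, again assuming the same arbitrary illustrative $\Sigma$ and $B$ as in the snippets above:

```python
import numpy as np

# Same illustrative Sigma and B as in the earlier sketches.
Sigma = np.array([[2.0, 0.6, 0.1],
                  [0.6, 1.0, 0.3],
                  [0.1, 0.3, 1.5]])
A = np.linalg.cholesky(Sigma)
B = np.array([[1.0], [0.0], [2.0]])

# Projection via the generic formula T (T^* T)^{-1} T^* with T = A^* B ...
T = A.T @ B
H = T @ np.linalg.inv(T.T @ T) @ T.T
# ... and via the closed form using T^* T = B^* A A^* B = B^* Sigma B.
H2 = A.T @ B @ np.linalg.inv(B.T @ Sigma @ B) @ B.T @ A
assert np.allclose(H, H2)

# H Z is recovered from Y = B^* A Z alone: HZ = A^* B (B^* Sigma B)^{-1} Y.
rng = np.random.default_rng(1)
Z = rng.standard_normal(3)
Y = B.T @ A @ Z
assert np.allclose(H @ Z, A.T @ B @ np.linalg.inv(B.T @ Sigma @ B) @ Y)
```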

We have proved that conditioning on $Y$ and $HZ$ are equivalent, and so:

 $\mathbb{E}[Z\mid Y]=\mathbb{E}[Z\mid HZ]=\mathbb{E}[HZ+(I-H)Z\mid HZ]=HZ+0\,,$

and

 $\begin{aligned}\operatorname{Var}[Z\mid Y]=\operatorname{Var}[Z\mid HZ]&=\operatorname{Var}[HZ+(I-H)Z\mid HZ]\\ &=0+\operatorname{Var}[(I-H)Z]\\ &=\mathbb{E}\bigl[(I-H)ZZ^{\ast}(I-H)^{\ast}\bigr]\\ &=(I-H)(I-H)^{\ast}\\ &=I-H-H^{\ast}+HH^{\ast}=I-H\,,\end{aligned}$

using the defining property $H^{2}=H=H^{\ast}$ of orthogonal projections.

Now we express the result in terms of $X$, and remove the dependence on the transformation $A$ (which is not uniquely defined from the covariance matrix):

 $\mathbb{E}[X\mid Y]=A\,\mathbb{E}[Z\mid Y]=AHZ=\Sigma B(B^{\ast}\Sigma B)^{-1}Y$

and

 $\operatorname{Var}[X\mid Y]=A\,\operatorname{Var}[Z\mid Y]\,A^{\ast}=AA^{\ast}-AHA^{\ast}=\Sigma-\Sigma B(B^{\ast}\Sigma B)^{-1}B^{\ast}\Sigma\,.$
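The identity $A(I-H)A^{\ast}=\Sigma-\Sigma B(B^{\ast}\Sigma B)^{-1}B^{\ast}\Sigma$ can also be confirmed numerically; a sketch with the same illustrative $\Sigma$ and $B$ as before, which additionally exhibits the conditional covariance as positive semi-definite of rank $n-k$ (conditioning on $Y$ removes $k$ degrees of freedom):

```python
import numpy as np

# Same illustrative Sigma and B as in the earlier sketches (n = 3, k = 1).
Sigma = np.array([[2.0, 0.6, 0.1],
                  [0.6, 1.0, 0.3],
                  [0.1, 0.3, 1.5]])
A = np.linalg.cholesky(Sigma)
B = np.array([[1.0], [0.0], [2.0]])
T = A.T @ B
H = T @ np.linalg.inv(T.T @ T) @ T.T

lhs = A @ (np.eye(3) - H) @ A.T
rhs = Sigma - Sigma @ B @ np.linalg.inv(B.T @ Sigma @ B) @ B.T @ Sigma
assert np.allclose(lhs, rhs)

# Positive semi-definite, with rank n - k = 2 (one eigenvalue is zero).
eigs = np.linalg.eigvalsh(rhs)
assert np.all(eigs > -1e-12)
assert np.isclose(eigs[0], 0.0, atol=1e-9)
```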

Of course, the conditional distribution of $X$ given $Y$ is that of $A(I-H)Z$ shifted by the conditional mean, which is multi-variate normal.

The formula in the statement of this theorem, for the single-dimensional case, follows from substituting in $\operatorname{Var}[Y]=\operatorname{Var}[B^{\ast}X]=B^{\ast}\Sigma B$. The formula for when $X$ does not have zero mean follows from applying the zero-mean case to the shifted variable $X-\mathbb{E}[X]$. ∎
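As a final Monte Carlo sanity check of the non-zero-mean version: for jointly Gaussian variables the best linear predictor of $X$ from $Y$ coincides with $\mathbb{E}[X\mid Y]$, so an ordinary least-squares fit of centered $X$ on centered $Y$ should recover the gain matrix $\Sigma B(B^{\ast}\Sigma B)^{-1}$. A sketch with the same illustrative $\Sigma$ and $B$ as before and an arbitrary mean vector:

```python
import numpy as np

# Illustrative parameters (same Sigma and B as earlier, arbitrary mean mu).
Sigma = np.array([[2.0, 0.6, 0.1],
                  [0.6, 1.0, 0.3],
                  [0.1, 0.3, 1.5]])
mu = np.array([1.0, -2.0, 0.5])
B = np.array([[1.0], [0.0], [2.0]])
A = np.linalg.cholesky(Sigma)

rng = np.random.default_rng(2)
n = 200_000
X = mu[:, None] + A @ rng.standard_normal((3, n))
Y = B.T @ X                                   # shape (1, n)

# OLS slope of X on Y, both centered, estimates the theoretical gain.
Xc = X - X.mean(axis=1, keepdims=True)
Yc = Y - Y.mean(axis=1, keepdims=True)
gain_hat = (Xc @ Yc.T) @ np.linalg.inv(Yc @ Yc.T)   # 3x1 empirical gain
gain = Sigma @ B @ np.linalg.inv(B.T @ Sigma @ B)   # theoretical gain
assert np.allclose(gain_hat, gain, atol=0.02)
```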

Title: conditional distribution of multi-variate normal variable. Canonical name: ConditionalDistributionOfMultivariateNormalVariable. Date: 2013-03-22 18:39:09. Author: stevecheng (10074). Entry type: Theorem. Classification: msc 62E15, msc 60E05.