conditional distribution of multi-variate normal variable
Theorem.
Let X be a random variable taking values in ℝⁿ, normally distributed with mean zero and a non-singular covariance matrix Σ.
Suppose Y is defined by Y=B∗X for some linear transformation B:ℝᵏ→ℝⁿ of maximum rank. (∗ denotes the transpose operator.)
Then the distribution of X conditioned on Y is multi-variate normal,
with conditional mean and covariance:
\[
\mathbb{E}[X \mid Y] = \Sigma B (B^* \Sigma B)^{-1} Y, \qquad
\operatorname{Var}[X \mid Y] = \Sigma - \Sigma B (B^* \Sigma B)^{-1} (\Sigma B)^*.
\]
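As a numerical sanity check, the following sketch (assuming NumPy, with a randomly generated positive-definite Σ and full-rank B as hypothetical inputs) verifies two consequences of these formulas: conditioning on Y pins down B∗X exactly, so B∗𝔼[X∣Y]=Y, and the conditional variance vanishes in the directions measured by B.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 4, 2

# Hypothetical inputs: a random SPD covariance Sigma and a full-rank B (n x k).
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)     # symmetric positive definite
B = rng.standard_normal((n, k))     # rank k almost surely
y = rng.standard_normal(k)          # an observed value of Y = B* X

S = B.T @ Sigma @ B                 # B* Sigma B, the covariance matrix of Y
cond_mean = Sigma @ B @ np.linalg.solve(S, y)
cond_var = Sigma - Sigma @ B @ np.linalg.solve(S, B.T @ Sigma)

# Conditioning on Y determines B* X exactly:
assert np.allclose(B.T @ cond_mean, y)
# ...so the conditional variance is zero in the directions measured by B:
assert np.allclose(B.T @ cond_var @ B, 0.0, atol=1e-7)
```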
If k=1, so that B is simply a vector in ℝⁿ,
these formulas reduce to:
\[
\mathbb{E}[X \mid Y] = \frac{\Sigma B \, Y}{\operatorname{Var}[Y]}, \qquad
\operatorname{Var}[X \mid Y] = \Sigma - \frac{\Sigma B B^* \Sigma}{\operatorname{Var}[Y]}.
\]
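The reduction can be checked directly: when B is a single column, B∗ΣB is the 1×1 matrix Var[Y], and the matrix inverse becomes scalar division. A minimal sketch (assuming NumPy, with arbitrary randomly generated Σ and b):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)     # symmetric positive definite
b = rng.standard_normal((n, 1))     # B as a single column vector (k = 1)
y = 0.7                             # a hypothetical observed scalar value of Y

var_y = float(b.T @ Sigma @ b)      # Var[Y] = b* Sigma b, a scalar
mean_k1 = (Sigma @ b).ravel() * y / var_y
var_k1 = Sigma - (Sigma @ b @ b.T @ Sigma) / var_y

# The general formulas specialize to exactly these expressions:
S = b.T @ Sigma @ b                 # 1x1 matrix
mean_gen = (Sigma @ b @ np.linalg.solve(S, np.array([y]))).ravel()
var_gen = Sigma - Sigma @ b @ np.linalg.solve(S, b.T @ Sigma)
assert np.allclose(mean_k1, mean_gen)
assert np.allclose(var_k1, var_gen)
```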
If X does not have zero mean, then the formula for 𝔼[X∣Y] is modified by adding 𝔼[X] and replacing Y by Y−𝔼[Y]; the formula for Var[X∣Y] is unchanged.
Proof.
We split X into two stochastically independent parts,
the first containing exactly the information embodied in Y.
The conditional distribution of X given Y is then simply
the unconditional distribution of the second part, which is independent of Y.
To this end, we first change variables to express everything in terms of a standard multi-variate normal Z. Let A:ℝⁿ→ℝⁿ be a “square root” factorization of the covariance matrix Σ, so that:
\[
A A^* = \Sigma, \qquad Z = A^{-1} X, \qquad X = A Z, \qquad Y = B^* A Z.
\]
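One convenient choice of square root A is the Cholesky factor of Σ. The following sketch (assuming NumPy, with an arbitrary positive-definite Σ) illustrates the factorization and the whitening Z=A⁻¹X, checking empirically that Z has approximately identity covariance:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)     # symmetric positive definite

A = np.linalg.cholesky(Sigma)       # one valid "square root": A A* = Sigma
assert np.allclose(A @ A.T, Sigma)

# Whitening: if X ~ N(0, Sigma), then Z = A^{-1} X has identity covariance.
X = rng.multivariate_normal(np.zeros(n), Sigma, size=200_000)
Z = np.linalg.solve(A, X.T).T
emp_cov = np.cov(Z, rowvar=False)
assert np.allclose(emp_cov, np.eye(n), atol=0.05)   # loose Monte Carlo tolerance
```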
We let H:ℝⁿ→ℝⁿ be the orthogonal projection onto the range of
A∗B:ℝᵏ→ℝⁿ, and decompose Z into orthogonal components:
\[
Z = H Z + (I - H) Z.
\]
It is intuitively obvious that orthogonality
of the two random normal vectors implies their stochastic independence.
To show this formally, observe that the Gaussian density function for Z
factors into a product:
\[
(2\pi)^{-n/2} \exp\Bigl(-\tfrac{1}{2}\lVert z \rVert^2\Bigr)
= (2\pi)^{-n/2} \exp\Bigl(-\tfrac{1}{2}\lVert H z \rVert^2\Bigr)
\exp\Bigl(-\tfrac{1}{2}\lVert (I-H) z \rVert^2\Bigr),
\]
since ∥z∥² = ∥Hz∥² + ∥(I−H)z∥² by the Pythagorean theorem.
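The factorization rests on the norm decomposition ∥z∥² = ∥Hz∥² + ∥(I−H)z∥² for an orthogonal projection H. A small numerical illustration (assuming NumPy, with a hypothetical H projecting onto a random k-dimensional subspace):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 5, 2

# Hypothetical H: orthogonal projection onto a random k-dimensional subspace.
T = rng.standard_normal((n, k))
H = T @ np.linalg.solve(T.T @ T, T.T)

z = rng.standard_normal(n)
# Pythagoras: ||z||^2 = ||Hz||^2 + ||(I-H)z||^2, which is what makes
# the Gaussian density split into the two exponential factors.
assert np.isclose(z @ z, (H @ z) @ (H @ z) + ((z - H @ z) @ (z - H @ z)))

# Hence the two exponential factors multiply back to the full density kernel.
lhs = np.exp(-0.5 * (z @ z))
rhs = np.exp(-0.5 * (H @ z) @ (H @ z)) * np.exp(-0.5 * ((z - H @ z) @ (z - H @ z)))
assert np.isclose(lhs, rhs)
```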
We can choose an orthonormal coordinate system on ℝⁿ
in which the components of Hz are
completely disjoint from those of (I−H)z.
On the other hand, the densities of Z, HZ, and (I−H)Z
remain invariant under this change of coordinates,
because they are radially symmetric.
Hence the variables HZ and (I−H)Z are separable in their joint density,
and they are independent.
HZ embodies the information in the linear combination Y=B∗AZ,
for we have the identity:
\[
Y = (B^* A) Z = (B^* A)\bigl(H Z + (I-H) Z\bigr) = (B^* A) H Z + 0.
\]
The last term is null because (I-H)Z is orthogonal to the range of A∗B by definition. (Equivalently, (I-H)Z lies in the kernel of (A∗B)∗=B∗A.) Thus Y can always be recovered by a linear transformation on HZ.
Conversely, Y completely determines HZ,
as can be seen from the analytical expression for H that we now give.
In general, the orthogonal projection onto the range of an injective
transformation T is T(T∗T)⁻¹T∗. Applying this to T=A∗B, we have:
\[
H = A^* B (B^* A A^* B)^{-1} B^* A = A^* B (B^* \Sigma B)^{-1} B^* A.
\]
It follows that HZ = A∗B(B∗ΣB)⁻¹B∗AZ = A∗B(B∗ΣB)⁻¹Y.
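These projection identities are easy to verify numerically. The sketch below (assuming NumPy, with random Σ and B, and the Cholesky factor as one valid choice of A) builds H from the general formula T(T∗T)⁻¹T∗ with T=A∗B, checks the projection properties, and confirms that HZ can be recovered from Y alone:

```python
import numpy as np

rng = np.random.default_rng(3)
n, k = 5, 2
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)     # symmetric positive definite
B = rng.standard_normal((n, k))
A = np.linalg.cholesky(Sigma)       # A A* = Sigma

T = A.T @ B                         # T = A* B, injective (n x k, full rank)
H = T @ np.linalg.solve(T.T @ T, T.T)   # projection onto range(A* B)

# Defining properties of an orthogonal projection: H = H* and H^2 = H.
assert np.allclose(H, H.T)
assert np.allclose(H @ H, H)

# H Z is recoverable from Y = B* A Z alone, as in the text:
z = rng.standard_normal(n)
y = B.T @ A @ z
assert np.allclose(H @ z, A.T @ B @ np.linalg.solve(B.T @ Sigma @ B, y))
```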
We have proved that conditioning on Y and on HZ are equivalent, and so:
\[
\mathbb{E}[Z \mid Y] = \mathbb{E}[Z \mid HZ]
= \mathbb{E}[H Z + (I-H) Z \mid HZ] = H Z + 0,
\]
and
\[
\begin{aligned}
\operatorname{Var}[Z \mid Y] = \operatorname{Var}[Z \mid HZ]
&= \operatorname{Var}[H Z + (I-H) Z \mid HZ] \\
&= 0 + \operatorname{Var}[(I-H) Z] \\
&= \mathbb{E}\bigl[(I-H) Z Z^* (I-H)^*\bigr] \\
&= (I-H)(I-H)^* \\
&= I - H - H^* + H H^* = I - H,
\end{aligned}
\]
using the defining property H2=H=H∗ of orthogonal projections.
Now we express the result in terms of X, and remove the dependence on the transformation A (which is not uniquely defined from the covariance matrix):
\[
\mathbb{E}[X \mid Y] = A \, \mathbb{E}[Z \mid Y] = A H Z = \Sigma B (B^* \Sigma B)^{-1} Y
\]
and
\[
\operatorname{Var}[X \mid Y] = A \operatorname{Var}[Z \mid Y] \, A^*
= A A^* - A H A^*
= \Sigma - \Sigma B (B^* \Sigma B)^{-1} B^* \Sigma.
\]
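An equivalent way to see this result, checkable numerically: the residual R = X − 𝔼[X∣Y] = (I − GB∗)X, with gain G = ΣB(B∗ΣB)⁻¹, has covariance equal to Var[X∣Y] and is uncorrelated with Y. The sketch below (assuming NumPy, with random Σ and B) verifies both identities:

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 4, 2
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)     # symmetric positive definite
B = rng.standard_normal((n, k))

S = B.T @ Sigma @ B
G = Sigma @ B @ np.linalg.inv(S)    # gain mapping Y to E[X|Y]
cond_var = Sigma - G @ B.T @ Sigma

# The residual R = (I - G B*) X has covariance P Sigma P*,
# which a direct computation shows equals the conditional variance.
P = np.eye(n) - G @ B.T
assert np.allclose(P @ Sigma @ P.T, cond_var)

# R is also uncorrelated with Y: Cov(R, Y) = (I - G B*) Sigma B = 0.
assert np.allclose(P @ Sigma @ B, 0.0, atol=1e-7)
```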
Of course, conditioned on Y, we have X = AHZ + A(I−H)Z, where the first term is a constant determined by Y; so the conditional distribution of X given Y is that of A(I−H)Z shifted by this constant, which is multi-variate normal.
| Title | conditional distribution of multi-variate normal variable |
| --- | --- |
| Canonical name | ConditionalDistributionOfMultivariateNormalVariable |
| Date of creation | 2013-03-22 18:39:09 |
| Last modified on | 2013-03-22 18:39:09 |
| Owner | stevecheng (10074) |
| Last modified by | stevecheng (10074) |
| Numerical id | 5 |
| Author | stevecheng (10074) |
| Entry type | Theorem |
| Classification | msc 62E15 |
| Classification | msc 60E05 |