conditional distribution of multi-variate normal variable


Theorem.

Let X be a random variable, taking values in ℝ^n, normally distributed with a non-singular covariance matrix Σ and a mean of zero.

Suppose Y is defined by Y = B^* X for some linear transformation B: ℝ^k → ℝ^n of maximum rank. (The superscript ^* denotes the transpose operator.)

Then the distribution of X conditioned on Y is multi-variate normal, with conditional mean and covariance given by:

𝔼[X | Y] = Σ B (B^* Σ B)^{-1} Y,   Var[X | Y] = Σ − Σ B (B^* Σ B)^{-1} (Σ B)^*.

If k = 1, so that B is simply a vector in ℝ^n, these formulas reduce to:

𝔼[X | Y] = Σ B Y / Var[Y],   Var[X | Y] = Σ − Σ B B^* Σ / Var[Y].

If X does not have zero mean, then the formula for 𝔼[X | Y] is modified by adding 𝔼[X] and replacing Y by Y − 𝔼[Y], while the formula for Var[X | Y] is unchanged.
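As a quick numerical illustration (not part of the original entry), the sketch below checks the stated formulas with NumPy for an arbitrary positive-definite Σ and a full-rank B, both hypothetical choices: the residual X − 𝔼[X | Y] is uncorrelated with Y, and the conditional covariance is positive semi-definite.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: any symmetric positive-definite Sigma works.
M = rng.standard_normal((3, 3))
Sigma = M @ M.T + 3 * np.eye(3)      # covariance of X (non-singular)
B = rng.standard_normal((3, 2))      # Y = B^* X, with B of maximum rank

# Gain matrix K in E[X|Y] = K Y, as given by the theorem.
K = Sigma @ B @ np.linalg.inv(B.T @ Sigma @ B)

# Cov(X - K Y, Y) = Sigma B - K (B^* Sigma B) should be zero.
cross_cov = Sigma @ B - K @ (B.T @ Sigma @ B)
print(np.allclose(cross_cov, 0))                       # True

# Var[X|Y] = Sigma - K (Sigma B)^* is a valid (PSD) covariance matrix.
cond_var = Sigma - K @ (Sigma @ B).T
print(np.all(np.linalg.eigvalsh(cond_var) >= -1e-10))  # True
```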

Proof.

We split X into two stochastically independent parts, the first part containing exactly the information embodied in Y. The conditional distribution of X given Y is then simply the unconditional distribution of the second part, which is independent of Y.

To this end, we first change variables to express everything in terms of a standard multi-variate normal Z. Let A: ℝ^n → ℝ^n be a "square root" factorization of the covariance matrix Σ, so that:

A A^* = Σ,   Z = A^{-1} X,   X = A Z,   Y = B^* A Z.
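One concrete choice for the square root A (an illustrative assumption; any factor with A A^* = Σ works) is the Cholesky factor, as this NumPy sketch shows:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
Sigma = M @ M.T + 4 * np.eye(4)      # hypothetical positive-definite covariance

A = np.linalg.cholesky(Sigma)        # lower-triangular factor with A A^* = Sigma
print(np.allclose(A @ A.T, Sigma))   # True

# Z = A^{-1} X then has identity covariance: A^{-1} Sigma A^{-*} = I.
A_inv = np.linalg.inv(A)
print(np.allclose(A_inv @ Sigma @ A_inv.T, np.eye(4)))  # True
```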

We let H: ℝ^n → ℝ^n be the orthogonal projection onto the range of A^* B: ℝ^k → ℝ^n, and decompose Z into orthogonal components:

Z = H Z + (I − H) Z.

It is intuitively obvious that orthogonality of the two random normal vectors implies their stochastic independence. To show this formally, observe that the Gaussian density function for Z factors into a product (using ∥z∥² = ∥Hz∥² + ∥(I − H)z∥², by the Pythagorean theorem):

(2π)^{-n/2} exp(−∥z∥²/2) = (2π)^{-n/2} exp(−∥Hz∥²/2) · exp(−∥(I − H)z∥²/2).

We can construct an orthonormal system of coordinates on ℝ^n under which the components of Hz are completely disjoint from the components of (I − H)z. On the other hand, the densities for Z, HZ, and (I − H)Z remain invariant under this change of coordinates, because they are radially symmetric. Hence the variables HZ and (I − H)Z are separable in their joint density, and they are independent.
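This independence can also be checked numerically: for a standard normal Z, Cov(HZ, (I − H)Z) = H(I − H)^*, which vanishes for any orthogonal projection H, and zero covariance implies independence for jointly Gaussian vectors. A small sketch with hypothetical A and B:

```python
import numpy as np

rng = np.random.default_rng(2)
M = rng.standard_normal((3, 3))
Sigma = M @ M.T + 3 * np.eye(3)      # hypothetical covariance
A = np.linalg.cholesky(Sigma)        # square root: A A^* = Sigma
B = rng.standard_normal((3, 1))

# Orthogonal projection onto range(T) for injective T: T (T^* T)^{-1} T^*.
T = A.T @ B
H = T @ np.linalg.inv(T.T @ T) @ T.T

# Cov(H Z, (I - H) Z) = H (I - H)^* = H - H H^* = 0.
I = np.eye(3)
print(np.allclose(H @ (I - H).T, 0))  # True
```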

HZ embodies the information in the linear combination Y = B^* A Z, for we have the identity:

Y = (B^* A) Z = (B^* A)(H Z + (I − H) Z) = (B^* A) H Z + 0.

The last term vanishes because (I − H)Z is orthogonal to the range of A^* B by definition. (Equivalently, (I − H)Z lies in the kernel of (A^* B)^* = B^* A.) Thus Y can always be recovered by a linear transformation of HZ.

Conversely, Y completely determines HZ, as the following analytical expression for H shows. In general, the orthogonal projection onto the range of an injective transformation T is T (T^* T)^{-1} T^*. Applying this to T = A^* B, we have

H = A^* B (B^* A A^* B)^{-1} B^* A
  = A^* B (B^* Σ B)^{-1} B^* A.

We see that H Z = A^* B (B^* Σ B)^{-1} Y.
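The sketch below (again with hypothetical Σ and B) verifies numerically that this H is indeed an orthogonal projection, and that HZ depends on Z only through Y:

```python
import numpy as np

rng = np.random.default_rng(3)
M = rng.standard_normal((4, 4))
Sigma = M @ M.T + 4 * np.eye(4)      # hypothetical covariance
A = np.linalg.cholesky(Sigma)        # A A^* = Sigma
B = rng.standard_normal((4, 2))

G = np.linalg.inv(B.T @ Sigma @ B)   # (B^* Sigma B)^{-1}
H = A.T @ B @ G @ B.T @ A

# H is an orthogonal projection: H^2 = H = H^*.
print(np.allclose(H @ H, H))         # True
print(np.allclose(H, H.T))           # True

# H Z is recovered from Y = B^* A Z alone.
Z = rng.standard_normal(4)
Y = B.T @ A @ Z
print(np.allclose(H @ Z, A.T @ B @ G @ Y))  # True
```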

We have proved that conditioning on Y and on HZ are equivalent, and so:

𝔼[Z | Y] = 𝔼[Z | HZ] = 𝔼[HZ + (I − H)Z | HZ] = HZ + 0,

and

Var[Z | Y] = Var[Z | HZ] = Var[HZ + (I − H)Z | HZ]
  = 0 + Var[(I − H)Z]
  = 𝔼[(I − H) Z Z^* (I − H)^*]
  = (I − H)(I − H)^*
  = I − H − H^* + H H^* = I − H,

using the defining property H² = H = H^* of orthogonal projections.

Now we express the result in terms of X, and remove the dependence on the transformation A (which is not uniquely determined by the covariance matrix):

𝔼[X | Y] = A 𝔼[Z | Y] = A H Z = A A^* B (B^* Σ B)^{-1} Y = Σ B (B^* Σ B)^{-1} Y

and

Var[X | Y] = A Var[Z | Y] A^* = A (I − H) A^* = A A^* − A H A^* = Σ − Σ B (B^* Σ B)^{-1} B^* Σ.

Of course, the conditional distribution of X given Y is a translate of the distribution of A(I − H)Z, and hence is multi-variate normal.

The formula in the statement of the theorem for the single-dimensional case follows from substituting Var[Y] = Var[B^* X] = B^* Σ B. The formula for X with non-zero mean follows from applying the zero-mean case to the shifted variable X − 𝔼[X]. ∎
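For the k = 1 case, the reduction claimed in the theorem can be confirmed numerically; a quick NumPy check with a hypothetical Σ and a single vector b:

```python
import numpy as np

rng = np.random.default_rng(4)
M = rng.standard_normal((3, 3))
Sigma = M @ M.T + 3 * np.eye(3)      # hypothetical covariance
b = rng.standard_normal((3, 1))      # k = 1: B is a single column vector

var_y = float(b.T @ Sigma @ b)       # Var[Y] = B^* Sigma B, here a scalar

# General gain Sigma B (B^* Sigma B)^{-1} versus reduced gain Sigma B / Var[Y].
gain_general = Sigma @ b @ np.linalg.inv(b.T @ Sigma @ b)
gain_reduced = Sigma @ b / var_y
print(np.allclose(gain_general, gain_reduced))  # True

# General conditional covariance versus the reduced formula.
cov_general = Sigma - Sigma @ b @ np.linalg.inv(b.T @ Sigma @ b) @ b.T @ Sigma
cov_reduced = Sigma - (Sigma @ b) @ (b.T @ Sigma) / var_y
print(np.allclose(cov_general, cov_reduced))    # True
```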

Title conditional distribution of multi-variate normal variable
Canonical name ConditionalDistributionOfMultivariateNormalVariable
Date of creation 2013-03-22 18:39:09
Last modified on 2013-03-22 18:39:09
Owner stevecheng (10074)
Last modified by stevecheng (10074)
Numerical id 5
Author stevecheng (10074)
Entry type Theorem
Classification msc 62E15
Classification msc 60E05