conditional distribution of multi-variate normal variable


Let $X$ be a random variable, taking values in $\mathbb{R}^n$, normally distributed with a non-singular covariance matrix $\Sigma$ and a mean of zero.

Suppose $Y$ is defined by $Y = B^* X$ for some linear transformation $B \colon \mathbb{R}^k \to \mathbb{R}^n$ of maximum rank. (The superscript $*$ denotes the transpose operator.)

Then the distribution of $X$ conditioned on $Y$ is multi-variate normal, with conditional mean and covariance:

\[
\mathbb{E}[X \mid Y] = \Sigma B \,(B^* \Sigma B)^{-1}\, Y\,, \qquad
\operatorname{Var}[X \mid Y] = \Sigma - \Sigma B \,(B^* \Sigma B)^{-1}\, B^* \Sigma\,.
\]
If $k = 1$, so that $B$ is simply a vector in $\mathbb{R}^n$, these formulas reduce to:

\[
\mathbb{E}[X \mid Y] = \frac{\Sigma B}{B^* \Sigma B}\, Y\,, \qquad
\operatorname{Var}[X \mid Y] = \Sigma - \frac{(\Sigma B)(\Sigma B)^*}{B^* \Sigma B}\,.
\]
If $X$ does not have zero mean, then the formula for $\mathbb{E}[X \mid Y]$ is modified by adding $\mathbb{E}[X]$ and replacing $Y$ by $Y - \mathbb{E}[Y]$, while the formula for $\operatorname{Var}[X \mid Y]$ is unchanged.
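As a sanity check, here is a minimal numerical sketch of the zero-mean case (not part of the original entry; it assumes NumPy, and the matrices $\Sigma$ and $B$ below are arbitrary test data). The residual $X - \Sigma B (B^* \Sigma B)^{-1} Y$ should be uncorrelated with $Y$, and its covariance should match the stated conditional covariance.

```python
# A minimal sketch, assuming NumPy; Sigma and B are arbitrary test data.
import numpy as np

rng = np.random.default_rng(0)
n, k = 3, 2

M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)      # a non-singular covariance matrix
B = rng.standard_normal((n, k))      # maximum rank (generically)

# Predicted conditional moments from the theorem.
G = Sigma @ B @ np.linalg.inv(B.T @ Sigma @ B)   # so E[X|Y] = G Y
cond_var = Sigma - G @ B.T @ Sigma

# Sample X ~ N(0, Sigma), set Y = B* X, and form the residual X - G Y.
X = rng.multivariate_normal(np.zeros(n), Sigma, size=200_000)
Y = X @ B
R = X - Y @ G.T

print(np.abs(R.T @ Y / len(X)).max())         # ~0: residual uncorrelated with Y
print(np.abs(np.cov(R.T) - cond_var).max())   # ~0 up to sampling error
```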


Proof. We split $X$ into two stochastically independent parts, the first part containing exactly the information embodied in $Y$. The conditional distribution of $X$ given $Y$ is then simply the unconditional distribution of the second part, which is independent of $Y$.

To this end, we first change variables to express everything in terms of a standard multi-variate normal $Z$. Let $A \colon \mathbb{R}^n \to \mathbb{R}^n$ be a "square root" factorization of the covariance matrix $\Sigma$, so that:

\[
\Sigma = A A^*\,, \qquad X = A Z\,,
\]

where $Z$ is a standard multi-variate normal taking values in $\mathbb{R}^n$.
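For instance, the Cholesky factor is one such square root. The sketch below (an illustration assuming NumPy, with an arbitrarily chosen $\Sigma$) confirms $\Sigma = A A^*$ and that $X = AZ$ has covariance $\Sigma$.

```python
# A minimal sketch, assuming NumPy; Sigma is arbitrary test data.
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
Sigma = M @ M.T + 3 * np.eye(3)          # a non-singular covariance matrix

A = np.linalg.cholesky(Sigma)            # lower triangular, Sigma = A A*
assert np.allclose(A @ A.T, Sigma)

Z = rng.standard_normal((500_000, 3))    # rows are standard normal vectors
X = Z @ A.T                              # each row is A z
print(np.abs(np.cov(X.T) - Sigma).max()) # ~0 up to sampling error
```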
We let $H \colon \mathbb{R}^n \to \mathbb{R}^n$ be the orthogonal projection onto the range of $A^* B \colon \mathbb{R}^k \to \mathbb{R}^n$, and decompose $Z$ into orthogonal components:

\[
Z = H Z + (I - H) Z\,.
\]
It is intuitively obvious that orthogonality of the two random normal vectors implies their stochastic independence. To show this formally, observe that $\lVert z \rVert^2 = \lVert H z \rVert^2 + \lVert (I-H) z \rVert^2$ by Pythagoras' theorem, so the Gaussian density function for $Z$ factors into a product:

\[
\frac{1}{(2\pi)^{n/2}}\, e^{-\lVert z \rVert^2/2}
= \frac{1}{(2\pi)^{n/2}}\, e^{-\lVert H z \rVert^2/2}\, e^{-\lVert (I-H) z \rVert^2/2}\,.
\]
We can construct an orthonormal system of coordinates on $\mathbb{R}^n$ under which the components for $Hz$ are completely disjoint from the components for $(I-H)z$. On the other hand, the densities for $Z$, $HZ$, and $(I-H)Z$ remain invariant after changing coordinates, because they are radially symmetric. Hence the variables $HZ$ and $(I-H)Z$ are separable in their joint density, and they are independent.
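Equivalently, for jointly Gaussian vectors independence follows from a vanishing cross-covariance: $\operatorname{Cov}(HZ, (I-H)Z) = H(I-H)^* = H - HH^* = 0$ for an orthogonal projection $H$. A minimal numerical illustration, assuming NumPy and using arbitrary test data:

```python
# A minimal sketch, assuming NumPy; A and B are arbitrary test data.
import numpy as np

rng = np.random.default_rng(2)
n, k = 4, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, k))

Q, _ = np.linalg.qr(A.T @ B)     # orthonormal basis for range(A* B)
H = Q @ Q.T                      # orthogonal projection onto that range

# Cov(HZ, (I-H)Z) = H (I-H)* vanishes for an orthogonal projection.
print(np.abs(H @ (np.eye(n) - H).T).max())   # ~0 (up to rounding)
```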

$HZ$ embodies the information in the linear combination $Y = B^* A Z$, for we have the identity:

\[
Y = B^* A Z = B^* A\,(H Z) + B^* A\,(I - H) Z\,.
\]
The last term is null because $(I-H)Z$ is orthogonal to the range of $A^* B$ by definition. (Equivalently, $(I-H)Z$ lies in the kernel of $(A^* B)^* = B^* A$.) Thus $Y$ can always be recovered by a linear transformation on $HZ$.

Conversely, $Y$ completely determines $HZ$, as can be seen from the analytical expression for $H$ that we now give. In general, the orthogonal projection onto the range of an injective transformation $T$ is $T\,(T^* T)^{-1}\, T^*$. Applying this to $T = A^* B$, we have

\[
H = A^* B\,(B^* A A^* B)^{-1}\, B^* A = A^* B\,(B^* \Sigma B)^{-1}\, B^* A\,.
\]

We see that $HZ = A^* B\,(B^* \Sigma B)^{-1}\, Y$.
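A quick numerical check of this closed form, assuming NumPy and using arbitrary test data: $H$ should satisfy the projection axioms, and $HZ$ should be recoverable from $Y$ alone.

```python
# A minimal sketch, assuming NumPy; A, B, and Z are arbitrary test data.
import numpy as np

rng = np.random.default_rng(3)
n, k = 4, 2
A = rng.standard_normal((n, n))          # any invertible square root factor
B = rng.standard_normal((n, k))
Sigma = A @ A.T

T = A.T @ B                              # T = A* B, injective for generic data
H = T @ np.linalg.inv(T.T @ T) @ T.T     # H = T (T* T)^{-1} T*

assert np.allclose(H @ H, H)             # H^2 = H
assert np.allclose(H, H.T)               # H   = H*

Z = rng.standard_normal(n)
Y = B.T @ A @ Z                          # Y = B* A Z
HZ_from_Y = T @ np.linalg.inv(B.T @ Sigma @ B) @ Y
print(np.abs(H @ Z - HZ_from_Y).max())   # ~0 (up to rounding)
```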

We have proved that conditioning on $Y$ and on $HZ$ are equivalent, and so:

\[
\mathbb{E}[Z \mid Y] = \mathbb{E}[Z \mid HZ]
= \mathbb{E}\bigl[ HZ + (I-H)Z \bigm| HZ \bigr]
= HZ + \mathbb{E}\bigl[(I-H)Z\bigr]
= HZ\,,
\]
\[
\operatorname{Var}[Z \mid Y] = \operatorname{Var}[Z \mid HZ]
= \operatorname{Var}\bigl[ HZ + (I-H)Z \bigm| HZ \bigr]
= \operatorname{Var}\bigl[(I-H)Z\bigr]
= (I-H)(I-H)^* = I - H\,,
\]

using the defining property $H^2 = H = H^*$ of orthogonal projections.
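The last simplification, $(I-H)(I-H)^* = I - H$, can also be checked numerically; a minimal sketch assuming NumPy, with an arbitrary projection built by QR:

```python
# A minimal sketch, assuming NumPy; the projection below is arbitrary.
import numpy as np

rng = np.random.default_rng(5)
Q, _ = np.linalg.qr(rng.standard_normal((4, 2)))
H = Q @ Q.T                              # an orthogonal projection
I = np.eye(4)
print(np.abs((I - H) @ (I - H).T - (I - H)).max())   # ~0
```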

Now we express the result in terms of $X$, and remove the dependence on the transformation $A$ (which is not uniquely determined by the covariance matrix):

\[
\mathbb{E}[X \mid Y] = A\, \mathbb{E}[Z \mid Y]
= A A^* B\,(B^* \Sigma B)^{-1}\, Y
= \Sigma B\,(B^* \Sigma B)^{-1}\, Y\,,
\]

\[
\operatorname{Var}[X \mid Y] = A\, \operatorname{Var}[Z \mid Y]\, A^*
= A\,(I - H)\, A^*
= \Sigma - \Sigma B\,(B^* \Sigma B)^{-1}\, B^* \Sigma\,.
\]
Of course, the conditional distribution of $X$ given $Y$ is, up to the shift $\mathbb{E}[X \mid Y]$, the same as that of $A(I-H)Z$, which is multi-variate normal.

The formula in the statement of this theorem for the single-dimensional case follows from substituting $\operatorname{Var}[Y] = \operatorname{Var}[B^* X] = B^* \Sigma B$. The formula for when $X$ does not have zero mean follows from applying the base case to the shifted variable $X - \mathbb{E}[X]$. ∎
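To close the loop, here is a minimal sketch of the single-dimensional, nonzero-mean case (assuming NumPy; $\Sigma$, $b$, and $\mu$ below are arbitrary test data):

```python
# A minimal sketch, assuming NumPy; Sigma, b, and mu are arbitrary test data.
import numpy as np

rng = np.random.default_rng(4)
n = 3
M = rng.standard_normal((n, n))
Sigma = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)               # B with k = 1 is just a vector
mu = rng.standard_normal(n)              # a nonzero mean E[X]

X = rng.multivariate_normal(mu, Sigma, size=300_000)
Y = X @ b                                # scalar Y = b* X for each sample

# E[X|Y] = mu + Sigma b (Y - b* mu) / (b* Sigma b)
coef = Sigma @ b / (b @ Sigma @ b)
pred = mu + np.outer(Y - b @ mu, coef)

resid = X - pred
cond_var = Sigma - np.outer(Sigma @ b, Sigma @ b) / (b @ Sigma @ b)
print(np.abs(resid.mean(axis=0)).max())            # ~0
print(np.abs(np.cov(resid.T) - cond_var).max())    # ~0 up to sampling error
```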

Title conditional distribution of multi-variate normal variable
Canonical name ConditionalDistributionOfMultivariateNormalVariable
Date of creation 2013-03-22 18:39:09
Last modified on 2013-03-22 18:39:09
Owner stevecheng (10074)
Last modified by stevecheng (10074)
Numerical id 5
Author stevecheng (10074)
Entry type Theorem
Classification msc 62E15
Classification msc 60E05