sufficient statistic
Let $\{f_\theta\}$ be a statistical model with parameter $\theta$. Let $\boldsymbol{X}=(X_1,\ldots,X_n)$ be a random vector of random variables representing $n$ observations. A statistic $T=T(\boldsymbol{X})$ of $\boldsymbol{X}$ for the parameter $\theta$ is called a sufficient statistic, or a sufficient estimator, if the conditional probability distribution of $\boldsymbol{X}$ given $T(\boldsymbol{X})=t$ is not a function of $\theta$ (equivalently, does not depend on $\theta$).
In other words, all the information about the unknown parameter $\theta$ that the sample contains is captured by the sufficient statistic $T$. For example, to find the percentage of defective light bulbs in a shipment of new ones, it is enough, or sufficient, to count the number of defective bulbs (the sum of the $X_i$'s, where $X_i=1$ if the $i$-th inspected bulb is defective and $X_i=0$ otherwise), rather than to record which individual bulbs are defective (the vector $(X_1,\ldots,X_n)$). Taking the sum achieves a certain "reduction" of the data.
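The light bulb example can be checked by direct enumeration. The sketch below assumes the bulbs are modeled as independent Bernoulli($\theta$) indicators, an assumption the text only implies, and uses a hypothetical helper `conditional_given_sum` to tabulate the conditional distribution of the sample given the number of defectives. If the sum is sufficient, the table should be the same for every $\theta$ strictly between 0 and 1.

```python
from fractions import Fraction
from itertools import product

def conditional_given_sum(n, theta):
    """Return {x: P(X = x | sum(X) = sum(x))} for n i.i.d. Bernoulli(theta) bulbs."""
    # Joint probability of every 0/1 vector of length n.
    joint = {x: theta ** sum(x) * (1 - theta) ** (n - sum(x))
             for x in product((0, 1), repeat=n)}
    # Marginal distribution of the statistic T = sum(X).
    marginal = {}
    for x, p in joint.items():
        marginal[sum(x)] = marginal.get(sum(x), 0) + p
    return {x: p / marginal[sum(x)] for x, p in joint.items()}

a = conditional_given_sum(4, Fraction(1, 10))
b = conditional_given_sum(4, Fraction(3, 5))
print(a == b)            # the two conditional tables coincide
print(a[(1, 0, 1, 0)])   # 1/6, i.e. 1 / C(4, 2)
```

Exact rational arithmetic (`fractions.Fraction`) is used so that the comparison of the two tables is not affected by floating-point rounding.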
Examples
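1. Let $X_1,\ldots,X_n$ be $n$ independent observations from a uniform distribution on the integers $1,\ldots,\theta$. Let $T=\max\{X_1,\ldots,X_n\}$ be a statistic for $\theta$. Then the conditional probability distribution of $\boldsymbol{X}=(X_1,\ldots,X_n)$ given $T=t$ is
$$P(\boldsymbol{X}\mid t)=\frac{P(X_1=x_1,\ldots,X_n=x_n,\ \max\{X_i\}=t)}{P(\max\{X_i\}=t)}.$$
The numerator is $0$ if $\max\{x_i\}\neq t$, so in this case $P(\boldsymbol{X}\mid t)=0$, which is not a function of $\theta$. Otherwise, the numerator is $\theta^{-n}$ and $P(\boldsymbol{X}\mid t)$ becomes
$$\frac{\theta^{-n}}{P(\max\{X_i\}=t)}=\bigl(\theta^{n}\,P(X_{(1)}\le\cdots\le X_{(n)}=t)\bigr)^{-1},$$
where the $X_{(i)}$'s are the $X_i$'s rearranged in non-decreasing order from $i=1$ to $n$. For the denominator, we first note that
$$P(X_{(1)}\le\cdots\le X_{(n)}=t)=P(X_{(1)}\le\cdots\le X_{(n)}\le t)-P(X_{(1)}\le\cdots\le X_{(n)}<t)=P(X_{(1)}\le\cdots\le X_{(n)}\le t)-P(X_{(1)}\le\cdots\le X_{(n)}\le t-1).$$
From this equation, we find that there are $t^n-(t-1)^n$ sequences of $n$ positive integers whose maximum is $t$ (the $t^n$ sequences with every entry at most $t$, minus the $(t-1)^n$ sequences with every entry at most $t-1$), and each of them has probability $\theta^{-n}$. So
$$\bigl(\theta^{n}\,P(X_{(1)}\le\cdots\le X_{(n)}=t)\bigr)^{-1}=\bigl(\theta^{n}\,(t^n-(t-1)^n)\,\theta^{-n}\bigr)^{-1}=\bigl(t^n-(t-1)^n\bigr)^{-1},$$
which again is not a function of $\theta$. Therefore, $T=\max\{X_i\}$ is a sufficient statistic for $\theta$. Here, a reduction of data has been achieved by keeping only the largest member of the set of observations, not the entire set.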
2. If we set $T(X_1,\ldots,X_n)=(X_1,\ldots,X_n)$, then $T$ is trivially a sufficient statistic for any parameter $\theta$: given $T=t$, the conditional distribution of $(X_1,\ldots,X_n)$ puts probability $1$ on the point $t$, whatever the value of $\theta$. Although this is a sufficient statistic by definition (the individual observations of course provide as much information about $\theta$ as there is to know), and no data is lost in $T$ (which is simply a list of all the observations), there is no reduction of data to speak of here.
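3. The sample mean
$$\bar{X}=\frac{X_1+\cdots+X_n}{n}$$
of $n$ independent observations from a normal distribution $N(\mu,\sigma^2)$ (both $\mu$ and $\sigma^2$ unknown) is a sufficient statistic for $\mu$, in the sense that the conditional distribution of the observations given $\bar{X}$ does not depend on $\mu$. This follows from the factorization criterion. Similarly, the vector of subtotals obtained from any partition of the sum of the $n$ observations $X_i$ into $m$ subtotals is a sufficient statistic for $\mu$. For instance, for indices $1\le j<k<n$,
$$T(X_1,\ldots,X_n)=\Bigl(\sum_{i=1}^{j}X_i,\ \sum_{i=j+1}^{k}X_i,\ \sum_{i=k+1}^{n}X_i\Bigr)$$
is a sufficient statistic for $\mu$.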
4. Again, assume there are $n$ independent observations $X_i$ from a normal distribution $N(\mu,\sigma^2)$ with unknown mean and variance. The sample variance
$$\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\bar{X})^2$$
is not a sufficient statistic for $\sigma^2$. However, if $\mu$ is a known constant, then
$$\frac{1}{n-1}\sum_{i=1}^{n}(X_i-\mu)^2$$
is a sufficient statistic for $\sigma^2$.
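The calculation in example 1 can be verified by brute force for small cases. The following is a minimal sketch, not part of the original argument: the helper name `conditional_given_max` and the chosen values of $n$, $t$ and $\theta$ are illustrative assumptions, and the code requires $t\le\theta$. It enumerates every sample point whose maximum is $t$ and computes its conditional probability exactly.

```python
from fractions import Fraction
from itertools import product

def conditional_given_max(n, theta, t):
    """Return {x: P(X = x | max(X) = t)} for n i.i.d. draws uniform on 1..theta."""
    p_point = Fraction(1, theta ** n)  # every sample point has probability theta^(-n)
    # All sample points whose largest coordinate equals t.
    support = [x for x in product(range(1, theta + 1), repeat=n) if max(x) == t]
    p_max = p_point * len(support)     # P(max = t)
    return {x: p_point / p_max for x in support}

n, t = 3, 2
for theta in (2, 5, 7):
    cond = conditional_given_max(n, theta, t)
    # Each of the t^n - (t-1)^n = 7 sample points gets probability 1/7,
    # regardless of theta.
    print(theta, len(cond), set(cond.values()))
```

The number of points in the support and their common conditional probability depend only on $n$ and $t$, which is the content of the sufficiency claim.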
A sufficient statistic for a parameter $\theta$ is called a minimal sufficient statistic if it can be expressed as a function of every sufficient statistic for $\theta$.
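Example. In example 3 above, both the sample mean $\bar{X}$ and the finite sum $S=X_1+\cdots+X_n$ are minimal sufficient statistics for the mean $\mu$. For a sufficient statistic $T$ of the kind described there, a vector whose coordinates are subtotals forming a partition of the finite sum, adding the coordinates gives $S$, so $S$ is expressed as a function of $T$. The same holds for an arbitrary sufficient statistic $T$ for $\mu$: by the factorization criterion, whenever $T(\boldsymbol{x})=T(\boldsymbol{y})$ the ratio of the densities at $\boldsymbol{x}$ and $\boldsymbol{y}$ does not depend on $\mu$, and for normal observations this ratio is free of $\mu$ only if $\sum x_i=\sum y_i$. Therefore, $S$ is minimal. Similarly, $\bar{X}$ is minimal.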
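The factorization invoked in example 3 and in the minimality argument can be written out explicitly. The display below is a sketch that treats $\sigma^2$ as fixed; the symbols $h$, $g$ and $s$ are notation introduced here for illustration and do not appear in the original entry.

```latex
\begin{align*}
f(\boldsymbol{x};\mu)
  &= (2\pi\sigma^2)^{-n/2}\exp\Bigl(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\Bigr) \\
  &= \underbrace{(2\pi\sigma^2)^{-n/2}\exp\Bigl(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}x_i^2\Bigr)}_{h(\boldsymbol{x})}
     \cdot
     \underbrace{\exp\Bigl(\frac{\mu s}{\sigma^2}-\frac{n\mu^2}{2\sigma^2}\Bigr)}_{g(s;\mu)},
  \qquad s=\sum_{i=1}^{n}x_i, \\[4pt]
\frac{f(\boldsymbol{x};\mu)}{f(\boldsymbol{y};\mu)}
  &= \frac{h(\boldsymbol{x})}{h(\boldsymbol{y})}
     \exp\Bigl(\frac{\mu}{\sigma^2}\Bigl(\sum_{i=1}^{n}x_i-\sum_{i=1}^{n}y_i\Bigr)\Bigr).
\end{align*}
```

The first two lines show that the density depends on $\mu$ only through $s$, so $S$ is sufficient for $\mu$. The last line shows that the density ratio is constant in $\mu$ exactly when the two samples have the same sum.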
Two sufficient statistics $T_1,T_2$ for a parameter $\theta$ are said to be equivalent provided that there is a bijection $g$ such that $g\circ T_1=T_2$. The statistics $\bar{X}$ and $S$ from the above example are two equivalent sufficient statistics. Two minimal sufficient statistics for the same parameter are equivalent.
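As a small illustration of these definitions, the sketch below computes a vector of subtotals $T$, recovers $S$ from it, and passes between $S$ and $\bar{X}$ with the bijection $g(s)=s/n$. The data values, the split points and the helper name `subtotals` are hypothetical choices made for the example.

```python
from fractions import Fraction

def subtotals(xs, j, k):
    """Hypothetical helper: split the sum of xs into three subtotals, as in example 3."""
    return (sum(xs[:j]), sum(xs[j:k]), sum(xs[k:]))

xs = [Fraction(v) for v in (3, 1, 4, 1, 5, 9)]  # made-up observations
n = len(xs)

T = subtotals(xs, 2, 4)   # a sufficient statistic that is not minimal
S = sum(T)                # S is a function of T: add the coordinates
g = lambda s: s / n       # bijection taking S to the sample mean
g_inv = lambda m: m * n   # its inverse, taking the sample mean back to S

print(T, S, g(S))
print(g_inv(g(S)) == S)   # True
```

The subtotal vector $T$ determines $S$, but $S$ does not determine $T$, which is why $T$ is sufficient without being minimal, while $S$ and $\bar{X}$ determine each other.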
Title | sufficient statistic |
Canonical name | SufficientStatistic |
Date of creation | 2013-03-22 15:02:42 |
Last modified on | 2013-03-22 15:02:42 |
Owner | CWoo (3771) |
Last modified by | CWoo (3771) |
Numerical id | 11 |
Author | CWoo (3771) |
Entry type | Definition |
Classification | msc 62B05 |
Synonym | sufficient estimator |
Synonym | minimally sufficient statistic |
Synonym | minimal sufficient |
Synonym | minimally sufficient |
Defines | minimal sufficient statistic |
Defines | equivalent statistic |