approximating the birthday problem


A straightforward generalizationPlanetmathPlanetmath of the birthday problemMathworldPlanetmath can be stated as:

Given N possible outcomes, how many random samples must be taken so that with a probability of at least 1/2 there are two equal samples?

Answer: Approximately 1.2N samples must be taken.

So in the typical birthday problem setting the N=365 – the number of days in the typical year, and the result is that 23 people in a group gives at least a 50% chance of two people with a common birthday. This result can be obtained by a concise computation of the probilities for various sizes of the group until the proper value of 23 is found, but this approach does not lend itself to finding the value of 23 efficiently.

The result however has an easy to compute approximation of 1.2N, so in the case of N=365, the result is approximated by 1.236523. We now validate the estimate.

Recall the probability of selecting a duplicate from N possible outcomes within k choices is

p(N,k)=(Nk)1Nk=(1-1N)(1-2N)(1-k-1N).

As

ex=limm(1+xm)m

we can roughly approximate the formal productPlanetmathPlanetmath above by:

p(N,k)e-1/N-2/N--k/N=e-(k+12)1N.

Now we are after a formulaMathworldPlanetmathPlanetmath in which given N and the desired probability of 1/2 we can estimate k. So we solve for k in the equation:

12=p(N,k)=e-(k2)1N=e-k(k-1)2N;
-2Nln12=k2-k;
14+2Nln2=k2-k+14=(k-12)2;
k=12+14+2Nln2. (1)

A less precise but easier to compute approximation comes from simplifying our original estimate further:

p(N,k)e-k(k-1)2Ne-k2/2N.

Then set 1/2=e-k(k-1)2Ne-k2/2N and once again solve for k to find:

k=2Nln21.2N.

We also see that if we wish to have an outcome with probability ε>0 of no matches then we can estimate the sample size by:

ε=p(N,k)e-k2/2N;k2Nln1ε.

So the number of samples required to find a match with probability ε>0 is approximately

k=2N11-ε.

References

Title approximating the birthday problem
Canonical name ApproximatingTheBirthdayProblem
Date of creation 2013-03-22 16:09:56
Last modified on 2013-03-22 16:09:56
Owner Algeboy (12884)
Last modified by Algeboy (12884)
Numerical id 9
Author Algeboy (12884)
Entry type Derivation
Classification msc 60C05
Classification msc 05A10
Classification msc 60-00