# approximating the birthday problem

A straightforward generalization of the birthday problem can be stated as:

Given $N$ possible outcomes, how many random samples must be taken so that with a probability of at least $1/2$ there are two equal samples?

Answer: Approximately $1.2\sqrt{N}$ samples must be taken.

So in the typical birthday problem setting the $N=365$ – the number of days in the typical year, and the result is that 23 people in a group gives at least a 50% chance of two people with a common birthday. This result can be obtained by a concise computation of the probilities for various sizes of the group until the proper value of 23 is found, but this approach does not lend itself to finding the value of 23 efficiently.

The result however has an easy to compute approximation of $1.2\sqrt{N}$, so in the case of $N=365$, the result is approximated by $1.2\sqrt{365}\approx 23$. We now validate the estimate.

Recall the probability of selecting a duplicate from $N$ possible outcomes within $k$ choices is

 $p(N,k)=\binom{N}{k}\frac{1}{N^{k}}=\left(1-\frac{1}{N}\right)\left(1-\frac{2}{% N}\right)\cdots\left(1-\frac{k-1}{N}\right).$

As

 $e^{x}=\lim_{m\rightarrow\infty}\left(1+\frac{x}{m}\right)^{m}$

we can roughly approximate the formal product above by:

 $p(N,k)\approx e^{-1/N-2/N-\cdots-k/N}=e^{-\binom{k+1}{2}\frac{1}{N}}.$

Now we are after a formula in which given $N$ and the desired probability of $1/2$ we can estimate $k$. So we solve for $k$ in the equation:

 $\frac{1}{2}=p(N,k)=e^{-\binom{k}{2}\frac{1}{N}}=e^{-\frac{k(k-1)}{2N}};$
 $-2N\ln\frac{1}{2}=k^{2}-k;$
 $\frac{1}{4}+2N\ln 2=k^{2}-k+\frac{1}{4}=\left(k-\frac{1}{2}\right)^{2};$
 $k=\frac{1}{2}+\sqrt{\frac{1}{4}+2N\ln 2}.$ (1)

A less precise but easier to compute approximation comes from simplifying our original estimate further:

 $p(N,k)\approx e^{-\frac{k(k-1)}{2N}}\approx e^{-k^{2}/2N}.$

Then set $\displaystyle 1/2=e^{-\frac{k(k-1)}{2N}}\approx e^{-k^{2}/2N}$ and once again solve for $k$ to find:

 $k=\sqrt{2N\ln 2}\approx 1.2\sqrt{N}.$

We also see that if we wish to have an outcome with probability $\varepsilon>0$ of no matches then we can estimate the sample size by:

 $\varepsilon=p(N,k)\approx e^{-k^{2}/2N};\qquad k\approx\sqrt{2N\ln\frac{1}{% \varepsilon}}.$

So the number of samples required to find a match with probability $\varepsilon>0$ is approximately

 $k=\sqrt{2N\frac{1}{1-\varepsilon}}.$

## References

Title approximating the birthday problem ApproximatingTheBirthdayProblem 2013-03-22 16:09:56 2013-03-22 16:09:56 Algeboy (12884) Algeboy (12884) 9 Algeboy (12884) Derivation msc 60C05 msc 05A10 msc 60-00