In the wardrobe drawer there are 24 socks. Half are black and the other half are white.

If you take two socks at random (let’s say, closing your eyes), what is the probability that you get a matching pair?
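One way to check an answer to this question is to count pairs exactly. The sketch below assumes the two socks are drawn without replacement and that every pair is equally likely; the function name `matching_pair_probability` is only illustrative.

```python
from fractions import Fraction
from math import comb

def matching_pair_probability(black: int, white: int) -> Fraction:
    """Probability that two socks drawn without replacement share a colour."""
    total_pairs = comb(black + white, 2)               # all unordered pairs
    matching_pairs = comb(black, 2) + comb(white, 2)   # both black or both white
    return Fraction(matching_pairs, total_pairs)

p_match = matching_pair_probability(12, 12)
print(p_match)  # 11/23
```

The result is slightly below one half, because after the first sock is taken only 11 of the remaining 23 socks share its colour.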

29 November 2017


If we have \(n\) random variables \(X_i\), all **independent** and all with the **same distribution** as \(X\), then their average \[A_n=\frac{1}{n}\sum_{i=1}^n X_i\] converges to the expected value \(\mathbb E X\). The error shrinks like \(1/\sqrt{n}\): by Chebyshev's inequality, for any \(c>0\), \[\Pr\left(|A_n-\mathbb EX|\geq c\,\frac{\sqrt{\mathbb VX}}{\sqrt{n}}\right)\leq \frac{1}{c^2}\]
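A small simulation can illustrate the bound \(\Pr\left(|A_n-\mathbb EX|\geq c\sqrt{\mathbb VX/n}\right)\leq 1/c^2\). This is only a sketch under assumed choices that are not in the original: the \(X_i\) are fair die rolls (mean 3.5, variance 35/12), and the seed, `n`, `repetitions` and `c` are arbitrary.

```python
import random
from math import sqrt

rng = random.Random(0)            # fixed seed, only so the run is repeatable
n, repetitions, c = 100, 2000, 2.0
mean, variance = 3.5, 35 / 12     # expected value and variance of one fair die

def sample_average(size):
    """Average of `size` independent fair-die rolls."""
    return sum(rng.randint(1, 6) for _ in range(size)) / size

threshold = c * sqrt(variance / n)
# Count the repetitions where the average lands far from the expected value.
far = sum(abs(sample_average(n) - mean) >= threshold for _ in range(repetitions))
observed = far / repetitions
print(observed, "<=", 1 / c**2)
```

The observed frequency is typically much smaller than \(1/c^2\); the bound is valid for any distribution with finite variance, so it is not tight for a particular one.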

Using the notation \([Q]=1\) if the question \(Q\) is true and \(0\) if it is false, we have \[\mathbb E[Q]=\Pr(Q)\] Therefore if we do \(n\) experiments and \(N(Q)\) of them are positive for the question \(Q,\) then \[\frac{N(Q)}{n}\underset{n\rightarrow\infty}\longrightarrow \Pr(Q)\]
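The frequency \(N(Q)/n\) can be estimated by simulation. The sketch below assumes, for illustration, that \(Q\) is "two socks drawn from a drawer of 12 black and 12 white socks match"; the seed and the number of experiments are arbitrary choices.

```python
import random

rng = random.Random(1)   # fixed seed, only so the run is repeatable
drawer = ["black"] * 12 + ["white"] * 12

def indicator_match():
    """[Q]: 1 if the two socks drawn without replacement match, else 0."""
    first, second = rng.sample(drawer, 2)
    return 1 if first == second else 0

n = 50_000
frequency = sum(indicator_match() for _ in range(n)) / n   # N(Q) / n
print(frequency)
```

With this many experiments the frequency should land close to the exact probability \(11/23\approx 0.478\).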

So far we have worked with random systems where the outcomes are discrete:

- Only two outcomes, like in a coin
- A finite number of outcomes, such as DNA
- A natural number, such as the number of successful experiments

- A single “coin”, or any experiment with only two possible outcomes
- \(\Pr(\text{success}) = p\)
- \(\Pr(\text{failure}) = q = 1-p\)

- We can encode it as 0 or 1 using \([\text{success}]\)
- The expected value of \([\text{success}]\) is \[\mathbb E[\text{success}]=p\]
- The variance of \([\text{success}]\) is \[\mathbb V[\text{success}]=pq\]
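The two formulas for the indicator \([\text{success}]\) can be checked directly from the definitions of expected value and variance. This is a minimal sketch using exact fractions; the value \(p=1/3\) and the function name are arbitrary.

```python
from fractions import Fraction

def bernoulli_moments(p):
    """Expected value and variance of [success], from the definitions."""
    q = 1 - p
    outcomes = {1: p, 0: q}   # value of [success] -> probability
    mean = sum(x * pr for x, pr in outcomes.items())
    variance = sum((x - mean) ** 2 * pr for x, pr in outcomes.items())
    return mean, variance

p = Fraction(1, 3)
mean, variance = bernoulli_moments(p)
print(mean, variance)  # 1/3 2/9
```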

We throw \(n\) “coins”, all independent, all with the same probability of success \(p\)

The number \(B_n\) of successful “coins” is a random variable. Its distribution is \[\Pr(B_n=k)=\Pr(k\text{ successes in }n\text{ trials})=\binom{n}{k}p^k q^{n-k}\] It is easy to see that \[\begin{align} \mathbb E(B_n)&=np\\ \mathbb V(B_n)&=npq \end{align}\]
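The mean \(np\) and variance \(npq\) can be verified by summing over the distribution \(\binom{n}{k}p^kq^{n-k}\). The sketch below uses exact fractions; \(n=10\) and \(p=3/10\) are arbitrary illustrative values.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n, p = 10, Fraction(3, 10)
pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]

total = sum(pmf)                                   # should be 1
mean = sum(k * pk for k, pk in enumerate(pmf))     # should be n p
variance = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))  # n p q
print(total, mean, variance)  # 1 3 21/10
```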

The probability that \(B_n=k\) for a single value \(k\) can be very small, but the probability that \(B_n\) falls in a range of values is usually bigger

What is the probability that \(B_n\geq a\) for any \(a\)? \[\begin{align}\Pr(B_n\geq a)&=\sum_{k=a}^n\Pr(B_n=k)\\ &=\sum_{k=a}^n\binom{n}{k}p^k q^{n-k}\end{align}\]

What is the probability that \(B_n\) is in the range \([a,b]\)? \[\Pr(a\leq B_n\leq b)=\sum_{k=a}^b\Pr(B_n=k)\] Or, if \(a\) and \(b\) are not integers \[\Pr(a\leq B_n\leq b)=\sum_{a\leq k\leq b}\Pr(B_n=k)\]
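The sums above translate directly into code. In this sketch the helper names `prob_range` and `prob_at_least` are illustrative; non-integer limits are handled by rounding \(a\) up and \(b\) down, which selects exactly the integers \(k\) with \(a\leq k\leq b\).

```python
from fractions import Fraction
from math import ceil, comb, floor

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def prob_range(a, b, n, p):
    """Pr(a <= B_n <= b); a and b do not need to be integers."""
    lowest = max(0, ceil(a))
    highest = min(n, floor(b))
    return sum(binomial_pmf(k, n, p) for k in range(lowest, highest + 1))

def prob_at_least(a, n, p):
    """Pr(B_n >= a)."""
    return prob_range(a, n, n, p)

half = Fraction(1, 2)
print(prob_at_least(5, 10, half))     # at least 5 heads in 10 fair coins
print(prob_range(4.5, 5.5, 10, half)) # only k = 5 is in the range: 63/256
```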

We want to see what happens when \(n\) is big. Let \[S_n = \frac{B_n-np}{\sqrt{npq}}\] It is easy to see that \[\begin{align} \mathbb E(S_n)&=0\\ \mathbb V(S_n)&=1 \end{align}\] for all values of \(n\)
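The claim that \(S_n\) has mean 0 and variance 1 for every \(n\) can be checked numerically by summing over all values of \(k\). This is a sketch in floating point, so the results are only equal to 0 and 1 up to rounding; \(p=0.3\) and the values of \(n\) are arbitrary.

```python
from math import comb, sqrt

def standardized_moments(n, p):
    """Mean and variance of S_n = (B_n - n p) / sqrt(n p q)."""
    q = 1 - p
    scale = sqrt(n * p * q)
    mean = 0.0
    second_moment = 0.0
    for k in range(n + 1):
        probability = comb(n, k) * p**k * q ** (n - k)
        x_k = (k - n * p) / scale        # value of S_n when B_n = k
        mean += x_k * probability
        second_moment += x_k**2 * probability
    return mean, second_moment - mean**2

moments = {n: standardized_moments(n, 0.3) for n in (10, 100, 400)}
for n, (mean, variance) in moments.items():
    print(n, mean, variance)
```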

To evaluate \(\Pr(a\leq S_n\leq b)\) we define \[x_k=\frac{k-np}{\sqrt{npq}}\] so \[\Pr(a\leq S_n\leq b)=\sum_{a\leq x_k\leq b}\binom{n}{k}p^k q^{n-k}\] where \(k=np+x_k\sqrt{npq}\) and \((n-k)=nq-x_k\sqrt{npq}\)

Remember that \[\binom{n}{k}=\frac{n!}{k!(n-k)!}\] When \(n\) is big we can approximate the factorial by \[n!\approx Cn^{n+1/2}e^{-n}\] where \(C\) is a constant that we will find later

This is called *Stirling’s approximation* and we will explain it later
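To see that the approximation is plausible, we can compute the ratio \(n!/(n^{n+1/2}e^{-n})\) for growing \(n\) and watch it settle to a constant. The sketch below works with logarithms (through `math.lgamma`, where `lgamma(n + 1)` is \(\ln n!\)) to avoid overflow; the function name is illustrative.

```python
from math import exp, lgamma, log

def stirling_ratio(n):
    """n! / (n**(n + 1/2) * e**(-n)), computed through logarithms."""
    log_ratio = lgamma(n + 1) - (n + 0.5) * log(n) + n
    return exp(log_ratio)

ratios = {n: stirling_ratio(n) for n in (1, 10, 100, 1000)}
for n, ratio in ratios.items():
    print(n, ratio)
```

The ratios decrease towards a value near 2.5066, which is consistent with the constant \(C\) obtained at the end of the derivation.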

Now the binomial coefficient can be written as \[\begin{align} \frac{n!}{k!(n-k)!}&\approx\frac{Cn^{n+1/2}e^{-n}}{C^2k^{k+1/2}e^{-k} (n-k)^{n-k+1/2}e^{-(n-k)}}\\ &=\frac{n^{n+1/2}}{C k^{k+1/2} (n-k)^{n-k+1/2}}\\ &=\frac{1}{C}\left(\frac{n}{k(n-k)}\right)^{1/2}\left(\frac{n}{k}\right)^k\left(\frac{n}{n-k}\right)^{n-k} \end{align}\]

Multiplying by \(p^k q^{n-k}\) gives \[\binom{n}{k}p^k q^{n-k} \approx\frac{1}{C}\left(\frac{n}{k(n-k)}\right)^{1/2}\left(\frac{np}{k}\right)^k\left(\frac{nq}{n-k}\right)^{n-k}\] Now, when \(n\) is big, \[\frac{n}{k(n-k)}= \frac{n}{(np+x_k\sqrt{npq})(nq-x_k\sqrt{npq})}\approx \frac{1}{npq}\] therefore \[\binom{n}{k}p^k q^{n-k}\approx \frac{1}{C\sqrt{npq}}\left(\frac{np}{k}\right)^k\left(\frac{nq}{n-k}\right)^{n-k}\]

We have \[\begin{align} \ln\left(\left(\frac{np}{k}\right)^k\right) &=-k\ln\left(\frac{k}{np}\right)\\ &=-(np+x_k\sqrt{npq})\ln\left(\frac{np+x_k\sqrt{npq}}{np}\right)\\ &=-(np+x_k\sqrt{npq})\ln\left(1+x_k\sqrt{\frac{q}{np}}\right) \end{align}\] With the same procedure we get \[\ln\left(\left(\frac{nq}{n-k}\right)^{n-k}\right) =-(nq-x_k\sqrt{npq})\ln\left(1-x_k\sqrt{\frac{p}{nq}}\right)\]

Using now the approximation \(\ln(1+x)\approx x-x^2/2\) we can write \[\ln\left(\left(\frac{np}{k}\right)^k\right) \approx-(np+x_k\sqrt{npq})\left(x_k\sqrt{\frac{q}{np}}-x_k^2\frac{q}{2np}\right)\]

\[\ln\left(\left(\frac{nq}{n-k}\right)^{n-k}\right) \approx-(nq-x_k\sqrt{npq})\left(-x_k\sqrt{\frac{p}{nq}}-x_k^2{\frac{p}{2nq}}\right)\]

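Multiplying out and dropping the terms of order \(x_k^3/\sqrt{n}\), which vanish when \(n\) is big, we get \[\begin{align} \ln\left(\left(\frac{np}{k}\right)^k\left(\frac{nq}{n-k}\right)^{n-k}\right) &\approx-\left(x_k\sqrt{npq}-\frac{x_k^2q}{2}+x_k^2q\right)+\left(x_k\sqrt{npq}+\frac{x_k^2p}{2}-x_k^2 p\right)\\ &=-(p+q)\frac{x_k^2}{2}= -\frac{x_k^2}{2} \end{align}\]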

\[\begin{align} \ln\left(\left(\frac{np}{k}\right)^k\left(\frac{nq}{n-k}\right)^{n-k}\right) &\approx-\frac{x_k^2}{2}\\ \left(\frac{np}{k}\right)^k\left(\frac{nq}{n-k}\right)^{n-k} &\approx\exp({-x_k^2/2}) \end{align}\] and therefore \[\binom{n}{k}p^k q^{n-k}\approx\frac{1}{C\sqrt{npq}}\exp({-x_k^2/2})\]
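The approximation can be tested by solving it for \(C\): for each \(k\), the value \(C_k=\exp(-x_k^2/2)/\left(\sqrt{npq}\,\Pr(B_n=k)\right)\) should be nearly the same number. The sketch below assumes \(n=400\) and \(p=0.3\) (arbitrary choices) and looks only at values with \(|x_k|\leq 2\).

```python
from math import comb, exp, sqrt

n, p = 400, 0.3
q = 1 - p
scale = sqrt(n * p * q)

def implied_constant(k):
    """Value of C that would make the approximation exact for this k."""
    exact = comb(n, k) * p**k * q ** (n - k)
    x_k = (k - n * p) / scale
    return exp(-x_k**2 / 2) / (scale * exact)

constants = [implied_constant(k) for k in range(n + 1)
             if abs((k - n * p) / scale) <= 2]
print(min(constants), max(constants))
```

The smallest and largest implied values differ by only a few percent and sit close to 2.5, which supports treating \(C\) as a constant.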

Recalling that \[\Pr(a\leq S_n\leq b)=\sum_{a\leq x_k\leq b}\binom{n}{k}p^k q^{n-k}\] then \[\Pr(a\leq S_n\leq b)\approx\sum_{a\leq x_k\leq b} \frac{1}{C\sqrt{npq}}\exp({-x_k^2/2})\]

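If we call \(h=1/\sqrt{npq}\) then consecutive values of \(x_k\) are separated by \(h\), and \(h\to 0\) when \(n\to\infty\). The sum is then a Riemann sum with step \(h\), which converges to the integral \[\Pr(a\leq S_n\leq b)\approx\int_{a}^{b}\frac{1}{C}e^{-x^2/2}\,dx\]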

To finish the formula we have to find \(C\). Using the first rule of probabilities we have \[\Pr(-\infty< S_n< \infty)=\int_{-\infty}^{\infty}\frac{1}{C}e^{-x^2/2}\,dx=1\] therefore \[C=\int_{-\infty}^{\infty}e^{-x^2/2}\,dx=\sqrt{2\pi}\]
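The value of \(C\) can be confirmed numerically with the same kind of sum used in the derivation: a step \(h\) times the sum of \(e^{-x^2/2}\) over equally spaced points. This is a sketch; truncating the integral to \([-10, 10]\) and using \(h=0.01\) are assumptions, justified because the integrand is negligible outside that interval.

```python
from math import exp, pi, sqrt

def riemann_sum(f, a, b, h):
    """h times the sum of f over the points a, a + h, ..., b."""
    steps = round((b - a) / h)
    return h * sum(f(a + i * h) for i in range(steps + 1))

C_numeric = riemann_sum(lambda x: exp(-x * x / 2), -10.0, 10.0, 0.01)
print(C_numeric, sqrt(2 * pi))
```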

The random variable \(S_n\) was chosen to have \(\mathbb E S_n=0\) and \(\mathbb V S_n=1\)

We have shown that, when \(n\) is big, \(S_n\) is a random variable with values in \(\mathbb R\) that follows a *Normal* distribution with mean 0 and variance 1.
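To see the convergence, we can compare the exact value of \(\Pr(-1\leq S_n\leq 1)\) with the Normal value for two sizes of \(n\). The sketch below assumes \(p=0.5\) and uses `math.lgamma` to compute binomial probabilities through logarithms, so that large \(n\) does not overflow; the Normal probability is written with `math.erf`. All function names are illustrative.

```python
from math import erf, exp, lgamma, log, sqrt

def binomial_pmf(k, n, p):
    """Pr(B_n = k), computed through logarithms to avoid overflow."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

def prob_standardized_between(a, b, n, p):
    """Exact Pr(a <= S_n <= b), summing over the k with a <= x_k <= b."""
    mean, scale = n * p, sqrt(n * p * (1 - p))
    return sum(binomial_pmf(k, n, p) for k in range(n + 1)
               if a <= (k - mean) / scale <= b)

def standard_normal_cdf(x):
    """Pr(Z <= x) for a Normal variable with mean 0 and variance 1."""
    return 0.5 * (1 + erf(x / sqrt(2)))

normal_value = standard_normal_cdf(1) - standard_normal_cdf(-1)
exact_400 = prob_standardized_between(-1, 1, 400, 0.5)
exact_10000 = prob_standardized_between(-1, 1, 10_000, 0.5)
print(normal_value, exact_400, exact_10000)
```

The gap between the exact and the Normal value shrinks as \(n\) grows; what remains for moderate \(n\) comes mostly from the discrete steps of the Binomial distribution.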

In general, if \(X\) is a normal random variable with mean \(\mu\) and variance \(\sigma^2\) then \[\Pr(X\leq b)=\int_{-\infty}^{b}\frac{1}{\sqrt{2\pi\sigma^2}}e^{-(x-\mu)^2/(2\sigma^2)}\,dx\]
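The integral has no elementary closed form, but it can be written with the error function, \(\Pr(X\leq b)=\frac12\left(1+\operatorname{erf}\left(\frac{b-\mu}{\sigma\sqrt2}\right)\right)\). The sketch below compares that expression with a direct numerical integration of the density, where the denominator of the exponent is \(2\sigma^2\). The values \(\mu=3\), \(\sigma=2\), \(b=4\) and the lower limit \(\mu-10\sigma\) are arbitrary assumptions.

```python
from math import erf, exp, pi, sqrt

def normal_density(x, mu, sigma):
    """Density of a Normal variable with mean mu and variance sigma**2."""
    return exp(-(x - mu) ** 2 / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

def normal_cdf(b, mu, sigma):
    """Pr(X <= b) written with the error function."""
    return 0.5 * (1 + erf((b - mu) / (sigma * sqrt(2))))

def normal_cdf_numeric(b, mu, sigma, steps=20_000):
    """Pr(X <= b) by the midpoint rule, starting 10 sigma below the mean."""
    lowest = mu - 10 * sigma
    h = (b - lowest) / steps
    return h * sum(normal_density(lowest + (i + 0.5) * h, mu, sigma)
                   for i in range(steps))

closed_form = normal_cdf(4, mu=3, sigma=2)
numeric = normal_cdf_numeric(4, mu=3, sigma=2)
print(closed_form, numeric)
```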

We have shown that the Binomial distribution, once centered and scaled to have mean 0 and variance 1, converges to a Normal distribution when \(n\) grows.

Since the Binomial distribution is a sum of “coins” \(X_i\), we have shown that if we center and scale the average of \(n\) “coins”, all independent, all with the same distribution, then \[\frac{\frac{1}{n}\sum_{i=1}^n X_i-p}{\sqrt{pq/n}}=\frac{\sum_{i=1}^n X_i-np}{\sqrt{npq}}\] will converge to a Normal distribution.
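A simulation of the “coins” themselves shows both points at once: standardizing the average and standardizing the sum, \((\sum X_i-np)/\sqrt{npq}\), give the same number, and that number falls in \([-1,1]\) about as often as a standard Normal variable does (roughly 68% of the time). This is a sketch; the seed, \(n=400\), \(p=0.3\) and the number of repetitions are arbitrary assumptions.

```python
import random
from math import sqrt

rng = random.Random(2)   # fixed seed, only so the run is repeatable
n, p, repetitions = 400, 0.3, 2000
q = 1 - p

inside = 0       # repetitions where the standardized value is in [-1, 1]
max_gap = 0.0    # largest difference between the two standardized forms
for _ in range(repetitions):
    total = sum(1 for _coin in range(n) if rng.random() < p)  # sum of coins
    from_sum = (total - n * p) / sqrt(n * p * q)
    from_average = (total / n - p) / sqrt(p * q / n)
    max_gap = max(max_gap, abs(from_sum - from_average))
    inside += -1 <= from_sum <= 1
fraction_inside = inside / repetitions
print(fraction_inside, max_gap)
```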

Given