Class 6.1: Law of Large Numbers

Methodology of Scientific Research

Andrés Aravena, PhD

28 April 2021

A viral “coin”

A “coin” (officially: a Bernoulli random variable) is an experiment that has two possible outcomes

  • Success, or Heads, with probability 𝑝
  • Failure, or Tails, with probability 𝑞

Obviously 𝑝 + 𝑞 = 1

This can represent the effect of a virus in the population, where 𝑝 and 𝑞 correspond to the proportions of sick and healthy people

We would like to know 𝑝

How can we know 𝑝?

If we are rich and powerful, we can test the whole population

Sometimes this is impossible, since the population may be

“all living organisms”

or at least

“all legumes in the last million years”

We need another strategy

Using our tools

If we represent success with 1 and failure with 0, then each individual is a random variable 𝑋. Then we have \[\begin{aligned} 𝔼X &= \sum_{x∈Ω} x⋅ℙ(X=x) \\ & = 1⋅ℙ(X=1) + 0⋅ℙ(X=0) \\ & = p\end{aligned}\]
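This identity can be checked by simulation. The sketch below (using only the standard library, with an arbitrary choice of 𝑝 = 0.3) shows that the empirical mean of many Bernoulli draws approaches \(𝔼X = p\):

```python
import random

# Simulate a Bernoulli "coin": success (1) with probability p, failure (0) with q = 1 - p.
# p = 0.3 is an arbitrary value chosen for illustration.
p = 0.3
random.seed(42)

outcomes = [1 if random.random() < p else 0 for _ in range(100_000)]

# The empirical mean approaches E[X] = p as the number of trials grows
empirical_mean = sum(outcomes) / len(outcomes)
print(round(empirical_mean, 3))
```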

That is, the expected value of this “coin” is the probability of success. We want to know \(𝔼X\)

Two comments

  • The probability depends on what we know
    • Here we assumed that we know the proportion 𝑝 of successes,
    • and that the outcome is random.
  • This “expected value” will never be an outcome.
    • The experiment will be either success or failure, not 𝑝% success
    • you cannot be 90% pregnant

We need more data

We know that \(p=𝔼X,\) that is, the mean value of successes in the population

So maybe it is a good idea to get a sample of size 𝑛 and calculate its mean value

But the sample is random, so the sample mean will be a random variable

The sample mean is a random variable

  • what is the expected value of the sample mean?

  • what is its relation with the population mean?

Application: Averages

Let’s assume that we have a small sample \((X_1,…,X_n).\)

All \(X_i\) are random variables taken from the same population. We take the average or sample mean: \[\text{mean}(X_1,…,X_n)=\bar{𝐗}=\frac{1}{n}\sum_i X_i\] Since the sample is random, \(\bar{𝐗}\) is also a random variable

What is the expected value of \(\bar{𝐗}\)?
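A small simulation makes the randomness of \(\bar{𝐗}\) concrete. The values of 𝑝 and 𝑛 below are arbitrary choices for illustration; each repetition of the same experiment yields a different sample mean:

```python
import random

random.seed(1)
p, n = 0.3, 50  # illustrative values

def sample_mean(n, p):
    """Average of n independent Bernoulli(p) draws."""
    return sum(1 if random.random() < p else 0 for _ in range(n)) / n

# Each repetition gives a different value: the sample mean is itself random
means = [sample_mean(n, p) for _ in range(5)]
print(means)
```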

Independent from the same population

We often assume that all outcomes in the sample are independent identically distributed (i.i.d.)

In that case we will have

  • All \(𝔼 X_i\) are equal. We call them \(𝔼 X\)
  • All \(𝕍 X_i\) are equal. We call them \(𝕍 X\)
  • \(X_i\) is independent of \(X_j\) when \(i≠j\)

Expected value of sample mean

Since \(𝔼(α X+βY)=α𝔼(X)+β𝔼(Y),\) we have \[𝔼(\bar{𝐗})=𝔼\left(\frac{1}{n}\sum_i X_i\right)=\frac{1}{n}𝔼\sum_i X_i=\frac{1}{n}\sum_i𝔼 X_i\] and since all \(X_i\) come from the same population \[𝔼(\bar{𝐗})=\frac{1}{n}\sum_i𝔼 X=\frac{n}{n}𝔼 X=𝔼 X\]

Good!
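The derivation can be checked numerically. This sketch (with arbitrary 𝑝, 𝑛, and repetition count) averages many realisations of \(\bar{𝐗}\) and recovers \(𝔼X = p\):

```python
import random

random.seed(7)
p, n, reps = 0.4, 20, 50_000  # illustrative values

def sample_mean(n, p):
    """Average of n independent Bernoulli(p) draws."""
    return sum(1 if random.random() < p else 0 for _ in range(n)) / n

# Averaging many realisations of the sample mean recovers E[X] = p
grand_mean = sum(sample_mean(n, p) for _ in range(reps)) / reps
print(round(grand_mean, 3))
```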

Variance of the sample mean

Now we have \(𝕍(α X+βY)=α^2𝕍(X)+β^2𝕍(Y),\) thus \[𝕍(\bar{𝐗})=𝕍\left(\frac{1}{n}\sum_i X_i\right)=\frac{1}{n^2}𝕍\sum_i X_i=\frac{1}{n^2}\sum_i 𝕍 X_i\] and since all \(X_i\) come from the same population \[𝕍(\bar{𝐗})=\frac{1}{n^2}\sum_i 𝕍 X=\frac{n}{n^2}𝕍 X=\frac{1}{n}𝕍 X\] So averages of bigger samples have smaller variance
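The shrinking variance can also be seen empirically. The sketch below (arbitrary 𝑝 = 0.5, for which \(𝕍X = p(1-p)\)) compares the empirical variance of the sample mean with the theoretical value \(𝕍X/n\) for growing 𝑛:

```python
import random

random.seed(3)
p = 0.5
pop_var = p * (1 - p)  # V(X) for a Bernoulli(p) coin

def sample_mean(n):
    """Average of n independent Bernoulli(p) draws."""
    return sum(1 if random.random() < p else 0 for _ in range(n)) / n

for n in (10, 40, 160):
    reps = 20_000
    means = [sample_mean(n) for _ in range(reps)]
    m = sum(means) / reps
    var = sum((x - m) ** 2 for x in means) / reps
    # Empirical variance of the sample mean shrinks like V(X)/n
    print(n, round(var, 4), round(pop_var / n, 4))
```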

Combining with Chebyshev’s inequality

For any random variable \(X,\) we have \[ℙ(|X-𝔼X| ≤ c\sqrt{𝕍X})≥ 1-1/c^2\] in the case of \(\bar{𝐗}\) we have \[ℙ\left(|\bar{𝐗}-𝔼\bar{𝐗}| ≤ c\sqrt{𝕍\bar{𝐗}}\right)≥ 1-1/c^2\] that is \[ℙ\left(|\bar{𝐗}-𝔼X| ≤ c\sqrt{𝕍(X)/n}\right)≥ 1-1/c^2\]
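The bound can be verified by simulation. In the sketch below (arbitrary 𝑝, 𝑛, and 𝑐 = 2, so the guaranteed coverage is \(1-1/c^2 = 0.75\)), the fraction of sample means falling within \(c\sqrt{𝕍\bar{𝐗}}\) of \(𝔼X\) is counted:

```python
import math
import random

random.seed(5)
p, n, c = 0.3, 25, 2          # illustrative values; 1 - 1/c**2 = 0.75
var_xbar = p * (1 - p) / n    # V(X-bar) = V(X)/n for i.i.d. Bernoulli draws

def sample_mean():
    """Average of n independent Bernoulli(p) draws."""
    return sum(1 if random.random() < p else 0 for _ in range(n)) / n

# Fraction of sample means within c standard deviations of E[X] = p;
# Chebyshev guarantees this is at least 1 - 1/c**2 = 0.75
reps = 20_000
hits = sum(abs(sample_mean() - p) <= c * math.sqrt(var_xbar) for _ in range(reps))
print(hits / reps)
```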

Written as an interval

We have \[ℙ\left(-c\sqrt{𝕍(X)/n}≤ 𝔼X-\bar{𝐗}≤ c\sqrt{𝕍(X)/n}\right)≥ 1-1/c^2\] This can also be written as \[ℙ\left(\bar{𝐗}-c\sqrt{𝕍(X)/n}≤𝔼X ≤ \bar{𝐗}+c\sqrt{𝕍(X)/n}\right)≥ 1-1/c^2\]

Thus, we have an interval that probably contains the population mean

A confidence interval

We want to know the population mean 𝔼X (i.e. the proportion of sick people)

We take a random sample (we test 𝑛 people)

The population average is in the interval \[\left[\bar{𝐗}-c\sqrt{𝕍(X)/n}, \bar{𝐗}+c\sqrt{𝕍(X)/n}\right]\] with probability at least \(1-1/c^2\)

This is called a confidence interval
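A minimal sketch of computing such an interval, with arbitrary illustrative values for 𝑝, 𝑛, and 𝑐. Note that it uses the true population variance \(p(1-p)\), as the derivation above assumes; in practice that variance is unknown (see the last slide):

```python
import math
import random

random.seed(11)
p_true = 0.25        # unknown in practice; set here only to simulate the data
n, c = 400, 3        # c = 3 gives confidence at least 1 - 1/9, about 89%

# Test n people and record their sample mean
sample = [1 if random.random() < p_true else 0 for _ in range(n)]
x_bar = sum(sample) / n

# Margin of error c * sqrt(V(X)/n), using the (normally unknown) true variance
margin = c * math.sqrt(p_true * (1 - p_true) / n)
print(f"[{x_bar - margin:.3f}, {x_bar + margin:.3f}]")
```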

Law of Large Numbers

This is an important result

It says that

  • The sample average is close to the population average
  • The larger the sample, the narrower the interval
  • The margin of error depends on
    • the confidence level we choose
    • the standard deviation of the population \(\sqrt{𝕍(X)}\)
    • (inversely) the square root of the sample size \(\sqrt{n}\)

Frequentist vs. Bayesian philosophies

As you know, there are two schools of probability

  • The Bayesian school sees probabilities as degrees of belief
  • The Frequentist school sees probabilities as long-run averages of many experiments

The Law of Large Numbers shows that, if samples are large, both points of view give the same result

The bad news

The margin of error is inversely proportional to the square root of the sample size \(\sqrt{n}\)

Thus, to double the precision (halve the margin of error), we need 4 times more data

To get one more decimal place (10 times more precision) we need 100 times more data
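This scaling can be made concrete with the margin-of-error formula \(c\sqrt{𝕍(X)/n}\). The sketch below assumes illustrative values \(𝕍(X) = 0.25\) (a fair coin) and \(c = 2\); quadrupling 𝑛 halves the margin, and 100 times the data gains one decimal place:

```python
import math

# Illustrative assumptions: V(X) = 0.25 (a Bernoulli coin with p = 0.5), c = 2
var_x, c = 0.25, 2

def margin(n):
    """Margin of error c * sqrt(V(X)/n) for a sample of size n."""
    return c * math.sqrt(var_x / n)

# Quadrupling n halves the margin; 100x the data gains one decimal place
for n in (100, 400, 10_000):
    print(n, round(margin(n), 4))
```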

Exercise

How many people do you need to interview to estimate the average age of the Turkish population with a margin of error of 5 years?

… of 1 year?

… of 1 month?

The other bad news

The margin of error depends on the standard deviation of the population

That is, the square root of the population variance \(𝕍(X)\)

But we do not know the population variance

What can we do?