A “coin” (officially: a Bernoulli random variable) is an experiment that has two possible outcomes:

- Success, or Heads, with probability 𝑝
- Failure, or Tails, with probability 𝑞

Obviously 𝑝 + 𝑞 = 1

This can represent the effect of a virus in a population, where 𝑝 and 𝑞 correspond to the proportions of sick and healthy people, respectively

We would like to know 𝑝

If we are rich and powerful, we can test the whole population

Sometimes this is impossible, since the population may be

“all living organisms”

or at least

“all legumes in the last million years”

We need another strategy

If we represent success with 1 and failure with 0, then each individual is a random variable 𝑋. Then we have \[\begin{aligned} 𝔼X &= \sum_{x∈Ω} x⋅ℙ(X=x) \\ & = 1⋅ℙ(X=1) + 0⋅ℙ(X=0) \\ & = p\end{aligned}\]
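This is easy to see in a quick simulation; a minimal sketch, where the value p = 0.3 and the sample size are arbitrary illustration choices:

```python
import random

# Minimal simulation sketch: p = 0.3 is an arbitrary choice for
# illustration, and the fixed seed makes the run reproducible.
p = 0.3
rng = random.Random(42)

# Encode success as 1 and failure as 0, as in the text.
flips = [1 if rng.random() < p else 0 for _ in range(100_000)]

# The empirical mean of many flips approximates E[X] = p.
empirical_mean = sum(flips) / len(flips)
print(empirical_mean)
```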

That is, the expected value of this “coin” is the probability of success. We want to know \(𝔼X\)

- The probability depends on what we know
  - Here we assumed that we know the proportion 𝑝 of successes,
  - and that the outcome is random.

- This “expected value” will never be an outcome.
  - The experiment will be either success or failure, never a fraction 𝑝 of success
  - you cannot be 90% pregnant

We know that \(p=𝔼X,\) that is, \(p\) is equal to the mean value of successes in the population

So maybe it is a good idea to get a sample of size 𝑛 and calculate its mean value

But the sample is random, so the sample mean will be a random variable

What is the expected value of the sample mean?

What is its relation to the population mean?

Let’s assume that we have a small sample \((X_1,…,X_n).\)

All \(X_i\) are random variables taken from the same population. We take the *average* or **sample mean**: \[\text{mean}(X_1,…,X_n)=\bar{𝐗}=\frac{1}{n}\sum_i X_i\] Since the sample is random, \(\bar{𝐗}\) is also a random variable
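A sketch of drawing one such sample and computing its mean (p and n are arbitrary illustration values):

```python
import random

# Sketch: one random sample of size n from a Bernoulli(p) population
# (p and n are arbitrary illustration values).
p, n = 0.3, 50
rng = random.Random(0)
sample = [1 if rng.random() < p else 0 for _ in range(n)]

# The sample mean; a different seed would give a different value,
# which is why the sample mean is itself a random variable.
x_bar = sum(sample) / n
print(x_bar)
```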

What is the expected value of \(\bar{𝐗}\)?

We often assume that all outcomes in the sample are *independent identically distributed* (i.i.d.)

In that case we will have

- All \(𝔼 X_i\) are equal. We call them \(𝔼 X\)
- All \(𝕍 X_i\) are equal. We call them \(𝕍 X\)
- \(X_i\) is independent of \(X_j\) when \(i≠j\)

Since \(𝔼(α X+βY)=α𝔼(X)+β𝔼(Y),\) we have \[𝔼(\bar{𝐗})=𝔼\left(\frac{1}{n}\sum_i X_i\right)=\frac{1}{n}𝔼\sum_i X_i=\frac{1}{n}\sum_i𝔼 X_i\] and since all \(X_i\) come from the same population \[𝔼(\bar{𝐗})=\frac{1}{n}\sum_i𝔼 X=\frac{n}{n}𝔼 X=𝔼 X\]
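This identity can be checked by simulation; a sketch, where p, n, and the trial count are arbitrary illustration values:

```python
import random

# Sketch: estimate E[X-bar] by averaging many independent sample means
# (p, n, and the number of trials are arbitrary illustration values).
p, n, trials = 0.3, 20, 20_000
rng = random.Random(1)

means = []
for _ in range(trials):
    successes = sum(1 for _ in range(n) if rng.random() < p)
    means.append(successes / n)

# The average of the sample means approaches E[X] = p.
avg_of_means = sum(means) / trials
print(avg_of_means)
```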

Good!

Now, for independent \(X\) and \(Y\), we have \(𝕍(α X+βY)=α^2𝕍(X)+β^2𝕍(Y),\) thus \[𝕍(\bar{𝐗})=𝕍\left(\frac{1}{n}\sum_i X_i\right)=\frac{1}{n^2}𝕍\sum_i X_i=\frac{1}{n^2}\sum_i 𝕍 X_i\] and since all \(X_i\) come from the same population \[𝕍(\bar{𝐗})=\frac{1}{n^2}\sum_i 𝕍 X=\frac{n}{n^2}𝕍 X=\frac{1}{n}𝕍 X\] So averages of bigger samples have smaller variance
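The \(𝕍(X)/n\) shrinkage shows up empirically too; a sketch, where p, the sample sizes, and the trial count are arbitrary illustration values:

```python
import random

# Sketch: the empirical variance of the sample mean shrinks like V(X)/n
# (p, the sample sizes, and the trial count are illustration values).
p, trials = 0.3, 20_000
var_X = p * (1 - p)  # variance of a Bernoulli(p) variable
rng = random.Random(2)

emp_var = {}
for n in (10, 40):
    means = []
    for _ in range(trials):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        means.append(successes / n)
    m = sum(means) / trials
    emp_var[n] = sum((x - m) ** 2 for x in means) / trials
    print(n, emp_var[n], var_X / n)
```

Quadrupling \(n\) from 10 to 40 divides the variance of the average by roughly 4.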

For any random variable \(X,\) Chebyshev's inequality gives \[ℙ(|X-𝔼X| ≤ c\sqrt{𝕍X})≥ 1-1/c^2\] in the case of \(\bar{𝐗}\) we have \[ℙ\left(|\bar{𝐗}-𝔼\bar{𝐗}| ≤ c\sqrt{𝕍\bar{𝐗}}\right)≥ 1-1/c^2\] that is \[ℙ\left(|\bar{𝐗}-𝔼X| ≤ c\sqrt{𝕍(X)/n}\right)≥ 1-1/c^2\]

We have \[ℙ\left(-c\sqrt{𝕍(X)/n}≤ 𝔼X-\bar{𝐗}≤ c\sqrt{𝕍(X)/n}\right)≥ 1-1/c^2\] This can also be written as \[ℙ\left(\bar{𝐗}-c\sqrt{𝕍(X)/n}≤𝔼X ≤ \bar{𝐗}+c\sqrt{𝕍(X)/n}\right)≥ 1-1/c^2\]

Thus, we have an interval that **probably** contains the population mean
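This coverage guarantee can be checked empirically; a minimal sketch, where p, n, c, and the trial count are arbitrary illustration values:

```python
import math
import random

# Sketch: empirical check of the tail bound for the sample mean
# (p, n, c, and the trial count are arbitrary illustration values).
p, n, c, trials = 0.3, 25, 2.0, 10_000
var_X = p * (1 - p)
half_width = c * math.sqrt(var_X / n)
rng = random.Random(3)

inside = 0
for _ in range(trials):
    x_bar = sum(1 for _ in range(n) if rng.random() < p) / n
    if abs(x_bar - p) <= half_width:
        inside += 1

coverage = inside / trials
# The bound guarantees coverage of at least 1 - 1/c^2 = 0.75;
# the actual coverage is usually much higher.
print(coverage)
```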

We want to know the population mean 𝔼X (i.e. the proportion of sick people)

We take a random sample (we test 𝑛 people)

The population average is in the interval \[\left[\bar{𝐗}-c\sqrt{𝕍(X)/n}, \bar{𝐗}+c\sqrt{𝕍(X)/n}\right]\] with probability at least \(1-1/c^2\)
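A sketch of computing one such interval from a single sample; here the true variance \(p(1-p)\) is used, since at this point the text still assumes \(𝕍(X)\) is known, and p, n, c are arbitrary illustration values:

```python
import math
import random

# Sketch: build the interval from one sample. The true variance
# p(1-p) is used here, as the text assumes V(X) is known so far;
# p, n, and c are arbitrary illustration values.
p, n, c = 0.3, 400, 3.0  # 1 - 1/c^2 ~ 0.89 confidence
rng = random.Random(4)

sample = [1 if rng.random() < p else 0 for _ in range(n)]
x_bar = sum(sample) / n

half_width = c * math.sqrt(p * (1 - p) / n)
lo, hi = x_bar - half_width, x_bar + half_width
print(f"[{lo:.3f}, {hi:.3f}]")
```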

This is called a *confidence interval*

This is an important result

It says that

- The sample average is probably close to the population average
- When the sample size is large, the interval is narrower
- The margin of error depends on
  - the confidence level we chose
  - the standard deviation of the population \(\sqrt{𝕍(X)}\)
  - inversely, the square root of the sample size \(\sqrt{n}\)

As you know, there are two schools of probabilities

- The Bayesian school sees probabilities as *degrees of belief*
- Frequentists see probabilities as *averages of many experiments*

The Law of Large Numbers shows that, if samples are large, both points of view give the same result

The margin of error depends on the square root of the sample size \(\sqrt{n}\)

Thus, to double the precision (halve the margin of error), we need 4 times more data

To get one more decimal place (10 times more precision) we need 100 times more data
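This scaling can be sketched numerically; the constant k below is an arbitrary placeholder for \(c\sqrt{𝕍(X)}\):

```python
import math

# Sketch: sample size needed for a given margin of error, using
# margin = k / sqrt(n), where k stands in for c * sqrt(V(X)).
# k = 1.0 is an arbitrary illustration value.
def n_needed(k: float, margin: float) -> int:
    """Smallest n with k / sqrt(n) <= margin."""
    return math.ceil((k / margin) ** 2)

k = 1.0
n1 = n_needed(k, 0.10)  # baseline precision
n2 = n_needed(k, 0.05)  # double the precision -> 4x the data
n3 = n_needed(k, 0.01)  # 10x the precision -> 100x the data
print(n1, n2, n3)  # 100 400 10000
```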

How many people do you need to interview to estimate the average age of the Turkish population with a margin of error of 5 years?

… of 1 year?

… of 1 month?

The margin of error depends on the standard deviation of the population

That is, the square root of the population variance \(𝕍(X)\)

But we do not know the population variance

What can we do?