Experiments are small samples of a large population
There is variability in the population
There is noise in every measurement
We want to understand the population, but we only have a sample
We want to separate signal and noise
First, we will assume that we know the population
We will predict what can happen in any random sample
We will compare the predicted sample with the experimental one
Then we will analyze what this teaches us about the population
We will do an experiment that we call \(X\).
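Let’s assume that we know the set of outcomes \(Ω\) and the probability \(ℙ(X=x)\) of each outcome \(x\)
Then we can calculate the expected value \(𝔼\,X\) and the variance \(𝕍\,X\)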
The experiment is to ask a random person their age
The population is “the ages of all people living in Turkey”
\(Ω\) is the set of natural numbers ≤ 200
\(ℙ(X=x)\) is the proportion of people with age \(x\)
\(𝔼\,X\) is the average age of people in Turkey
\(𝕍\,X\) is the variance of age of people in Turkey
Let’s use the data on the age distribution of Türkiye
What is the expected value of age?
What is the variance of age?
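To make these two questions concrete, here is a minimal Python sketch of the definitions \(𝔼\,X=\sum_x x\,ℙ(X=x)\) and \(𝕍\,X=\sum_x (x-𝔼X)^2ℙ(X=x)\). The function names and the three-age distribution `toy_pmf` are hypothetical illustrations, not the Türkiye data; with the real data, the dictionary would map each age \(x\) to its proportion \(ℙ(X=x)\).

```python
from fractions import Fraction

def expected_value(pmf):
    """E X = sum over x of x * P(X = x); pmf maps outcome -> probability."""
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    """V X = E (X - E X)^2 = sum over x of (x - E X)^2 * P(X = x)."""
    mu = expected_value(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Hypothetical toy distribution (NOT the Türkiye data): age -> proportion
toy_pmf = {10: Fraction(1, 4), 30: Fraction(1, 2), 70: Fraction(1, 4)}
print(expected_value(toy_pmf), variance(toy_pmf))  # 35 475
```

Using `Fraction` keeps the arithmetic exact, so the results can be checked by hand.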
What does that mean?
The expected value does not tell us exactly what to expect
But it tells us approximately
Outcomes are probably near the expected value
We have the following result, valid for any \(c>0\) \[ℙ(𝔼X-c\sqrt{𝕍X} ≤ X ≤ 𝔼X+c\sqrt{𝕍X})≥ 1-1/c^2\] That is, outcomes are probably close to the expected value
\(c\) is a constant that counts how many standard deviations \(\sqrt{𝕍X}\) we allow between the outcome and the expected value. A larger \(c\) gives a wider interval and a higher guaranteed probability
Proved by Pafnuty Lvovich Chebyshev (Пафну́тий Льво́вич Чебышёв) in 1867
It is always valid, for any probability distribution
Later we will see better rules valid only for specific distributions
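Since the inequality holds for any distribution, we can check it on any example we like. The sketch below computes the exact probability \(ℙ(|X-𝔼X|≤c\sqrt{𝕍X})\) for two distributions and compares it with the bound \(1-1/c^2\). The fair die is a standard example; the `skewed` distribution and the function name `chebyshev_check` are hypothetical, chosen only for illustration.

```python
import math

def chebyshev_check(pmf, c):
    """Return (exact P(|X - E X| <= c * sd), Chebyshev lower bound 1 - 1/c^2)."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    sd = math.sqrt(var)
    # Add up the probabilities of the outcomes inside the interval around the mean
    near = sum(p for x, p in pmf.items() if abs(x - mu) <= c * sd)
    return near, 1 - 1 / c**2

die = {x: 1 / 6 for x in range(1, 7)}   # fair six-sided die
skewed = {0: 0.9, 10: 0.09, 100: 0.01}  # hypothetical, heavily skewed

for name, pmf in [("die", die), ("skewed", skewed)]:
    for c in (1.5, 2, 3):
        near, bound = chebyshev_check(pmf, c)
        print(f"{name:6s} c={c}: P(near) = {near:.3f} >= bound {bound:.3f}")
```

In both cases the exact probability is well above the bound, which is why more specific rules can give tighter intervals.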
Chebyshev inequality can also be written as \[ℙ(|X-𝔼X|≤ c⋅\sqrt{𝕍X})≥ 1-1/c^2\]
The probability that
“an outcome \(X\) is within \(c⋅\sqrt{𝕍X}\) of \(𝔼X\)”
is at least \(1-1/c^2\)
Chebyshev inequality can also be written as \[ℙ(|X-𝔼X| > c⋅\sqrt{𝕍X})≤ 1/c^2\]
The probability that
“the distance between \(𝔼X\) and the outcome \(X\) is more than \(c⋅\sqrt{𝕍X}\)”
is at most \(1/c^2\)
\[ℙ(𝔼X -c⋅\sqrt{𝕍X}≤ X ≤ 𝔼X +c⋅\sqrt{𝕍X})≥ 1-1/c^2\]
Substituting some specific values of \(c\), we get
\[\begin{aligned} ℙ(|X-𝔼X| ≤ 1⋅\sqrt{𝕍X})&≥ 1-1/1^2=0\\ ℙ(|X-𝔼X| ≤ 2⋅\sqrt{𝕍X})&≥ 1-1/2^2=0.75\\ ℙ(|X-𝔼X| ≤ 3⋅\sqrt{𝕍X})&≥ 1-1/3^2=8/9≈0.889 \end{aligned}\]
What are the age intervals that contain
at least 75% of Turkish population
at least 8/9 of Turkish population
at least 99% of Turkish population
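To answer these questions we solve \(1-1/c^2 = p\) for \(c\), which gives \(c=1/\sqrt{1-p}\): \(c=2\) for 75%, \(c=3\) for 8/9 and \(c=10\) for 99%. The sketch below turns this into intervals \([𝔼X-c\sqrt{𝕍X},\,𝔼X+c\sqrt{𝕍X}]\). The mean and standard deviation are hypothetical placeholders, not values computed from the Türkiye data, and the function names are our own.

```python
import math

def chebyshev_c(coverage):
    """Smallest c with 1 - 1/c^2 >= coverage, i.e. c = 1 / sqrt(1 - coverage)."""
    return 1 / math.sqrt(1 - coverage)

def chebyshev_interval(mean, sd, coverage):
    """Interval [mean - c*sd, mean + c*sd] holding at least `coverage` of the population."""
    c = chebyshev_c(coverage)
    return mean - c * sd, mean + c * sd

# Hypothetical placeholders, NOT the real values: replace them with
# E X and sqrt(V X) computed from the Türkiye age distribution
mean_age, sd_age = 35.0, 20.0
for coverage in (0.75, 8 / 9, 0.99):
    low, high = chebyshev_interval(mean_age, sd_age, coverage)
    # Ages cannot be negative, so the lower end is clipped at 0
    print(f"at least {coverage:.1%}: [{max(low, 0.0):.1f}, {high:.1f}]")
```

Because Chebyshev’s bound works for every distribution, the 99% interval is very wide; with real data, part of it may even fall outside \(Ω\).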
(read this if you want to know the truth)
If \(Q\) is a yes-no question, we will use the notation \(〚Q〛\) to represent this:
\[〚Q〛=\begin{cases} 1\quad\text{if }Q\text{ is true}\\ 0\quad\text{if }Q\text{ is false} \end{cases}\]
Instead of cramming symbols over and under ∑ \[\sum_{x=1}^{10} f(x)\] we can write the limits at normal size \[\sum_x f(x) 〚1≤x≤10〛\]
If we want to calculate the probability of the event \(Q\), instead of writing \[ℙ(Q)=\sum_{x\text{ makes }Q\text{ true}}ℙ(X=x)\] we can write \[ℙ(Q)=\sum_{x}ℙ(X=x) 〚Q(x)〛\]
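In Python, the bracket \(〚Q〛\) is just a conversion of a yes-no answer into 0 or 1. This small sketch (the names `iverson` and `prob` are hypothetical) computes \(\sum_x f(x)〚1≤x≤10〛\) with \(f(x)=x\), and \(ℙ(Q)=\sum_x ℙ(X=x)〚Q(x)〛\) for the event “the die shows an even number”.

```python
from fractions import Fraction

def iverson(q):
    """Iverson bracket: 1 if the yes-no question q is true, 0 otherwise."""
    return 1 if q else 0

def prob(pmf, event):
    """P(Q) = sum over x of P(X = x) * [[Q(x)]]."""
    return sum(p * iverson(event(x)) for x, p in pmf.items())

die = {x: Fraction(1, 6) for x in range(1, 7)}

# The bracket selects the terms, so the range of x can be larger than needed
print(sum(x * iverson(1 <= x <= 10) for x in range(-5, 20)))  # 55
print(prob(die, lambda x: x % 2 == 0))  # 1/2
```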
By the definition of variance, we have \[𝕍(X)=𝔼(X-𝔼X)^2=\sum_{x∈Ω} (x-𝔼X)^2ℙ(X=x)\] If we multiply each probability by a number that is either 0 or 1, the right side cannot get larger, because every term is non-negative \[𝕍(X)≥\sum_{x∈Ω} (x-𝔼X)^2ℙ(X=x)〚(x-𝔼X)^2≥α〛\]
We want to make it even smaller
Since we only keep the cases where \((x-𝔼X)^2≥α\), replacing \((x-𝔼X)^2\) by \(α\) can only make the right side smaller
\[\begin{aligned} 𝕍(X)& ≥α\sum_{x∈Ω} ℙ(X=x)〚(x-𝔼X)^2≥α〛\\ & =αℙ\left((X-𝔼X)^2≥α\right) \end{aligned}\] Then, for \(α>0\), we can divide by \(α\) and we get Chebyshev’s result \[ℙ\left((X-𝔼X)^2≥α\right)≤𝕍(X)/α\]
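We can follow each step of this proof with numbers. The sketch below uses a hypothetical three-point distribution with \(𝔼X=3\) and \(𝕍X=17\), and \(α=25\). It computes the full variance, the sum that keeps only the terms with \((x-𝔼X)^2≥α\), and that sum after replacing \((x-𝔼X)^2\) by \(α\). The function name `proof_chain` is our own.

```python
from fractions import Fraction

def proof_chain(pmf, alpha):
    """Evaluate each quantity in the proof of Chebyshev's result for alpha > 0."""
    mu = sum(x * p for x, p in pmf.items())
    var = sum((x - mu) ** 2 * p for x, p in pmf.items())
    # Step 1: keep only the terms where (x - E X)^2 >= alpha
    kept = sum((x - mu) ** 2 * p for x, p in pmf.items() if (x - mu) ** 2 >= alpha)
    # Step 2: in the kept terms, replace (x - E X)^2 by alpha
    far = sum(p for x, p in pmf.items() if (x - mu) ** 2 >= alpha)
    return var, kept, alpha * far, far

# Hypothetical toy distribution chosen for illustration
pmf = {0: Fraction(1, 2), 2: Fraction(1, 4), 10: Fraction(1, 4)}
var, kept, replaced, far = proof_chain(pmf, alpha=25)
print(var, kept, replaced)    # 17 49/4 25/4
print(far, "<=", var / 25)    # 1/4 <= 17/25
```

Each printed value is smaller than the previous one, exactly as the chain of inequalities says.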
Chebyshev’s result is \(ℙ\left((X-𝔼X)^2≥α\right)≤𝕍(X)/α.\)
If we choose \(α=c^2⋅𝕍X\) then we have \[ℙ\left((X-𝔼X)^2 ≥ c^2⋅𝕍X \right)≤ 1/c^2\] Both sides of \((X-𝔼X)^2 ≥ c^2⋅𝕍X\) are non-negative, so taking square roots gives \[ℙ(|X-𝔼X| ≥ c\sqrt{𝕍X})≤ 1/c^2\] This is the probability that the outcome is far away from the expected value
Now we can look at the opposite event, \(|X-𝔼X| < c\sqrt{𝕍X}\). Its probability is at least \(1-1/c^2\), and allowing equality can only increase it, so \[ℙ(|X-𝔼X| ≤ c\sqrt{𝕍X})≥ 1-1/c^2\] The event inside \(ℙ()\) can be rewritten as \[-c\sqrt{𝕍X} ≤ X-𝔼X ≤ c\sqrt{𝕍X}\] which means that the outcome is near the expected value \[𝔼X-c\sqrt{𝕍X} ≤ X ≤ 𝔼X+c\sqrt{𝕍X}\]
The event inside \(ℙ()\) is \(|X-𝔼X| ≤ c\sqrt{𝕍X}\)
As we said, it can be rewritten as \[-c\sqrt{𝕍X} ≤ X-𝔼X ≤ c\sqrt{𝕍X}\] which also means that the expected value is near the outcome \[X-c\sqrt{𝕍X} ≤ 𝔼X ≤ X+c\sqrt{𝕍X}\]
This is a confidence interval
We have a small sample \(𝐗=(X_1,…,X_n)\)
All random variables \(X_i\) are i.i.d. (independent and identically distributed)
The average \(\bar{𝐗}\) is also a random variable
\[𝔼\,\bar{𝐗}=𝔼\,\text{mean}(𝐗)=𝔼\,X\] \[𝔼\,\text{var}(𝐗) = \frac{n-1}{n}𝕍\,X\] What about \(𝕍\,\bar{𝐗}\)?
If \(X\) and \(Y\) are independent, we have \(𝕍(α X+βY)=α^2𝕍(X)+β^2𝕍(Y).\) Since the \(X_i\) are independent, \[𝕍(\bar{𝐗})=𝕍\left(\frac{1}{n}\sum_i X_i\right)=\frac{1}{n^2}𝕍\sum_i X_i=\frac{1}{n^2}\sum_i 𝕍 X_i\] and since all \(X_i\) come from the same population \[𝕍(\bar{𝐗})=\frac{1}{n^2}\sum_i 𝕍 X=\frac{n}{n^2}𝕍 X=\frac{1}{n}𝕍 X\]
Averages of bigger samples have smaller variance \[𝕍(\bar{𝐗})=\frac{1}{n}𝕍 X\] Its square root is the standard deviation of the sample average \[\sqrt{𝕍(\bar{𝐗})}=\sqrt{\frac{1}{n}𝕍 X}=\frac{\text{stdev}(X)}{\sqrt{n}}\]
This is important. It has its own name: Standard Error
Standard error is the standard deviation of the sample average
It is calculated as the standard deviation of the population divided by the square root of \(n\)
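A simulation can confirm both \(𝕍(\bar{𝐗})=𝕍X/n\) and \(𝔼\,\text{var}(𝐗)=\frac{n-1}{n}𝕍X\). This sketch uses a fair die as a stand-in population (\(𝕍X=35/12\)) and assumes that \(\text{var}(𝐗)\) divides by \(n\), which matches the \(\frac{n-1}{n}\) factor; Python’s `statistics.pvariance` computes exactly that. The function name `simulate`, the seed and the number of repetitions are arbitrary choices.

```python
import random
import statistics

def simulate(pmf, n, repetitions=10_000, seed=1):
    """Draw many i.i.d. samples of size n from pmf.

    Returns (variance of the sample averages, average of the sample variances),
    where each sample variance divides by n, like var(X) in the text.
    """
    rng = random.Random(seed)
    values, weights = list(pmf), list(pmf.values())
    means, variances = [], []
    for _ in range(repetitions):
        sample = rng.choices(values, weights=weights, k=n)
        means.append(statistics.fmean(sample))
        variances.append(statistics.pvariance(sample))
    return statistics.pvariance(means), statistics.fmean(variances)

die = {x: 1 / 6 for x in range(1, 7)}  # stand-in population: a fair die
V = 35 / 12                            # exact variance of a fair die
n = 5
var_of_mean, mean_of_var = simulate(die, n)
print(f"V(mean) ~ {var_of_mean:.3f}, theory V/n = {V / n:.3f}")
print(f"E var   ~ {mean_of_var:.3f}, theory (n-1)/n V = {(n - 1) / n * V:.3f}")
print(f"standard error ~ {var_of_mean ** 0.5:.3f}, theory sd/sqrt(n) = {(V / n) ** 0.5:.3f}")
```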
For any random variable \(X,\) we have \[ℙ(|X-𝔼X| ≤ c\sqrt{𝕍X})≥ 1-1/c^2\] in the case of \(\bar{𝐗}\) we have \[ℙ\left(|\bar{𝐗}-𝔼\bar{𝐗}| ≤ c\sqrt{𝕍\bar{𝐗}}\right)≥ 1-1/c^2\] that is \[ℙ\left(|\bar{𝐗}-𝔼X| ≤ c\sqrt{𝕍(X)/n}\right)≥ 1-1/c^2\]
We have \[ℙ\left(-c\sqrt{𝕍(X)/n}≤ 𝔼X-\bar{𝐗} ≤ c\sqrt{𝕍(X)/n}\right)≥ 1-1/c^2\] This can also be written as \[ℙ\left(\bar{𝐗}-c\sqrt{𝕍(X)/n}≤𝔼X ≤ \bar{𝐗}+c\sqrt{𝕍(X)/n}\right)≥ 1-1/c^2\]
Thus, we have an interval that probably contains the population mean
We want to know the population mean \(𝔼\,X\)
We take a random sample (of \(n\) people)
The population average is in the interval \[\left[\bar{𝐗}-c\sqrt{𝕍(X)/n}, \bar{𝐗}+c\sqrt{𝕍(X)/n}\right]\] with probability at least \(1-1/c^2\)
This is called a confidence interval
This is an important result. It says that
The sample average is probably close to the population average
When the sample size is large, the interval is narrower
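The confidence interval above can be written as a short function. This is a sketch under the same assumption as the slides: the population variance \(𝕍(X)\) is known. The name `chebyshev_ci` is hypothetical, and the simulation uses a fair die as the population, so we can count how often the interval really contains \(𝔼X=3.5\).

```python
import math
import random

def chebyshev_ci(sample, pop_variance, c):
    """Chebyshev confidence interval for E X, assuming the population variance is known.

    Contains E X with probability at least 1 - 1/c^2.
    """
    n = len(sample)
    xbar = sum(sample) / n
    margin = c * math.sqrt(pop_variance / n)
    return xbar - margin, xbar + margin

# Check the guarantee by simulation, with a fair die as the population
rng = random.Random(7)
faces, mu, V, c = range(1, 7), 3.5, 35 / 12, 2
trials, hits = 2_000, 0
for _ in range(trials):
    low, high = chebyshev_ci(rng.choices(faces, k=25), V, c)
    hits += low <= mu <= high
print(f"coverage {hits / trials:.3f} >= guaranteed {1 - 1 / c**2}")
```

The observed coverage is usually well above the guaranteed 75%, because Chebyshev’s bound must hold for every distribution and is therefore conservative.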
The margin of error \(c\sqrt{𝕍(X)/n}\) depends on the sample size \(n\), the population variance \(𝕍(X)\), and the constant \(c\)
As you know, there are two schools of thought about probability
The Bayesian school sees probabilities as degrees of belief
Frequentists see probabilities as long-run averages over many repetitions of an experiment
The Law of Large Numbers shows that, if samples are large, both points of view give the same result
The margin of error depends on the square root of the sample size \(\sqrt{n}\)
Thus, to double the precision (halve the margin of error), we need 4 times more data
To get one more decimal place (10 times more precision) we need 100 times more data
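The margin of error is \(c\cdot\text{stdev}(X)/\sqrt{n}\), so only its ratio between sample sizes matters for these two claims. In this sketch the values of `sd` and `c` are arbitrary, and the function name `margin_of_error` is our own.

```python
import math

def margin_of_error(sd, n, c):
    """Chebyshev margin of error c * sd / sqrt(n) for the sample average."""
    return c * sd / math.sqrt(n)

sd, c = 1.0, 2  # arbitrary illustrative values; only the ratios matter here
for n in (100, 400, 10_000):
    print(n, margin_of_error(sd, n, c))
# 400 is 4x more data than 100: the margin is halved (0.2 -> 0.1)
# 10_000 is 100x more data than 100: the margin is 10x smaller (0.2 -> 0.02)
```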
The margin of error depends on the standard deviation of the population
That is, the square root of the population variance \(𝕍(X)\)
But we do not know the population variance
What can we do?
How many people do you need to interview to estimate the average age of the Turkish population with a margin of error
… of 5 years?
… of 1 year?
… of 1 month?
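Solving \(c\,\text{stdev}(X)/\sqrt{n}≤m\) for \(n\) gives \(n≥(c\,\text{stdev}(X)/m)^2\). The sketch below applies this to the three margins, assuming, only for illustration, that the population standard deviation is known; `sd_age = 20` years is a hypothetical placeholder, not a value from the Türkiye data, and \(c=10\) is chosen for a guarantee of at least 99%.

```python
import math
from fractions import Fraction

def sample_size(sd, margin, c):
    """Smallest n with c * sd / sqrt(n) <= margin, i.e. n >= (c * sd / margin)^2."""
    return math.ceil((c * sd / margin) ** 2)

# Hypothetical placeholder, NOT the real value: replace it with the standard
# deviation of age computed from the Türkiye data
sd_age = Fraction(20)  # years
c = 10                 # Chebyshev: probability at least 1 - 1/10^2 = 99%
# Fractions keep the arithmetic exact, so ceil() is not fooled by rounding (e.g. 1/12)
for label, margin in [("5 years", Fraction(5)), ("1 year", Fraction(1)), ("1 month", Fraction(1, 12))]:
    print(label, sample_size(sd_age, margin, c))
```

Each time the margin shrinks by a factor \(k\), the required sample grows by \(k^2\).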