Take a look at the age distribution of Türkiye
In everyday life, if \(𝐱 = \{x_1,…,x_n\}\) we have \[\text{mean}(𝐱)= \frac{1}{n}\sum_i x_i\]
Now, if we count how many times each value appears \[m_j = \text{number of times that }(x_i=j)\] Then we can write \[\text{mean}(𝐱) =\sum_j j⋅\frac{m_j}{n}\]
In other words, to calculate the average we only need to know the proportions \(m_j/n\)
\[\begin{aligned} \text{mean}(𝐱)& = \frac{1}{n}\sum_i x_i\\ & = \frac{1}{n} (\underbrace{1+\cdots+1}_{m_1}+ \underbrace{2+\cdots+2}_{m_2}+\cdots)\\ & = \frac{1}{n} (1⋅m_1 + 2⋅m_2+3⋅m_3+\cdots)\\ & = 1⋅\frac{m_1}{n} + 2⋅\frac{m_2}{n} +\cdots =\sum_j j⋅ \frac{m_j}{n} \end{aligned} \]
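A minimal sketch of this identity in Python (the data vector is made up for illustration):

```python
import numpy as np
from collections import Counter

x = np.array([1, 2, 2, 3, 3, 3, 4])   # made-up data vector
n = len(x)

# Direct definition: mean(x) = (1/n) * sum_i x_i
direct = x.sum() / n

# Via proportions: mean(x) = sum_j j * (m_j / n)
counts = Counter(x)                    # m_j = number of times x_i == j
via_proportions = sum(j * m / n for j, m in counts.items())

print(direct, via_proportions)         # both print 2.5714...
```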
Calculate the average age of the population of Türkiye
Calculate the average age of men and women in Türkiye
Compare them
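A possible setup for these exercises, assuming the age distribution comes as a table of (age, count) pairs; the numbers below are placeholders, not real Türkiye data:

```python
import numpy as np

# Placeholder (age, count) table -- substitute the real Türkiye data here
ages   = np.array([0, 1, 2, 3])        # hypothetical ages j
counts = np.array([10, 20, 30, 40])    # hypothetical m_j

# mean = sum_j j * (m_j / n), i.e. a weighted average of the ages
mean_age = np.average(ages, weights=counts)
print(mean_age)
```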
Let’s say that \(Ω=\{a_1, a_2, …,a_n\}\)
The probability distribution is a function \[p: Ω → [0,1]\]
\[p(a_i) = ℙ(X=a_i)= ℙ(\text{outcome is exactly }a_i)\]
There are \(m_c\) cards of color \(c\in\{\text{red},\text{green},\text{blue},\text{yellow}\}\)
There are \(n=\sum_c m_c\) cards in total
If we have no reason to expect any particular card over another, then each individual card has the same probability \(1/n\)
The probability that the first card has color \(c\) is \[ℙ(\text{color is }c)=\frac{m_c}{n}\]
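A minimal simulation of this experiment; the deck composition below is an assumption, not given in the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed deck composition m_c (numbers chosen for illustration)
deck = {"red": 10, "green": 5, "blue": 3, "yellow": 2}
n = sum(deck.values())

cards = [c for c, m in deck.items() for _ in range(m)]
draws = rng.choice(cards, size=100_000)   # uniform draw: each card has prob 1/n

for c, m in deck.items():
    print(c, (draws == c).mean(), "theory:", m / n)
```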
The most important applications of probability are those where the outcomes are numbers
More generally, we care about numbers that depend on the outcome of the experiment
If the outcomes are numbers, we can use them in formulas
For example, if “Heads \(↦1\) and Tails \(↦0\)”, then we can ask
“What is the sum when we throw \(N\) coins?”
Or if ⚀ \(↦ 1\), ⚁ \(↦ 2\), …, ⚅ \(↦ 6\) we can ask
“What is the average sum of two dice?”
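For the two-dice question we can simply enumerate all 36 equally likely outcomes:

```python
# Average sum of two fair dice, enumerating all 6*6 equally likely outcomes
outcomes = [a + b for a in range(1, 7) for b in range(1, 7)]
print(sum(outcomes) / len(outcomes))   # 7.0
```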
The case where outcomes are numbers is so important that it has a special name
We call them random variables
We represent them with capital letters, like \(X\)
Then we can ask: “\(X>1\)” or “\(X=2\)” or “\(X=x\)”
In this last example \(x\) is a fixed number, and \(X\) is random
For any random variable \(X\) we define the expected value (also called mean value) of \(X\) as its average over the population \[𝔼X=\sum_{y∈Ω} y\, ℙ(X=y)\] Notice that \(X\) is a random variable but \(𝔼X\) is not.
Sometimes, for a given random variable, we write \(\mu=𝔼X\)
If \(f:ℝ\to ℝ\) is a function, like for example \[f(x) = x^2\qquad\text{or}\qquad f(x)=\sqrt{x}\] then we can get the expected value of \(f(X)\) \[𝔼\,f(X)=\sum_{y∈Ω} f(y)\, ℙ(X=y)\]
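For a fair die, both definitions take one line each:

```python
import numpy as np

omega = np.arange(1, 7)      # outcomes 1..6
p = np.full(6, 1 / 6)        # fair die: P(X = y) = 1/6 for each y

EX = np.sum(omega * p)       # E X    = sum_y y * P(X = y)      -> 3.5
Ef = np.sum(omega**2 * p)    # E f(X) with f(x) = x^2           -> 15.1666...
print(EX, Ef)
```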
If \(X\) and \(Y\) are random variables, and \(\alpha\) is any number, then
\[𝔼(X + Y)=𝔼X + 𝔼Y\] \[𝔼(α X)=α\, 𝔼X\]
So, if \(α\) and \(β\) are fixed numbers, then
\[𝔼(α X +\beta Y)=α\, 𝔼X +β\, 𝔼Y\]
Exercise: prove it yourself
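A numeric sanity check (not a proof!) of linearity, using simulated dice:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1_000_000
X = rng.integers(1, 7, size=N)   # die X
Y = rng.integers(1, 7, size=N)   # die Y
a, b = 2.0, -3.0

print(np.mean(a * X + b * Y))              # empirical E(aX + bY)
print(a * np.mean(X) + b * np.mean(Y))     # a EX + b EY; both near -3.5
```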
The variance of the population is defined with the same idea as the sample variance \[𝕍 X=𝔼(X-𝔼X)^2\] Notice that the variance has squared units
In most cases it is more comfortable to work with the standard deviation of the population \[\sigma=\sqrt{𝕍X}\]
In that case the population variance can be written as \(\sigma^2\)
We can rewrite the variance of the population as: \[𝕍X=𝔼(X-𝔼X)^2=𝔼(X^2)-(𝔼X)^2\] because \[\begin{aligned}𝔼(X-𝔼X)^2&=𝔼(X^2-2X\,𝔼X+(𝔼X)^2)\\&=𝔼(X^2)-2\,𝔼(X\,𝔼X)+𝔼(𝔼X)^2\end{aligned}\] but \(𝔼X\) is not random, so \(𝔼(X\,𝔼X)=(𝔼X)^2\) and \(𝔼(𝔼X)^2=(𝔼X)^2\). Substituting these back gives \(𝔼(X^2)-2(𝔼X)^2+(𝔼X)^2=𝔼(X^2)-(𝔼X)^2\)
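We can check both formulas numerically on a fair die:

```python
import numpy as np

omega = np.arange(1, 7)
p = np.full(6, 1 / 6)

EX = np.sum(omega * p)
v1 = np.sum((omega - EX)**2 * p)       # E(X - EX)^2
v2 = np.sum(omega**2 * p) - EX**2      # E(X^2) - (EX)^2
print(v1, v2)                           # both 2.9166...
```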
If \(X\) and \(Y\) are two independent random variables, and \(\alpha\) is a real number, then \[𝕍(X+Y)=𝕍X+𝕍Y\] \[𝕍(α X)=α^2\,𝕍X\]
To prove the first equation we use that \(𝔼(XY)=𝔼X\,𝔼Y,\) which holds when \(X\) and \(Y\) are independent
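A quick simulation of the first equation with two independent dice:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 1_000_000
X = rng.integers(1, 7, size=N)   # two independent dice
Y = rng.integers(1, 7, size=N)

print(np.var(X + Y))             # empirical V(X + Y)
print(np.var(X) + np.var(Y))     # V X + V Y; both near 2 * 2.9166...
```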
Let’s assume that we have a small sample \(𝐗=(X_1,…,X_n)\)
All \(X_i\) are random variables taken from the same population.
We take their sample mean: \[\text{mean}(𝐗)=\bar{𝐗}=\frac{1}{n}\sum_i X_i\] Since the sample is random, \(\bar{𝐗}\) is also a random variable
What is the expected value of \(\bar{𝐗}\)?
By definition of mean, we have \[𝔼\,\text{mean}(𝐗) = 𝔼(\bar{𝐗})=𝔼\left(\frac{1}{n}\sum_i X_i\right)\] and since \(𝔼(α X+βY)=α𝔼(X)+β𝔼(Y),\) we have \[𝔼(\bar{𝐗})=𝔼\left(\frac{1}{n}\sum_i X_i\right)=\frac{1}{n}𝔼\sum_i X_i=\frac{1}{n}\sum_i𝔼 X_i\]
All outcomes in the sample are identically distributed because they come from the same population (which does not change with the sample)
Let’s assume that each outcome is independent of the others
In that case we have an i.i.d. (independent and identically distributed) sample
Since all \(X_i\) come from the same population, \(𝔼 X_i=𝔼 X\) and \[𝔼(\bar{𝐗})=\frac{1}{n}\sum_i𝔼 X=\frac{n}{n}𝔼 X=𝔼 X\]
Good!
The expected value of the sample average is the expected value of the complete population
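We can watch this in a simulation: draw many i.i.d. samples, compute each sample’s mean, and average those means (the sample size and number of trials below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials = 10, 100_000           # arbitrary sample size and number of trials

samples = rng.integers(1, 7, size=(trials, n))   # i.i.d. samples from a fair die
sample_means = samples.mean(axis=1)              # one mean(X) per sample

print(sample_means.mean())        # average of sample means, close to E X = 3.5
```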
The variance of a set of numbers is easy to calculate
\[\begin{aligned}\text{var}(𝐗)& =\frac{1}{n}\sum_i (X_i-\bar{𝐗})^2\\ &=\frac{1}{n}\sum_i X_i^2-\bar{𝐗}^2\end{aligned}\]
(Remember: the average of the squares minus the square of the average)
Since the sample is random, this is also a random variable.
What is its expected value?
Since \(𝔼(α X+βY)=α𝔼(X)+β𝔼(Y),\) we have
\[\begin{aligned} 𝔼\text{var}(𝐗)&=𝔼\left(\frac{1}{n}\sum_i X_i^2-\left(\frac{1}{n}\sum_i X_i\right)^2\right)\\ &=\frac{1}{n}\sum_i 𝔼\left(X_i^2\right)-\frac{1}{n^2}𝔼 \left(\left(\sum_i X_i\right)^2\right)\end{aligned}\]
Now, since the sample is i.i.d. we have \(𝔼 \left(X_i^2\right)=𝔼 \left(X^2\right)\) and \[\sum_i𝔼 \left(X_i^2\right)=n𝔼 \left(X^2\right)\] therefore \[𝔼\text{var}(𝐗)=\frac{1}{n}n 𝔼\left(X^2\right)-\frac{1}{n^2}𝔼 \left(\sum_i X_i\right)^2\]
We can simplify the second part as \[\left(\sum_i X_i\right)^2=\left(\sum_i X_i\right)\left(\sum_j X_j\right)=\sum_i \sum_j X_i X_j\] therefore \[𝔼 \left(\sum_i X_i\right)^2=\sum_i \sum_j 𝔼 X_i X_j\]
If \(i=j,\) we have \(𝔼 X_i X_j = 𝔼 (X_i^2)=𝔼 (X^2)\) If \(i≠j,\) and since all outcomes are independent, we have \[𝔼 X_i X_j = 𝔼(X_i)𝔼(X_j)=(𝔼X)^2\] therefore \[𝔼\left(\sum_i X_i\right)^2= n 𝔼 (X^2) + n(n-1)(𝔼 X)^2\]
\[\begin{aligned} 𝔼\text{var}(𝐗) &=\frac{1}{n}n𝔼 X^2-\frac{1}{n^2}(n 𝔼 (X^2) + n(n-1)(𝔼 X)^2)\\ & =\frac{1}{n}\left((n-1)𝔼 X^2-(n-1)(𝔼 X)^2\right)\\ & =\frac{n-1}{n}(𝔼 X^2-(𝔼 X)^2)\\ & =\frac{n-1}{n}𝕍X \end{aligned}\]
We have found that \[𝔼\text{var}(𝐗) = \frac{n-1}{n}𝕍X\] So the variance of the sample is not the variance of the population
Not even on average
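The same kind of simulation confirms the bias factor \((n-1)/n\):

```python
import numpy as np

rng = np.random.default_rng(4)
n, trials = 10, 100_000

samples = rng.integers(1, 7, size=(trials, n))
sample_vars = samples.var(axis=1)   # var(X) with the 1/n formula (numpy default)

print(sample_vars.mean())           # close to (n-1)/n * V X
print((n - 1) / n * (35 / 12))      # V X = 35/12 for a fair die
```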
If we want to estimate the mean \(𝔼X\) of a population we can use the sample mean \(\bar{𝐗}\)
But if we want to estimate the variance \(𝕍X\) of a population we cannot use the sample variance \(\text{var}(𝐗)\)
Instead we have to use a different formula \[\hat{𝕍}(𝐗) = \frac{1}{n-1}\sum_i(X_i-\bar{𝐗})^2\]
People use two formulas, depending on the case
If you only care about the sample, its variance is \[\text{var}(𝐗) =\frac{1}{n}\sum_i (X_i-\bar{𝐗})^2=\frac{1}{n}\sum_i X_i^2-\bar{𝐗}^2\]
If you care about the population, but only have a sample \[\hat{𝕍}(𝐗) = \frac{1}{n-1}\sum_i(X_i-\bar{𝐗})^2 = \frac{n}{n-1}\text{var}(𝐗)\]
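In `numpy`, the `ddof` parameter selects between the two formulas:

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample

print(np.var(x, ddof=0))   # var(X):   divide by n      (describes the sample)
print(np.var(x, ddof=1))   # V-hat(X): divide by n - 1  (estimates the population)
```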
When experiments produce numbers, we can calculate their average and variance
The population has a fixed mean and variance, and most times we do not know their values
If we have an i.i.d. sample we can estimate the population mean with the sample mean
If the sample is not i.i.d., its mean may not correspond to the population mean