Take a look at Türkiye's age distribution

- What is the average age?
- Are men older or younger than women, on average?

In everyday life, if \(𝐱 = \{x_1,…,x_n\}\) we have \[\text{mean}(𝐱)= \frac{1}{n}\sum_i x_i\]

Now, if we count how many times each value appears \[m_j = \text{number of times that }(x_i=j)\] then we can write \[\text{mean}(𝐱) =\sum_j j⋅\frac{m_j}{n}\]

In other words, to calculate the average we only need to know the proportions \(m_j/n\)

\[\begin{aligned} \text{mean}(𝐱)& = \frac{1}{n}\sum_i x_i\\ & = \frac{1}{n} (\underbrace{1+\cdots+1}_{m_1}+ \underbrace{2+\cdots+2}_{m_2}+\cdots)\\ & = \frac{1}{n} (1⋅m_1 + 2⋅m_2+3⋅m_3+\cdots)\\ & = 1⋅\frac{m_1}{n} + 2⋅\frac{m_2}{n} +\cdots =\sum_j j⋅ \frac{m_j}{n} \end{aligned} \]
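The identity above is easy to check in code; a minimal Python sketch (the data is made up for illustration):

```python
from collections import Counter

def mean_from_counts(xs):
    """Mean computed as sum_j j * (m_j / n), using value counts m_j."""
    n = len(xs)
    counts = Counter(xs)                      # m_j for each distinct value j
    return sum(j * m / n for j, m in counts.items())

ages = [1, 2, 2, 3, 3, 3]                     # toy data, not real ages
assert abs(mean_from_counts(ages) - sum(ages) / len(ages)) < 1e-12
```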

Calculate the average age of the population of Türkiye

Calculate the average age of men and women in Türkiye

Compare them

Let’s say that \(Ω=\{a_1, a_2, …,a_n\}\)

The probability distribution is a function \[p: Ω → [0,1]\]

\[p(a_i) = ℙ(X=a_i)= ℙ(\text{outcome is exactly }a_i)\]

There are \(m_c\) cards of color \(c\in\){“red”,“green”,“blue”, “yellow”}

There are \(n=\sum_c m_c\) cards in total

If we have no reason to expect any particular ordering of the cards, then each individual card has the same probability \(1/n\) of coming first

The probability that the first card has color \(c\) is \[ℙ(\text{color is }c)=\frac{m_c}{n}\]
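As a quick sketch (the counts \(m_c\) below are made up for illustration):

```python
# Hypothetical card counts m_c per color (assumed, not from the text)
counts = {"red": 10, "green": 8, "blue": 7, "yellow": 5}
n = sum(counts.values())                       # total number of cards

# P(color is c) = m_c / n
prob = {c: m / n for c, m in counts.items()}

assert abs(sum(prob.values()) - 1.0) < 1e-12   # a distribution sums to 1
```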

The most important applications of probabilities are when the outcomes are numbers

More generally, we care about numbers that depend on the experiment outcome

- dice: ⚀ \(↦ 1\), ⚁ \(↦ 2\), …, ⚅ \(↦ 6\)
- coins: “Heads” \(↦ 1\), “Tails” \(↦ 0\)
- temperature
- number of cells
- anything we measure

If the outcomes are numbers, we can use them in formulas

For example, if “Heads \(↦1\) and Tails \(↦0\)”, then we can ask

“What is the sum when we throw \(N\) coins?”

Or if ⚀ \(↦ 1\), ⚁ \(↦ 2\), …, ⚅ \(↦ 6\) we can ask

“What is the average sum of two dice?”
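We can answer the two-dice question by enumerating all 36 equally likely outcomes; a short Python check:

```python
from itertools import product

# All 36 equally likely outcomes of throwing two fair dice
sums = [a + b for a, b in product(range(1, 7), repeat=2)]

average = sum(sums) / len(sums)
print(average)  # 7.0
```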

The case where *outcomes* are numbers is so important that it
has a special name

We call them *random variables*

We represent them with capital letters, like \(X\)

Then we can ask: “\(X>1\)” or “\(X=2\)” or “\(X=x\)”

In this last example \(x\) is a fixed number, and \(X\) is random

For any random variable \(X\) we
define the **expected value** (also called **mean
value**) of \(X\) as its average
over the population \[𝔼X=\sum_{y∈Ω} y\,
ℙ(X=y)\] Notice that \(X\) is a
random variable but \(𝔼X\) is not.

Sometimes, for a given random variable, we write \(\mu=𝔼X\)

If \(f:ℝ\to ℝ\) is a
*function*, like for example \[f(x) =
x^2\qquad\text{or}\qquad f(x)=\sqrt{x}\] then we can get the
expected value of \(f(X)\) \[𝔼\,f(X)=\sum_{y∈Ω} f(y)\, ℙ(X=y)\]

If \(X\) and \(Y\) are random variables, and \(\alpha\) is any number, then

\[𝔼(X + Y)=𝔼X + 𝔼Y\] \[𝔼(α X)=α\, 𝔼X\]

So, if \(α\) and \(β\) are fixed numbers, then

\[𝔼(α X +\beta Y)=α\, 𝔼X +β\, 𝔼Y\]

**Exercise:** prove it yourself
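A numerical sanity check (not a proof), with arbitrary random data:

```python
import random

random.seed(0)
alpha, beta = 2.0, -3.0
xs = [random.random() for _ in range(10_000)]
ys = [random.random() for _ in range(10_000)]

def mean(v):
    return sum(v) / len(v)

lhs = mean([alpha * x + beta * y for x, y in zip(xs, ys)])
rhs = alpha * mean(xs) + beta * mean(ys)
assert abs(lhs - rhs) < 1e-6   # linearity holds (up to float rounding)
```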

The **variance of the population** is defined with the
same idea as the sample variance \[𝕍
X=𝔼(X-𝔼X)^2\] Notice that the variance has *squared
units*

In most cases it is more convenient to work with the
**standard deviation of the population** \[\sigma=\sqrt{𝕍X}\]

In that case the **population variance** can be written
as \(\sigma^2\)

We can rewrite the **variance of the population** as:
\[𝕍X=𝔼(X-𝔼X)^2=𝔼(X^2)-(𝔼X)^2\] because
\[𝔼(X-𝔼X)^2=𝔼(X^2-2X\,𝔼X+(𝔼X)^2)=𝔼(X^2)-2𝔼(X\,𝔼X)+𝔼(𝔼X)^2\]
but \(𝔼X\) is not random, so \(𝔼(X\,𝔼X)=(𝔼X)^2\) and \(𝔼(𝔼X)^2=(𝔼X)^2\),
which gives \(𝔼(X^2)-2(𝔼X)^2+(𝔼X)^2=𝔼(X^2)-(𝔼X)^2\)
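The two formulas for the variance agree, as a quick check on toy data shows:

```python
def mean(v):
    return sum(v) / len(v)

def var_direct(v):                    # E(X - EX)^2 over the data
    m = mean(v)
    return mean([(x - m) ** 2 for x in v])

def var_shortcut(v):                  # E(X^2) - (EX)^2
    return mean([x * x for x in v]) - mean(v) ** 2

data = [2, 4, 4, 4, 5, 5, 7, 9]
print(var_direct(data), var_shortcut(data))  # 4.0 4.0
```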

If \(X\) and \(Y\) are two **independent**
random variables, and \(\alpha\) is a
real number, then

- \(𝕍(X + Y)=𝕍 X + 𝕍 Y\)
- \(𝕍(α X)=α^2 𝕍 X\)

To prove the first equation we use that \(𝔼(XY)=𝔼X\,𝔼Y,\) which is true when \(X\) is independent of \(Y\)
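A simulation sketch (Gaussian data chosen arbitrarily) illustrating \(𝕍(X+Y)=𝕍X+𝕍Y\) for independent samples:

```python
import random

random.seed(1)
n = 200_000

def var(v):
    m = sum(v) / len(v)
    return sum((x - m) ** 2 for x in v) / len(v)

xs = [random.gauss(0, 1) for _ in range(n)]   # V X = 1
ys = [random.gauss(0, 2) for _ in range(n)]   # V Y = 4

v_sum = var([x + y for x, y in zip(xs, ys)])
print(v_sum, var(xs) + var(ys))   # both close to 1 + 4 = 5
```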

Let’s assume that we have a small sample \(𝐗=(X_1,…,X_n)\)

All \(X_i\) are random variables
taken from the same population.

We take their **sample mean**: \[\text{mean}(𝐗)=\bar{𝐗}=\frac{1}{n}\sum_i
X_i\] Since the sample is random, \(\bar{𝐗}\) is also **a random
variable**

What is the expected value of \(\bar{𝐗}\)?

By definition of *mean*, we have \[𝔼\,\text{mean}(𝐗) =
𝔼(\bar{𝐗})=𝔼\left(\frac{1}{n}\sum_i X_i\right)\] and since \(𝔼(α X+βY)=α𝔼(X)+β𝔼(Y),\) we have \[𝔼(\bar{𝐗})=𝔼\left(\frac{1}{n}\sum_i
X_i\right)=\frac{1}{n}𝔼\sum_i X_i=\frac{1}{n}\sum_i𝔼 X_i\]

All outcomes in the sample are *identically distributed*
because they come from the same population (which does not change with
the sample)

Let’s assume that each outcome is *independent*

In that case we will have an i.i.d. sample, and

- All \(𝔼 X_i\) are equal. We call them \(𝔼 X\)
- All \(𝕍 X_i\) are equal. We call them \(𝕍 X\)
- \(X_i\) is independent of \(X_j\) when \(i≠j\)

Since all \(X_i\) come from the same population, \(𝔼 X_i=𝔼 X\) and \[𝔼(\bar{𝐗})=\frac{1}{n}\sum_i𝔼 X=\frac{n}{n}𝔼 X=𝔼 X\]

Good!

The expected value of the sample average is the expected value of the complete population
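A simulation (fair die, sample size chosen arbitrarily) illustrating that sample means center on the population mean \(𝔼X=3.5\):

```python
import random

random.seed(2)

def sample_mean(n):
    return sum(random.randint(1, 6) for _ in range(n)) / n

# Many independent samples of size 10 from the same population
means = [sample_mean(10) for _ in range(20_000)]
print(sum(means) / len(means))   # close to 3.5
```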

The variance of a set of numbers is easy to calculate

\[\begin{aligned}\text{var}(𝐗)& =\frac{1}{n}\sum_i (X_i-\bar{𝐗})^2\\ &=\frac{1}{n}\sum_i X_i^2-\bar{𝐗}^2\end{aligned}\]

(Remember: the average of the squares minus the square of the average)

Since the sample is random, this is also a random variable.

What is its expected value?

Since \(𝔼(α X+βY)=α𝔼(X)+β𝔼(Y),\) we have

\[\begin{aligned} 𝔼\text{var}(𝐗)&=𝔼\left(\frac{1}{n}\sum_i X_i^2-\left(\frac{1}{n}\sum_i X_i\right)^2\right)\\ &=\frac{1}{n}\sum_i 𝔼\left(X_i^2\right)-\frac{1}{n^2}𝔼 \left(\left(\sum_i X_i\right)^2\right)\end{aligned}\]

Now, since the sample is i.i.d. we have \(𝔼 \left(X_i^2\right)=𝔼 \left(X^2\right)\) and \[\sum_i𝔼 \left(X_i^2\right)=n𝔼 \left(X^2\right)\] therefore \[𝔼\text{var}(𝐗)=\frac{1}{n}n 𝔼\left(X^2\right)-\frac{1}{n^2}𝔼 \left(\sum_i X_i\right)^2\]

We can simplify the second part as \[\left(\sum_i X_i\right)^2=\left(\sum_i X_i\right)\left(\sum_j X_j\right)=\sum_i \sum_j X_i X_j\] therefore \[𝔼 \left(\sum_i X_i\right)^2=\sum_i \sum_j 𝔼 X_i X_j\]

If \(i=j,\) we have \(𝔼 X_i X_j = 𝔼 (X_i^2)=𝔼 (X^2)\). If \(i≠j,\) and since all outcomes are independent, we have \[𝔼 X_i X_j = 𝔼(X_i)𝔼(X_j)=(𝔼X)^2\] therefore \[𝔼\left(\sum_i X_i\right)^2= n 𝔼 (X^2) + n(n-1)(𝔼 X)^2\]

\[\begin{aligned} 𝔼\text{var}(𝐗) &=\frac{1}{n}n𝔼 X^2-\frac{1}{n^2}(n 𝔼 (X^2) + n(n-1)(𝔼 X)^2)\\ & =\frac{1}{n}\left((n-1)𝔼 X^2-(n-1)(𝔼 X)^2\right)\\ & =\frac{n-1}{n}(𝔼 X^2-(𝔼 X)^2)\\ & =\frac{n-1}{n}𝕍X \end{aligned}\]

We have found that \[𝔼\text{var}(𝐗) =
\frac{n-1}{n}𝕍X\] So the variance of the sample **is
not** the variance of the population

Not even on average
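A simulation (fair die, \(n=5\)) illustrating the bias: the sample variance averages to \(\frac{n-1}{n}𝕍X\), not to \(𝕍X\):

```python
import random

random.seed(3)
n = 5                                   # small sample size
pop_var = 35 / 12                       # variance of a fair die

def sample_var(xs):
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

trials = [sample_var([random.randint(1, 6) for _ in range(n)])
          for _ in range(50_000)]
avg = sum(trials) / len(trials)
print(avg, (n - 1) / n * pop_var)       # both close to 4/5 * 35/12 ≈ 2.33
```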

If we want to estimate the mean \(𝔼X\) of a population we can use the sample mean \(\bar{X}\)

But if we want to estimate the variance \(𝕍X\) of a population we cannot use the sample variance \(\text{var}(𝐗)\)

Instead we have to use a different formula \[\hat{𝕍}(𝐗) = \frac{1}{n-1}\sum_i(X_i-\bar{𝐗})^2\]

People use two formulas, depending on the case

If you only care about the sample, its variance is \[\text{var}(𝐗) =\frac{1}{n}\sum_i (X_i-\bar{𝐗})^2=\frac{1}{n}\sum_i X_i^2-\bar{𝐗}^2\]

If you care about the population, but only have a sample \[\hat{𝕍}(𝐗) = \frac{1}{n-1}\sum_i(X_i-\bar{𝐗})^2 = \frac{n}{n-1}\text{var}(𝐗)\]
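Python's standard `statistics` module implements both conventions:

```python
from statistics import pvariance, variance

data = [2, 4, 4, 4, 5, 5, 7, 9]

print(pvariance(data))  # divides by n   -> 4     (variance of the sample)
print(variance(data))   # divides by n-1 -> 32/7  (estimate of the population variance)
```

NumPy's `np.var` exposes the same choice through its `ddof` argument (`ddof=0` divides by \(n\), `ddof=1` by \(n-1\)).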

When experiments produce numbers we can calculate average and variance

The population has a fixed mean and variance, and most times we do not know their values

If we have an i.i.d. sample we can estimate the population mean with the sample mean

If the sample is not i.i.d., its mean may not correspond to the population mean

- The sample mean is probably close to the population mean, regardless of the probability distribution

- The sample variance is not a good estimate of the population
variance.
- We use a different formula in that case.