22 November 2017

## Summary

A probability space is defined by two things: $$\Omega$$ and $$p.$$

• $$\Omega$$ is the set of all possible outcomes; it can be finite or infinite
• Each element $$\omega\in\Omega$$ is a single outcome
• Each experiment produces a unique outcome
• The distribution $$p:\Omega\to\mathbb R$$ is a function such that $0\leq p(\omega)\leq 1\quad \forall \omega\in\Omega\\ \sum_{\omega\in\Omega}p(\omega)=1$
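
As a minimal sketch in Python (using a fair six-sided die as an assumed example, not one from the text), a finite probability space is just a set of outcomes and a mapping of probabilities:

```python
# A finite probability space for one fair die: Omega and p.
# (Illustrative example; the die is an assumption, not from the text.)
Omega = {1, 2, 3, 4, 5, 6}
p = {omega: 1 / 6 for omega in Omega}

# Check the two defining properties of a distribution.
assert all(0 <= p[omega] <= 1 for omega in Omega)
assert abs(sum(p.values()) - 1) < 1e-12
```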

## Questions

We can ask any question that can be answered yes or no, about a future outcome $$\omega$$

Any question is valid, as long as we can answer yes or no when we know $$\omega$$

We can represent each question $$Q(\omega)$$ as a subset $$E$$ of $$\Omega$$ $E=\{\omega\in\Omega: Q(\omega)\text{ is true}\}$ The question $$Q(\omega)$$ is true iff $$\omega\in E$$

## Events

An event is a subset $$E\subset \Omega$$. It is also a yes-or-no question about a future outcome

(Since $$\Omega$$ is discrete, all questions are events. That may not be the case if $$\Omega=\mathbb R$$)

The probability of an event $$E$$ is $\Pr(E)=\sum_{\omega\in E}p(\omega)$
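
The sum above can be computed directly. A small sketch, again assuming a fair die as the example:

```python
# Pr(E) as a sum of outcome probabilities, for a fair die (assumed example).
Omega = {1, 2, 3, 4, 5, 6}
p = {omega: 1 / 6 for omega in Omega}

def pr(E):
    """Pr(E) = sum of p(omega) over the outcomes in E."""
    return sum(p[omega] for omega in E)

# The event "the outcome is even" as a subset of Omega.
even = {omega for omega in Omega if omega % 2 == 0}
```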

## Simple and compound events

All events are subsets of $$\Omega$$

If the event $$E=\{\omega\}$$ contains only one element, we call it a simple event

In that case $\Pr(E)=\Pr(\{\omega\})=p(\omega)$ Abusing the notation, we can write $\Pr(\omega)=\Pr(\{\omega\})=p(\omega)$

## Notation

If $$Q$$ is a yes-no question, we will use the notation $$[Q]$$ to represent this: $[Q]=\begin{cases}1\quad\text{if }Q\text{ is true}\\ 0\quad\text{if }Q\text{ is false}\end{cases}$

Then instead of writing $$\Pr(E)=\sum_{\omega\in E}p(\omega)$$ we can write $\Pr(E)=\sum_{\omega}p(\omega) [\omega\in E]$
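
Both forms of the sum give the same number: summing over $$E$$ only, or summing over all of $$\Omega$$ weighted by the indicator. A quick check, with the fair die as an assumed example:

```python
# Pr(E) written two ways: over E only, and with the Iverson bracket [omega in E].
Omega = range(1, 7)
p = {omega: 1 / 6 for omega in Omega}
E = {2, 4, 6}

pr_subset = sum(p[w] for w in E)                   # sum over E only
pr_bracket = sum(p[w] * (w in E) for w in Omega)   # indicator form over all Omega

assert abs(pr_subset - pr_bracket) < 1e-12
```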

## Random variable

If we can calculate a number that depends on the outcome of an experiment, that number is called a random variable

Formally, a random variable is any function $$X:\Omega\to \mathbb R$$

For example, if we throw two dice, we can define $$X(\omega)$$ to be the sum of the spots on the outcome $$\omega$$

The probability that the spots total seven is the probability of the event $$X(\omega) = 7$$

To simplify, instead of $$\Pr(X(\omega) = 7)$$ we write $$\Pr(X = 7)$$
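
The two-dice example from the text can be computed exhaustively; `Fraction` keeps the arithmetic exact:

```python
from itertools import product
from fractions import Fraction

# Two fair dice: Omega is all ordered pairs, X(omega) = sum of the spots.
Omega = list(product(range(1, 7), repeat=2))
p = {omega: Fraction(1, 36) for omega in Omega}
X = lambda omega: omega[0] + omega[1]

# Pr(X = 7) = Pr({omega : X(omega) = 7})
pr_seven = sum(p[omega] for omega in Omega if X(omega) == 7)
print(pr_seven)  # 1/6
```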

## Reminder: sample average and variance

For a vector $$\vec x = (x_1,\cdots,x_n),$$ if $$N(x)$$ is the number of entries equal to $$x,$$ we have $p(x)=N(x)/n$ $\text{mean}(\vec x)=\bar x = \frac{1}{n}\sum_i x_i=\sum_x x \frac{N(x)}{n}=\sum_x x\, p(x)$ $\text{Var}(\vec x)= \frac{1}{n}\sum_i (x_i-\bar x)^2=\sum_x (x-\bar x)^2\, p(x)$
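
A sketch of the two equivalent ways of computing these quantities, matching the two sums above (the data vector is an arbitrary illustration):

```python
from collections import Counter

# Sample mean and variance computed two ways: directly over the entries,
# and over distinct values x weighted by p(x) = N(x)/n.
x = [2, 2, 3, 5, 5, 5]
n = len(x)
N = Counter(x)                 # N(x) = how many times value x occurs

mean_direct = sum(x) / n
mean_weighted = sum(v * N[v] / n for v in N)

var_direct = sum((xi - mean_direct) ** 2 for xi in x) / n
var_weighted = sum((v - mean_weighted) ** 2 * N[v] / n for v in N)

assert abs(mean_direct - mean_weighted) < 1e-12
assert abs(var_direct - var_weighted) < 1e-12
```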

## Expected value - Mean value

For any random variable $$X$$ we define the expected value (also called mean value) of $$X$$ as the average of $$X(\omega)$$ over the population $$\Omega$$ $\mathbb EX=\sum_\omega X(\omega)\, \Pr(\omega)=\sum_\omega X(\omega)\, p(\omega)$ Notice that $$X$$ is a random variable but $$\mathbb EX$$ is not.

Sometimes, for a given population and random variable, we write $$\mu=\mathbb EX$$

## Expected value is linear

If $$X$$ and $$Y$$ are two random variables, and $$\alpha$$ is a real number, then

• $$\mathbb E(X + Y)=\mathbb EX + \mathbb EY$$
• $$\mathbb E(\alpha X)=\alpha \mathbb EX$$

So, if $$\alpha$$ and $$\beta$$ are real numbers, then

• $$\mathbb E(\alpha X +\beta Y)=\alpha \mathbb EX +\beta \mathbb EY$$
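
Linearity can be checked exactly on a small finite space, taking $$X$$ and $$Y$$ to be the two coordinates of a two-dice outcome (an illustrative choice, not from the text):

```python
from fractions import Fraction
from itertools import product

# Linearity of E checked exactly on the joint space of two die rolls.
# X = first die, Y = second die (illustrative choices).
Omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def E(f):
    """Expected value of the random variable f over Omega."""
    return sum(Fraction(f(w)) * p for w in Omega)

alpha, beta = Fraction(2), Fraction(-3)
lhs = E(lambda w: alpha * w[0] + beta * w[1])
rhs = alpha * E(lambda w: w[0]) + beta * E(lambda w: w[1])
assert lhs == rhs
```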

## Variance of the population

The variance of the population is defined with the same idea as the sample variance $\mathbb VX=\mathbb E(X-\mathbb EX)^2$ Notice that the variance has squared units

In most cases it is more comfortable to work with the standard deviation $$\sigma=\sqrt{\mathbb VX}.$$

In that case the population variance can be written as $$\sigma^2$$

## Simple formula for population variance

We can rewrite the variance of the population with a simpler formula: $\mathbb VX=\mathbb E(X-\mathbb EX)^2=\mathbb E(X^2)-(\mathbb EX)^2$ because $\mathbb E(X-\mathbb EX)^2=\mathbb E\left(X^2-2X\mathbb EX+(\mathbb EX)^2\right)\\=\mathbb E(X^2)-2\mathbb E(X\mathbb EX)+\mathbb E\left((\mathbb EX)^2\right)$ but $$\mathbb EX$$ is a non-random number, so $$\mathbb E(X\mathbb EX)=(\mathbb EX)^2$$ and $$\mathbb E\left((\mathbb EX)^2\right)=(\mathbb EX)^2,$$ which leaves $$\mathbb E(X^2)-2(\mathbb EX)^2+(\mathbb EX)^2=\mathbb E(X^2)-(\mathbb EX)^2$$
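
The shortcut formula can be verified exactly on a small example, again assuming a fair die:

```python
from fractions import Fraction

# Verifying VX = E(X^2) - (EX)^2 exactly for one fair die.
Omega = range(1, 7)
p = Fraction(1, 6)

EX = sum(Fraction(w) * p for w in Omega)
EX2 = sum(Fraction(w) ** 2 * p for w in Omega)
V_def = sum((Fraction(w) - EX) ** 2 * p for w in Omega)  # E(X - EX)^2
V_short = EX2 - EX ** 2                                  # E(X^2) - (EX)^2

assert V_def == V_short == Fraction(35, 12)
```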

## Variance is almost linear

If $$X$$ and $$Y$$ are two independent random variables, and $$\alpha$$ is a real number, then

• $$\mathbb V(X + Y)=\mathbb V X + \mathbb V Y$$
• $$\mathbb V(\alpha X)=\alpha^2 \mathbb V X$$

To prove the first equation we use that $$\mathbb E(XY)=\mathbb EX\,\mathbb EY,$$ which is true when $$X\perp Y$$
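
For two independent fair dice (an assumed example) the additivity of variance can be checked exactly:

```python
from fractions import Fraction
from itertools import product

# V(X + Y) = VX + VY for two independent fair dice (exact check).
Omega = list(product(range(1, 7), repeat=2))
p = Fraction(1, 36)

def E(f):
    return sum(Fraction(f(w)) * p for w in Omega)

def V(f):
    # Using the shortcut formula VX = E(X^2) - (EX)^2
    return E(lambda w: f(w) ** 2) - E(f) ** 2

X = lambda w: w[0]
Y = lambda w: w[1]
assert V(lambda w: X(w) + Y(w)) == V(X) + V(Y) == 2 * Fraction(35, 12)
```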

## Application: Averages

Let’s assume that we have a small sample $$\vec x = (x_1,\cdots,x_n).$$

All $$x_i$$ are taken from the same population $$X$$ with mean $$\mathbb EX$$ and variance $$\mathbb VX$$. We take the average or sample mean: $\text{mean}(\vec x)=\bar x = \frac{1}{n}\sum_i x_i$ Since $$\bar x$$ is a function of random variables, $$\bar x$$ is also a random variable

What is the expected value and the variance of $$\bar x$$?

## Independent from the same population

We often assume that all outcomes in the sample are independent identically distributed (i.i.d.)

In that case we will have

• $$\mathbb E x_i = \mathbb E X$$ for all $$i$$
• $$\mathbb V x_i = \mathbb V X$$ for all $$i$$
• $$\mathbb E x_i x_{j} = \begin{cases}\mathbb E (X^2)\quad\text{ if }i=j\\(\mathbb E X)^2\quad\text{ if }i\not=j\end{cases}$$

This last result can be written as $$\mathbb E x_i x_{j} = \mathbb E (X^2)[i=j] + (\mathbb E X)^2 [i\not=j]$$

## Expected value of sample mean

Therefore, since $$\mathbb E$$ is linear, $\mathbb E\bar x=\mathbb E\left(\frac{1}{n}\sum_i x_i\right)=\frac{1}{n}\mathbb E\sum_i x_i=\frac{1}{n}\sum_i\mathbb E x_i$ and since all $$x_i$$ come from the same population $\mathbb E\bar x=\frac{1}{n}\sum_i\mathbb E X=\frac{n}{n}\mathbb E X=\mathbb E X$

## Variance of the sample mean

Now we have $\mathbb V\bar x=\mathbb V\left(\frac{1}{n}\sum_i x_i\right)=\frac{1}{n^2}\mathbb V\sum_i x_i=\frac{1}{n^2}\sum_i\mathbb V x_i$ and since all $$x_i$$ come from the same population $\mathbb V\bar x=\frac{1}{n^2}\sum_i\mathbb V X=\frac{n}{n^2}\mathbb V X=\frac{1}{n}\mathbb V X$ So averages of bigger samples have smaller variance
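
A Monte Carlo sketch of this result (the die example and sample sizes are illustrative; the estimates are approximate, not exact):

```python
import random
import statistics

# Monte Carlo sketch: the variance of the sample mean shrinks like VX / n.
random.seed(0)

def var_of_sample_mean(n, trials=20000):
    means = [statistics.fmean(random.randint(1, 6) for _ in range(n))
             for _ in range(trials)]
    return statistics.pvariance(means)

vx = 35 / 12                     # variance of one fair die roll
results = {n: var_of_sample_mean(n) for n in (1, 4, 16)}
for n, v in results.items():
    print(n, round(v, 3), round(vx / n, 3))  # estimate vs. theoretical VX/n
```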

## Why we care about variance

The main reason is that it tells us how close most of the population is to the mean $\Pr((X-\mathbb EX)^2\geq\alpha)\leq\mathbb VX/\alpha$ This is Chebyshev’s inequality and it is always valid, for any probability distribution

It may be easier to understand if we call $$\sigma=\sqrt{\mathbb VX}$$ and $$\alpha=c^2\sigma^2.$$ Then we have $\Pr(\vert X-\mathbb EX\vert\geq c\sigma)\leq 1/c^2$
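
Chebyshev’s bound can be checked exactly for a fair die (an assumed example), by enumerating the outcomes:

```python
from fractions import Fraction

# Chebyshev's inequality checked exactly for one fair die:
# Pr((X - EX)^2 >= alpha) <= VX / alpha for every alpha > 0.
Omega = range(1, 7)
p = Fraction(1, 6)
EX = Fraction(7, 2)      # mean of a fair die
VX = Fraction(35, 12)    # variance of a fair die

def pr_far(alpha):
    """Pr((X - EX)^2 >= alpha) by direct enumeration."""
    return sum(p for w in Omega if (Fraction(w) - EX) ** 2 >= alpha)

for alpha in (Fraction(1), Fraction(4), Fraction(25, 4)):
    assert pr_far(alpha) <= VX / alpha
```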

## Proof of Chebyshev’s inequality

By the definition we have $\mathbb V(X)=\mathbb E(X-\mathbb EX)^2=\sum_\omega (X(\omega)-\mathbb EX)^2\Pr(\omega)$ If we multiply each probability by a number that is sometimes 0 and sometimes 1, the sum can only get smaller $\mathbb V(X)\geq\sum_\omega (X(\omega)-\mathbb EX)^2\Pr(\omega)[(X(\omega)-\mathbb EX)^2\geq\alpha]$ Every term that survives has $$(X(\omega)-\mathbb EX)^2\geq\alpha$$, therefore $\mathbb V(X)\geq\alpha\sum_\omega \Pr(\omega)[(X(\omega)-\mathbb EX)^2\geq\alpha] =\alpha\Pr\left((X-\mathbb EX)^2\geq\alpha\right)$ Then we can divide by $$\alpha$$ and we get our result

## Sample versus population

Variance $$\mathbb VX$$ and mean $$\mathbb EX$$ of the population are often unknown

Usually we only have a small sample $$\vec x = (x_1,\cdots,x_n)$$

Assuming that all $$x_i$$ are taken from the same population and are mutually independent, what can we say about the sample mean and variance?

## Expected value of sample variance

$\text{Var}(\vec x)=\frac{1}{n}\sum_i (x_i-\bar x)^2=\frac{1}{n}\sum_i x_i^2-\bar x^2$ therefore, since $$\mathbb E$$ is linear, $\mathbb E\text{Var}(\vec x)=\mathbb E\left(\frac{1}{n}\sum_i x_i^2-\frac{1}{n^2}(\sum_i x_i)^2\right)=\frac{1}{n}\mathbb E\sum_i x_i^2-\frac{1}{n^2}\mathbb E (\sum_i x_i)^2$

Now in the first part we have $\mathbb E\sum_i x_i^2 =\sum_i\mathbb E x_i^2=n\mathbb E X^2$

## Second part

We can simplify the second part as $(\sum_i x_i)^2=(\sum_i x_i)(\sum_j x_j)=\sum_i \sum_j x_i x_j$ therefore $\mathbb E (\sum_i x_i)^2=\sum_i \sum_j \mathbb E x_i x_j=\sum_i \sum_j \mathbb E (X^2)[i=j] + (\mathbb E X)^2 [i\not=j]$ $\mathbb E (\sum_i x_i)^2= n \mathbb E (X^2) + n(n-1)(\mathbb E X)^2$

## Putting it all together

$\mathbb E\text{Var}(\vec x)=\frac{1}{n}n\mathbb E X^2-\frac{1}{n^2}(n \mathbb E (X^2) + n(n-1)(\mathbb E X)^2)$ $\mathbb E\text{Var}(\vec x)=\frac{1}{n}\left((n-1)\mathbb E X^2-(n-1)(\mathbb E X)^2\right)$ $\mathbb E\text{Var}(\vec x)=\frac{n-1}{n}(\mathbb E X^2-(\mathbb E X)^2)=\frac{n-1}{n}\mathbb VX$

## Sample variance is biased

If we want to estimate the mean $$\mathbb EX$$ of a population we can use the sample mean $$\bar x$$

But if we want to estimate the variance $$\mathbb VX$$ of a population we cannot use the sample variance $$\text{Var}(\vec x)$$

Instead we have to use the estimator $\hat{\mathbb V}(X) = \frac{1}{n-1}\sum_i(x_i-\bar x)^2=\frac{n}{n-1}\text{Var}(\vec x),$ which is unbiased: $$\mathbb E\hat{\mathbb V}(X)=\mathbb VX$$
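
A simulation sketch contrasting the two estimators, assuming i.i.d. fair-die samples (Monte Carlo, so the averages are approximate):

```python
import random
import statistics

# The 1/n sample variance underestimates VX by a factor (n-1)/n,
# while the 1/(n-1) estimator is unbiased.  Averages are approximate.
random.seed(1)
n, trials = 5, 40000
vx = 35 / 12                                     # true variance of one die

biased, unbiased = [], []
for _ in range(trials):
    sample = [random.randint(1, 6) for _ in range(n)]
    biased.append(statistics.pvariance(sample))   # divide by n
    unbiased.append(statistics.variance(sample))  # divide by n - 1

avg_biased = statistics.fmean(biased)      # close to (n-1)/n * VX
avg_unbiased = statistics.fmean(unbiased)  # close to VX
print(round(avg_biased, 3), round(avg_unbiased, 3), round(vx, 3))
```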

## In summary

• When experiments produce numbers we can calculate average and variance
• The population has a fixed mean and variance, even if we do not know their values
• If we have an i.i.d. sample we can estimate the population mean with the sample mean
• The sample mean is probably close to the population mean, independent of the probability distribution
• If the sample is 4 times bigger, the sample mean is 2 times closer to the population mean
• The sample variance is not a good estimation of the population variance. We use a different formula in that case.