22 November 2017

Summary

A probability space is defined by two things: \(\Omega\) and \(p.\)

  • \(\Omega\) is the set of all possible outcomes. It can be finite or infinite
  • Each element \(\omega\in\Omega\) is a single outcome
  • Each experiment produces a unique outcome
  • The distribution \(p:\Omega\to\mathbb R\) is a function such that \[0\leq p(\omega)\leq 1\quad \forall \omega\in\Omega\\ \sum_{\omega\in\Omega}p(\omega)=1\]
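As a minimal sketch (assuming a single roll of a fair six-sided die as the experiment), a finite probability space can be written down explicitly:

```python
# A finite probability space for one roll of a fair die:
# Omega is the set of outcomes, p assigns a probability to each outcome.
Omega = {1, 2, 3, 4, 5, 6}
p = {omega: 1 / 6 for omega in Omega}

# Check the two conditions in the definition of a distribution.
assert all(0 <= p[omega] <= 1 for omega in Omega)
assert abs(sum(p[omega] for omega in Omega) - 1) < 1e-12
```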

Questions

We can ask any question about a future outcome \(\omega\) that can be answered yes or no

Any question is valid, as long as we can answer yes or no when we know \(\omega\)

We can represent each question \(Q(\omega)\) as a subset \(E\) of \(\Omega\) \[E=\{\omega\in\Omega: Q(\omega)\text{ is true}\}\] The question \(Q(\omega)\) is true iff \(\omega\in E\)

Events

An event is a subset \(E\subset \Omega\). It is also a yes-or-no question about a future outcome

(Since \(\Omega\) is discrete, all questions are events. That may not be the case if \(\Omega=\mathbb R\))

The probability of an event \(E\) is \[\Pr(E)=\sum_{\omega\in E}p(\omega)\]
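Continuing the fair-die sketch above, \(\Pr(E)\) is just the sum of \(p(\omega)\) over the elements of \(E\) (the event "the roll is even" is an illustrative choice):

```python
# Probability of an event E as a sum over its outcomes.
Omega = {1, 2, 3, 4, 5, 6}
p = {omega: 1 / 6 for omega in Omega}

# Event: "the outcome is even", i.e. E = {omega in Omega : omega % 2 == 0}.
E = {omega for omega in Omega if omega % 2 == 0}

prob_E = sum(p[omega] for omega in E)
print(prob_E)  # ≈ 0.5
```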

Simple and compound events

All events are subsets of \(\Omega\)

If the event \(E=\{\omega\}\) contains only one element, we call it a simple event

In that case \[\Pr(E)=\Pr(\{\omega\})=p(\omega)\] Abusing the notation, we can write \[\Pr(\omega)=\Pr(\{\omega\})=p(\omega)\]

Notation

If \(Q\) is a yes-no question, we will use the notation \([Q]\) to represent this: \[[Q]=\begin{cases}1\quad\text{if }Q\text{ is true}\\ 0\quad\text{if }Q\text{ is false}\end{cases}\]

Then instead of writing \(\Pr(E)=\sum_{\omega\in E}p(\omega)\) we can write \[\Pr(E)=\sum_{\omega}p(\omega) [\omega\in E]\]

Random variable

If we can calculate a number that depends on the outcome of an experiment, that number is called a random variable

Formally, a random variable is any function \(X:\Omega\to \mathbb R\)

For example, if we throw two dice, we can define \(X(\omega)\) to be the sum of the spots on the outcome \(\omega\)

The probability that the spots total seven is the probability of the event \(X(\omega) = 7\)

To simplify, instead of \(\Pr(X(\omega) = 7)\) we write \(\Pr(X = 7)\)
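A sketch of the two-dice example (the function name `X` and the pair encoding of outcomes are just illustrative choices):

```python
from itertools import product

# Outcomes are ordered pairs (first die, second die); all 36 are equally likely.
Omega = list(product(range(1, 7), repeat=2))
p = {omega: 1 / len(Omega) for omega in Omega}

def X(omega):
    """Random variable: total number of spots on the two dice."""
    return omega[0] + omega[1]

# Pr(X = 7) is the probability of the event {omega : X(omega) = 7}.
prob_7 = sum(p[omega] for omega in Omega if X(omega) == 7)
print(prob_7)  # 6/36 ≈ 0.167
```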

Reminder: sample average and variance

For a vector \(\vec x = (x_1,\cdots,x_n)\), where \(N(x)\) is the number of times the value \(x\) appears in \(\vec x\), we have \[p(x)=N(x)/n\] \[\text{mean}(\vec x)=\bar x = \frac{1}{n}\sum_i x_i=\sum_x x \frac{N(x)}{n}=\sum_x x\, p(x)\] \[\text{Var}(\vec x)= \frac{1}{n}\sum_i (x_i-\bar x)^2=\sum_x (x-\bar x)^2\, p(x)\]
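A sketch of the two equivalent ways of computing these (an arbitrary small vector as data; `Counter` provides the counts \(N(x)\)):

```python
from collections import Counter

x = [2, 3, 3, 5, 7]
n = len(x)

# Direct formulas over the entries x_i.
xbar = sum(x) / n
var_direct = sum((xi - xbar) ** 2 for xi in x) / n

# Same quantities via the counts N(x) and p(x) = N(x)/n.
N = Counter(x)
p = {v: N[v] / n for v in N}
xbar_counts = sum(v * p[v] for v in p)
var_counts = sum((v - xbar_counts) ** 2 * p[v] for v in p)

print(xbar, xbar_counts)       # both ≈ 4.0
print(var_direct, var_counts)  # both ≈ 3.2
```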

Expected value - Mean value

For any random variable \(X\) we define the expected value (also called mean value) of \(X\) as the average of \(X(\omega)\) over the population \(\Omega\) \[\mathbb EX=\sum_\omega X(\omega)\, \Pr(\{\omega\})=\sum_\omega X(\omega)\, p(\omega)\] Notice that \(X\) is a random variable but \(\mathbb EX\) is not.

Sometimes, for a given population and random variable, we write \(\mu=\mathbb EX\)
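For instance, with the two-dice random variable above, the expected value is just a weighted sum over \(\Omega\) (a sketch reusing the same setup):

```python
from itertools import product

Omega = list(product(range(1, 7), repeat=2))
p = {omega: 1 / len(Omega) for omega in Omega}

def X(omega):
    return omega[0] + omega[1]  # total spots on the two dice

# E[X] = sum over all outcomes of X(omega) * p(omega).
mu = sum(X(omega) * p[omega] for omega in Omega)
print(mu)  # ≈ 7.0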

Expected value is linear

If \(X\) and \(Y\) are two random variables, and \(\alpha\) is a real number, then

  • \(\mathbb E(X + Y)=\mathbb EX + \mathbb EY\)
  • \(\mathbb E(\alpha X)=\alpha \mathbb EX\)

So, if \(\alpha\) and \(\beta\) are real numbers, then

  • \(\mathbb E(\alpha X +\beta Y)=\alpha \mathbb EX +\beta \mathbb EY\)
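A quick numeric check of linearity on the two-dice space (with \(X\) the first die, \(Y\) the second, and the arbitrary choices \(\alpha=2\), \(\beta=3\)):

```python
from itertools import product

Omega = list(product(range(1, 7), repeat=2))
p = {omega: 1 / len(Omega) for omega in Omega}

def E(f):
    """Expected value of a random variable f over (Omega, p)."""
    return sum(f(omega) * p[omega] for omega in Omega)

X = lambda omega: omega[0]  # spots on the first die
Y = lambda omega: omega[1]  # spots on the second die
alpha, beta = 2, 3

lhs = E(lambda omega: alpha * X(omega) + beta * Y(omega))
rhs = alpha * E(X) + beta * E(Y)
print(lhs, rhs)  # both ≈ 17.5
```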

Variance of the population

The variance of the population is defined with the same idea as the sample variance \[\mathbb VX=\mathbb E(X-\mathbb EX)^2\] Notice that the variance has squared units

In most cases it is more convenient to work with the standard deviation \(\sigma=\sqrt{\mathbb VX}.\)

In that case the population variance can be written as \(\sigma^2\)

Simple formula for population variance

We can rewrite the variance of the population with a simpler formula: \[\mathbb VX=\mathbb E(X-\mathbb EX)^2=\mathbb E(X^2)-(\mathbb EX)^2\] because \[\mathbb E(X-\mathbb EX)^2=\mathbb E(X^2-2X\mathbb EX+(\mathbb EX)^2)\\=\mathbb E(X^2)-2\mathbb E(X\mathbb EX)+\mathbb E(\mathbb EX)^2\] but \(\mathbb EX\) is a non-random number, so \(\mathbb E(X\mathbb EX)=(\mathbb EX)^2\) and \(\mathbb E(\mathbb EX)^2=(\mathbb EX)^2\)
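A numeric check of the two formulas on the two-dice sum (same sketch as before):

```python
from itertools import product

Omega = list(product(range(1, 7), repeat=2))
p = {omega: 1 / len(Omega) for omega in Omega}
X = lambda omega: omega[0] + omega[1]  # total spots

EX = sum(X(w) * p[w] for w in Omega)
var_def = sum((X(w) - EX) ** 2 * p[w] for w in Omega)        # E(X - EX)^2
var_short = sum(X(w) ** 2 * p[w] for w in Omega) - EX ** 2   # E(X^2) - (EX)^2
print(var_def, var_short)  # both ≈ 5.83
```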

Variance is almost linear

If \(X\) and \(Y\) are two independent random variables, and \(\alpha\) is a real number, then

  • \(\mathbb V(X + Y)=\mathbb V X + \mathbb V Y\)
  • \(\mathbb V(\alpha X)=\alpha^2 \mathbb V X\)

To prove the first equation we use that \(\mathbb E(XY)=\mathbb EX\,\mathbb EY,\) which is true when \(X\perp Y\)
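Spelling out that proof, with \(\mu_X=\mathbb EX\) and \(\mu_Y=\mathbb EY\): \[\mathbb V(X+Y)=\mathbb E\big((X-\mu_X)+(Y-\mu_Y)\big)^2=\mathbb E(X-\mu_X)^2+\mathbb E(Y-\mu_Y)^2+2\,\mathbb E\big((X-\mu_X)(Y-\mu_Y)\big)\] The cross term equals \(\mathbb E(XY)-\mu_X\mu_Y,\) which is zero when \(X\perp Y,\) leaving \(\mathbb V(X+Y)=\mathbb VX+\mathbb VY\)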

Application: Averages

Let’s assume that we have a small sample \(\vec x = (x_1,\cdots,x_n).\)

All \(x_i\) are taken from the same population \(X\) with mean \(\mathbb EX\) and variance \(\mathbb VX\). We take the average or sample mean: \[\text{mean}(\vec x)=\bar x = \frac{1}{n}\sum_i x_i\] Since \(\bar x\) is a function of random variables, \(\bar x\) is also a random variable

What is the expected value and the variance of \(\bar x\)?

Independent from the same population

We often assume that all outcomes in the sample are independent identically distributed (i.i.d.)

In that case we will have

  • \(\mathbb E x_i = \mathbb E X\) for all \(i\)
  • \(\mathbb V x_i = \mathbb V X\) for all \(i\)
  • \(\mathbb E x_i x_{j} = \begin{cases}\mathbb E (X^2)\quad\text{ if }i=j\\(\mathbb E X)^2\quad\text{ if }i\not=j\end{cases}\)

This last result can be written as \(\mathbb E x_i x_{j} = \mathbb E (X^2)[i=j] + (\mathbb E X)^2 [i\not=j]\)

Expected value of sample mean

Therefore, since \(\mathbb E\) is linear, \[\mathbb E\bar x=\mathbb E\left(\frac{1}{n}\sum_i x_i\right)=\frac{1}{n}\mathbb E\sum_i x_i=\frac{1}{n}\sum_i\mathbb E x_i\] and since all \(x_i\) come from the same population \[\mathbb E\bar x=\frac{1}{n}\sum_i\mathbb E X=\frac{n}{n}\mathbb E X=\mathbb E X\]

Variance of the sample mean

Now we have \[\mathbb V\bar x=\mathbb V\left(\frac{1}{n}\sum_i x_i\right)=\frac{1}{n^2}\mathbb V\sum_i x_i=\frac{1}{n^2}\sum_i\mathbb V x_i\] and since all \(x_i\) come from the same population \[\mathbb V\bar x=\frac{1}{n^2}\sum_i\mathbb V X=\frac{n}{n^2}\mathbb V X=\frac{1}{n}\mathbb V X\] So averages of bigger samples have smaller variance
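A small simulation sketch of this \(1/n\) shrinkage (fair-die rolls as the population; the sample sizes and the number of repetitions are arbitrary choices):

```python
import random
from statistics import variance  # variance of the means across repetitions

random.seed(0)

def sample_mean(n):
    """Mean of n rolls of a fair die."""
    return sum(random.randint(1, 6) for _ in range(n)) / n

# The population variance of a single die roll is 35/12 ≈ 2.92.
for n in (1, 4, 16, 64):
    means = [sample_mean(n) for _ in range(20_000)]
    print(n, variance(means))  # close to (35/12) / n
```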

Why we care about variance

The main reason is that it tells us how close most of the population is to the mean \[\Pr\left((X-\mathbb EX)^2\geq\alpha\right)\leq\mathbb VX/\alpha\] This is Chebyshev’s inequality and it is always valid, for any probability distribution

It may be easier to understand if we write \(\sigma=\sqrt{\mathbb VX}\) and \(\alpha=c^2\sigma^2;\) then we have \[\Pr(\vert X-\mathbb EX\vert\geq c\sigma)\leq 1/c^2\]
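A quick exact check of the bound on one fair-die roll (the values of \(c\) are arbitrary choices):

```python
from math import sqrt

Omega = range(1, 7)
p = {w: 1 / 6 for w in Omega}

EX = sum(w * p[w] for w in Omega)              # 3.5
VX = sum((w - EX) ** 2 * p[w] for w in Omega)  # 35/12
sigma = sqrt(VX)

for c in (1.0, 1.5, 2.0):
    prob = sum(p[w] for w in Omega if abs(w - EX) >= c * sigma)
    print(c, prob, 1 / c ** 2)  # the probability never exceeds 1/c^2
```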

Proof of Chebyshev’s inequality

By the definition we have \[\mathbb V(X)=\mathbb E(X-\mathbb EX)^2=\sum_\omega (X(\omega)-\mathbb EX)^2\Pr(\omega)\] If we multiply each term by a factor that is sometimes 0 and sometimes 1, the sum cannot increase \[\mathbb V(X)\geq\sum_\omega (X(\omega)-\mathbb EX)^2\Pr(\omega)[(X(\omega)-\mathbb EX)^2\geq\alpha]\] In the terms that remain, \((X(\omega)-\mathbb EX)^2\geq\alpha\), therefore \[\mathbb V(X)\geq\alpha\sum_\omega \Pr(\omega)[(X(\omega)-\mathbb EX)^2\geq\alpha] =\alpha\Pr\left((X-\mathbb EX)^2\geq\alpha\right)\] Then we can divide by \(\alpha\) and we get our result

Sample versus population

Variance \(\mathbb VX\) and mean \(\mathbb EX\) of the population are often unknown

Usually we only have a small sample \(\vec x = (x_1,\cdots,x_n)\)

Assuming that all \(x_i\) are taken from the same population and are mutually independent, what can we say about the sample mean and variance?

Expected value of sample variance

\[\text{Var}(\vec x)=\frac{1}{n}\sum_i (x_i-\bar x)^2=\frac{1}{n}\sum_i x_i^2-\bar x^2\] therefore, since \(\bar x^2=\frac{1}{n^2}(\sum_i x_i)^2\) and \(\mathbb E\) is linear, \[\mathbb E\text{Var}(\vec x)=\mathbb E\left(\frac{1}{n}\sum_i x_i^2-\frac{1}{n^2}(\sum_i x_i)^2\right)=\frac{1}{n}\mathbb E\sum_i x_i^2-\frac{1}{n^2}\mathbb E (\sum_i x_i)^2\]

Now in the first part we have \[\mathbb E\sum_i x_i^2 =\sum_i\mathbb E x_i^2=n\mathbb E X^2\]

Second part

We can simplify the second part as \[(\sum_i x_i)^2=(\sum_i x_i)(\sum_j x_j)=\sum_i \sum_j x_i x_j\] therefore \[\mathbb E (\sum_i x_i)^2=\sum_i \sum_j \mathbb E x_i x_j=\sum_i \sum_j \left(\mathbb E (X^2)[i=j] + (\mathbb E X)^2 [i\not=j]\right)\] \[\mathbb E (\sum_i x_i)^2= n \mathbb E (X^2) + n(n-1)(\mathbb E X)^2\]

Putting it all together

\[\mathbb E\text{Var}(\vec x)=\frac{1}{n}n\mathbb E X^2-\frac{1}{n^2}(n \mathbb E (X^2) + n(n-1)(\mathbb E X)^2)\] \[\mathbb E\text{Var}(\vec x)=\frac{1}{n}\left((n-1)\mathbb E X^2-(n-1)(\mathbb E X)^2\right)\] \[\mathbb E\text{Var}(\vec x)=\frac{n-1}{n}(\mathbb E X^2-(\mathbb E X)^2)=\frac{n-1}{n}\mathbb VX\]

Sample variance is biased

If we want to estimate the mean \(\mathbb EX\) of a population we can use the sample mean \(\bar x\)

But if we want to estimate the variance \(\mathbb VX\) of a population we cannot use the sample variance \(\text{Var}(\vec x)\), because its expected value is \(\frac{n-1}{n}\mathbb VX\), not \(\mathbb VX\)

Instead we have to use the estimator \[\hat{\mathbb V}(X) = \frac{1}{n-1}\sum_i(x_i-\bar x)^2\] which satisfies \(\mathbb E\hat{\mathbb V}(X)=\frac{n}{n-1}\mathbb E\text{Var}(\vec x)=\mathbb VX\)
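A simulation sketch of the bias (fair-die rolls again; the sample size and number of repetitions are arbitrary choices):

```python
import random

random.seed(0)
n, repeats = 5, 50_000
pop_var = 35 / 12  # population variance of one fair-die roll

biased, corrected = 0.0, 0.0
for _ in range(repeats):
    xs = [random.randint(1, 6) for _ in range(n)]
    xbar = sum(xs) / n
    ss = sum((x - xbar) ** 2 for x in xs)
    biased += ss / n           # sample variance Var(x), divides by n
    corrected += ss / (n - 1)  # the estimator above, divides by n - 1

print(biased / repeats, "vs", (n - 1) / n * pop_var)  # biased estimator is low
print(corrected / repeats, "vs", pop_var)             # corrected one is on target
```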

In summary

  • When experiments produce numbers we can calculate average and variance
  • The population has a fixed mean and variance, even if we do not know their values
  • If we have an i.i.d sample we can estimate the population mean with the sample mean
  • The sample mean is probably close to the population mean, independent of the probability distribution
  • If the sample is 4 times bigger, the standard deviation of the sample mean is 2 times smaller
  • The sample variance is not a good estimate of the population variance. We use a different formula in that case.