6 December 2017

## Different questions

If $$X$$ is a random variable that follows some distribution, we can ask several questions:

• What is the expected value of $$X$$?
• What is the variance of $$X$$?

These two questions can only be answered if we know the full population

In most cases we do not know the real population, but we can assume we know it

## Exercise 1

Let’s say that we know that the expected value of $$X$$ is $$\mu$$ and the variance of $$X$$ is $$\sigma^2$$: $\mathbb EX=\mu\qquad\mathbb VX=\sigma^2$

Then we can ask: What is the probability that $$X$$ is greater than a given value $$a$$? $\Pr(X > a \vert\mu, \sigma^2)?$

Exercise: Answer that for Binomial $$(\mu=50,\sigma^2=25)$$ and Normal $$(\mu=0, \sigma=1)$$ distributions, for different values of $$a.$$
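One possible way to start on this exercise in R is sketched below. A Binomial with $$\mu=50$$ and $$\sigma^2=25$$ corresponds to 100 trials with success probability 0.5, since $$np=50$$ and $$np(1-p)=25$$; these parameters are derived here, not stated in the exercise. The thresholds `a_binom` and `a_norm` are arbitrary values chosen only for illustration.

```r
# Binomial with mean 50 and variance 25:
# n*p = 50 and n*p*(1-p) = 25 give size = 100 and prob = 0.5 (derived, see above)
size <- 100
prob <- 0.5

# Arbitrary example thresholds (an assumption for illustration)
a_binom <- c(40, 50, 55, 60)
# lower.tail = FALSE returns the upper tail Pr(X > a)
p_binom <- pbinom(a_binom, size = size, prob = prob, lower.tail = FALSE)

# Same question for the standard Normal
a_norm <- c(-1, 0, 1, 1.96)
p_norm <- pnorm(a_norm, mean = 0, sd = 1, lower.tail = FALSE)

data.frame(a = a_binom, prob_greater = p_binom)
data.frame(a = a_norm, prob_greater = p_norm)
```

Note that for the Binomial, $$\Pr(X>a)$$ and $$\Pr(X\geq a)$$ differ because the distribution is discrete, while for the Normal they coincide.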

## Exercise 2

What is the probability that $$X$$ is inside the interval $$[a_1,a_2]$$? $\Pr(a_1\leq X\leq a_2 \vert\mu, \sigma^2)?$

Exercise: Answer that for Binomial and Normal distributions

Can we replace the Binomial $$(\mu=50,\sigma^2=25)$$ by a Normal $$(\mu=50,\sigma^2=25)$$?
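A quick numerical comparison may help with this question. The sketch below assumes the Binomial has 100 trials with probability 0.5 (which gives $$\mu=50,$$ $$\sigma^2=25$$) and uses an arbitrary interval $$[45,55]$$. The continuity correction in the last step is a common refinement that is not part of the exercise text.

```r
size <- 100
prob <- 0.5
mu <- size * prob                          # 50
sigma <- sqrt(size * prob * (1 - prob))    # 5

# Arbitrary example interval (an assumption for illustration)
a1 <- 45
a2 <- 55

# The Binomial is discrete: Pr(a1 <= X <= a2) = Pr(X <= a2) - Pr(X <= a1 - 1)
p_binom <- pbinom(a2, size, prob) - pbinom(a1 - 1, size, prob)

# Normal with the same mean and variance
p_norm <- pnorm(a2, mu, sigma) - pnorm(a1, mu, sigma)

# Normal with continuity correction: widen the interval by 0.5 on each side
p_norm_cc <- pnorm(a2 + 0.5, mu, sigma) - pnorm(a1 - 0.5, mu, sigma)

c(binomial = p_binom, normal = p_norm, normal_corrected = p_norm_cc)
```

Trying other intervals, and other values of `size`, gives a feeling for when the replacement is reasonable.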

## Inverse question

Since in reality we don’t know $$\mu,$$ we would like to ask about it:

What is the probability that $$\mu$$ is in the range $$[b_1,b_2],$$ given that in our experiment $$X$$ had a value $$a$$?

If $$b_1$$ and $$b_2$$ are fixed, that question is useless. The answer is either 1 or 0, since $$\mu$$ is not random. Instead we want to find two functions $$b_1(X)$$ and $$b_2(X)$$ depending on the experiment result $$X$$ such that

$\Pr(b_1(X)<\mu<b_2(X))=1-\alpha$

where $$\alpha$$ is a small number, typically 0.05 or 0.01

## Confidence interval

Exercise: Find 90% confidence intervals

• For a Binomial distribution
• For a Normal distribution

## Formula

If $$X$$ follows a Normal$$(\mu,\sigma^2)$$, then the value $Z=\frac{X-\mu}{\sigma}$ is also random and follows a Normal(0,1). In particular the average $$\bar x$$ of an i.i.d. sample of size $$n$$ is Normal$$(\mu,\sigma^2/n),$$ so $Z=\frac{\bar x-\mu}{\sigma/\sqrt{n}}$ is also Normal(0,1)
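The standardization can be checked by simulation. This is only a sketch: the values of `mu`, `sigma`, `n`, the number of replicates and the seed are all hypothetical choices.

```r
set.seed(1)      # arbitrary seed, only for reproducibility
mu <- 10         # hypothetical population mean
sigma <- 2       # hypothetical population standard deviation
n <- 25          # hypothetical sample size
n_rep <- 10000   # number of simulated samples

# For each replicate: draw a sample and standardize its average
z <- replicate(n_rep, {
  x <- rnorm(n, mean = mu, sd = sigma)
  (mean(x) - mu) / (sigma / sqrt(n))
})

mean(z)   # should be close to 0
sd(z)     # should be close to 1
```

A histogram of `z` (for example with `hist(z)`) should look like a standard Normal.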

## Formula

Therefore we can calculate $$\Pr(c_1<Z<c_2)$$ for any $$c_1,c_2.$$

Since the Normal distribution is symmetrical around 0 we can choose $$c_1=-c_2.$$ That will give us the narrowest interval

So, given a confidence level $$1-\alpha,$$ we look for $$c$$ such that $\Pr(-c<Z<c)=1-\alpha$

## Finding $$c$$

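Again, since the normal distribution is symmetric, $$c$$ will be such that $\Pr(Z< -c)=\Pr(Z>c)=\alpha/2$

This is the value we have to find in a table, or using R. Both calls below return the same $$c$$:

alpha <- 0.05  # example value; choose your own significance level
qnorm(1 - alpha/2)
qnorm(alpha/2, lower.tail = FALSE)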

## Graphic

Once we have found $$c$$ we can build our interval. If $$-c<Z<c$$ then $-c<\frac{\bar x-\mu}{\sigma/\sqrt{n}}<c$ so $-c\sigma/\sqrt{n}<\bar x-\mu<c\sigma/\sqrt{n}$ then, multiplying by $$-1$$ (which flips the inequalities) and adding $$\bar x,$$ $\bar x-c\sigma/\sqrt{n}<\mu<\bar x+c\sigma/\sqrt{n}$

## In summary

If the average follows a Normal distribution and we know the population variance, then a confidence interval for $$\mu$$ is \begin{aligned} b_1&=\bar x-c(\alpha)\sigma/\sqrt{n}\\ b_2&=\bar x+c(\alpha)\sigma/\sqrt{n} \end{aligned} where $$c(\alpha)$$ is taken from the Normal(0,1) table
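The summary translates into a few lines of R. The function name `z_interval` and the data are hypothetical, and the sketch assumes that `sigma` is the known population standard deviation.

```r
# Hypothetical helper: confidence interval for mu when sigma is known
z_interval <- function(x, sigma, alpha = 0.05) {
  n <- length(x)
  c_alpha <- qnorm(1 - alpha / 2)          # c(alpha) from the Normal(0,1)
  half_width <- c_alpha * sigma / sqrt(n)
  c(b1 = mean(x) - half_width, b2 = mean(x) + half_width)
}

# Made-up data, only for illustration
x <- c(4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 5.2, 4.7, 5.1)

# 90% confidence interval, assuming the population sigma is 0.3
ci <- z_interval(x, sigma = 0.3, alpha = 0.10)
ci
```

The interval is centered on `mean(x)`, and its width depends only on `sigma`, `n` and `alpha`, not on the data.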

## But we don’t know the population variance

Can we use the sample variance?

• No, not if it is computed dividing by $$n,$$ because that version is biased
• But we can use the unbiased variance estimator $S_n^2=\frac{1}{n-1}\sum_{i=1}^n(x_i-\bar x)^2$
• and we have to pay a price
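The two versions of the variance can be compared on a small example. The data below are made up; the point is only that R's `var()` already divides by $$n-1$$.

```r
# Made-up data, only for illustration
x <- c(2, 4, 4, 5, 7, 8)
n <- length(x)

# Biased version: divide the sum of squared deviations by n
biased <- sum((x - mean(x))^2) / n

# Unbiased version: divide by n - 1
unbiased <- sum((x - mean(x))^2) / (n - 1)

c(biased = biased, unbiased = unbiased, var_in_R = var(x))
```

The standard deviation function `sd()` is the square root of `var()`, so it also uses the $$n-1$$ version.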

## Price of ignorance

Since we do not know $$\sigma^2$$ we have to estimate it using the data.

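But we use the same data to calculate $$\bar x$$

Thus $$S_n$$ is itself random: it changes from sample to sample, just like $$\bar x,$$ and that adds uncertainty. (For Normal data $$\bar x$$ and $$S_n$$ are in fact independent, so dependence is not the problem)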

Now instead of $Z=\frac{\bar x-\mu}{\sigma/\sqrt{n}}$ we define $T=\frac{\bar x-\mu}{S_n/\sqrt{n}}$ which does not follow a Normal, but a Student’s t-distribution

## Student’s t-distribution

The “frequency distribution of standard deviations of samples drawn from a normal population”

This is a family of distributions, depending on a parameter called “degrees of freedom”

Since the sample has size $$n$$ we initially have $$n$$ degrees of freedom. But the average is fixed, so we lose one degree of freedom and $$n-1$$ remain
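Putting the pieces together, an interval for $$\mu$$ with unknown variance could be sketched as below. The function name `t_interval` and the data are hypothetical. The built-in `t.test()` computes the same interval and is used here only as a cross-check.

```r
# Hypothetical helper: confidence interval for mu when sigma is unknown
t_interval <- function(x, alpha = 0.05) {
  n <- length(x)
  s <- sd(x)                                  # square root of the n-1 variance
  c_alpha <- qt(1 - alpha / 2, df = n - 1)    # Student's t with n-1 degrees of freedom
  half_width <- c_alpha * s / sqrt(n)
  c(b1 = mean(x) - half_width, b2 = mean(x) + half_width)
}

# Made-up data, only for illustration
x <- c(4.8, 5.1, 5.3, 4.9, 5.4, 5.0, 5.2, 4.7, 5.1)

ci <- t_interval(x, alpha = 0.10)
ci

# Cross-check with the built-in one-sample t-test
ref <- t.test(x, conf.level = 0.90)$conf.int
ref
```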

## Some typical interval limits

| Degrees of freedom | 90% | 95% | 99% |
|---|---|---|---|
| 1 | 6.31 | 12.71 | 63.66 |
| 2 | 2.92 | 4.30 | 9.92 |
| 3 | 2.35 | 3.18 | 5.84 |
| 4 | 2.13 | 2.78 | 4.60 |
| 5 | 2.02 | 2.57 | 4.03 |
| 30 | 1.70 | 2.04 | 2.75 |
| Normal | 1.64 | 1.96 | 2.58 |
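These limits can be recomputed in R with `qt()` and `qnorm()`. The sketch below assumes the limits are two-sided, so each one is the quantile at $$1-\alpha/2,$$ which equals $$(1+\text{level})/2$$ for a given confidence level. The variable names are arbitrary.

```r
df <- c(1, 2, 3, 4, 5, 30)
level <- c(0.90, 0.95, 0.99)

# One column per confidence level, one row per degrees of freedom
limits <- sapply(level, function(l) qt((1 + l) / 2, df = df))

# Last row: the Normal(0,1) limits, for comparison
limits <- rbind(limits, qnorm((1 + level) / 2))
dimnames(limits) <- list(c(df, "Normal"), c("90%", "95%", "99%"))

round(limits, 2)
```

As the degrees of freedom grow, the limits of the t-distribution get closer to those of the Normal.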