Class 34: Hypothesis test

Methodology of Scientific Research

Andrés Aravena, PhD

25 May 2023

Are men the same age as women, on average?

If we know the population

In the population we can see that the average age for women is \[μ_f = 34.763\text{ years}\]

And for men it is \[μ_m = 33.560\text{ years}\]

so women are older than men, on average

But we do not know the population

This time we have a sample of men and a sample of women

We calculate \(\bar{X}_f\) and \(\bar{X}_m\)

Each one can be modelled by a Normal distribution (why?)

Then \((\bar{X}_f - \bar{X}_m)\) also follows a Normal distribution

What are the parameters of this distribution?

Parameters of a Normal distribution

A normal distribution is defined by two parameters

  • The mean \(μ\)
  • The variance \(σ^2\)

Since we deal with averages, we have \[\begin{aligned}\bar{X}_f &∼ N(μ_f, σ^2_f/n_f) \\ \bar{X}_m &∼ N(μ_m, σ^2_m/n_m) \end{aligned}\]

What are the parameters for \((\bar{X}_f - \bar{X}_m)\)?

Parameters of \((\bar{X}_f - \bar{X}_m)\)

“Expected value of sum is sum of expected values” \[μ=μ_f - μ_m\]

“Variance of sum is sum of variances” (when the two samples are independent) \[σ^2=\frac{σ^2_f}{n_f} + \frac{σ^2_m}{n_m}\]

(how do we handle the signs?)
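The two rules can be checked with a small simulation. This is only a sketch: the population standard deviations and the sample sizes below are made-up assumptions, not values from the class data. It draws many pairs of independent samples and compares the empirical mean and variance of the difference with the formulas. The minus sign changes the mean but not the variance, because \(-\bar{X}_m\) has the same variance as \(\bar{X}_m\).

```python
import random
import statistics

# Hypothetical population parameters (assumptions for illustration only)
MU_F, SIGMA_F, N_F = 34.763, 20.0, 30
MU_M, SIGMA_M, N_M = 33.560, 18.0, 40


def simulate_differences(replications, seed=1):
    """Return simulated values of (women sample mean - men sample mean)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(replications):
        women = [rng.gauss(MU_F, SIGMA_F) for _ in range(N_F)]
        men = [rng.gauss(MU_M, SIGMA_M) for _ in range(N_M)]
        diffs.append(statistics.fmean(women) - statistics.fmean(men))
    return diffs


diffs = simulate_differences(5000)
expected_mean = MU_F - MU_M                          # means subtract
expected_var = SIGMA_F**2 / N_F + SIGMA_M**2 / N_M   # variances add

print("mean:", statistics.fmean(diffs), "expected:", expected_mean)
print("variance:", statistics.variance(diffs), "expected:", expected_var)
```

The empirical values should be close to the expected ones, but not identical, since they come from a finite number of replications.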

Confidence interval for \((\bar{X}_f - \bar{X}_m)\)

First, define the confidence level. Call it \((1-α)\)

Then, each tail must include \(α/2\) of the cases

We use the inverse Normal function (the quantile function of the standard Normal) to find \(k_l\) and \(k_u\)

The interval for \((\bar{X}_f - \bar{X}_m)\) is \[[μ + k_l⋅ σ, μ + k_u⋅σ ]\]
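As an illustration, \(k_l\) and \(k_u\) can be obtained from the inverse Normal function in Python's standard library. The choice \(α = 0.05\) is an assumption for the example; any other confidence level works the same way.

```python
from statistics import NormalDist

alpha = 0.05                      # confidence level is 1 - alpha = 0.95
standard_normal = NormalDist()    # mean 0, standard deviation 1

k_l = standard_normal.inv_cdf(alpha / 2)        # lower quantile, negative
k_u = standard_normal.inv_cdf(1 - alpha / 2)    # upper quantile, positive

print(k_l, k_u)
```

For \(α = 0.05\) the two values are approximately \(-1.96\) and \(1.96\). They are symmetric around 0 because the standard Normal is symmetric.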

We do not know \(μ\). How do we find it?

Now we know \(y=(\bar{X}_f - \bar{X}_m)\) and want to find \(μ\)

We build a similar interval \[I=[y + k_l⋅ σ, y + k_u⋅σ ]\] then \[ℙ(μ∈I) = 1-α\]

Are men the same age as women?

If the interval \(I\) does not contain 0, we can be confident that \(μ_f ≠ μ_m\)

How confident? Well, \((1-α)\)

If \(0∈I\) then it is possible that \(μ_f = μ_m\)

We do not have enough evidence to decide
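The decision rule can be sketched in a few lines. The sample summaries below are hypothetical numbers chosen for illustration, and the standard deviations are treated as known, as in the formulas above. The function name `confidence_interval` is also just a label for this example.

```python
from math import sqrt
from statistics import NormalDist


def confidence_interval(y, sigma, alpha=0.05):
    """Interval [y + k_l*sigma, y + k_u*sigma] for the difference of means."""
    k_l = NormalDist().inv_cdf(alpha / 2)
    k_u = NormalDist().inv_cdf(1 - alpha / 2)
    return y + k_l * sigma, y + k_u * sigma


# Hypothetical sample summaries (assumptions, not real data)
mean_f, sd_f, n_f = 35.1, 20.0, 100
mean_m, sd_m, n_m = 33.2, 18.0, 120

y = mean_f - mean_m                            # observed difference
sigma = sqrt(sd_f**2 / n_f + sd_m**2 / n_m)    # std. dev. of the difference

low, high = confidence_interval(y, sigma)
if low <= 0 <= high:
    print("0 is in I: not enough evidence to decide")
else:
    print("0 is not in I: confident that the means differ")
```

With these particular numbers the interval contains 0, so the sample does not give enough evidence to say the means differ.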

What is the smallest \(α\) that works?

The smallest \(α\) is the one that makes one of the interval limits equal to 0

If \(μ_f = μ_m\) (that is, \(μ=0\)), what is the probability that we observe a difference \((\bar{X}_f - \bar{X}_m)\) at least as far from 0 as \(y\)?
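A minimal sketch of this idea: setting one limit of the interval to 0 gives \(k_l = -|y|/σ\), so the smallest \(α\) is twice the Normal tail probability beyond \(|y|/σ\). The values of `y` and `sigma` below are hypothetical, and `smallest_alpha` is a name made up for this example.

```python
from statistics import NormalDist


def smallest_alpha(y, sigma):
    """Alpha at which one limit of [y + k_l*sigma, y + k_u*sigma] is exactly 0."""
    return 2 * NormalDist().cdf(-abs(y) / sigma)


# Hypothetical observed difference and standard deviation (assumptions)
y, sigma = 1.9, 2.6
alpha = smallest_alpha(y, sigma)

# Check: with this alpha the lower limit of the interval lands on 0
k_l = NormalDist().inv_cdf(alpha / 2)
print("alpha:", alpha)
print("lower limit:", y + k_l * sigma)
```

The printed lower limit should be 0 up to rounding error, which confirms that this \(α\) is the boundary case.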

Hypothesis declaration

There is a standard framework for these questions

We start by defining what we want to test, and what the alternative is \[\begin{aligned} H_0:&μ_f = μ_m\\ H_a:&μ_f ≠ μ_m\\ \end{aligned}\]

Here \(H_0\) is called the null hypothesis and \(H_a\) is the alternative hypothesis

Hypothesis test

Basically we want to know the probability of observing a difference \((\bar{X}_f - \bar{X}_m)\) equal to \(y\) or more extreme, assuming \(H_0\)

In this case we want to calculate \[ℙ(|\bar{X}_f- \bar{X}_m| ≥ |y| \mid μ = 0, σ )\]

This is called a two-sided test

One-sided test

If we declare our test as \[\begin{aligned} H_0:&μ_f = μ_m\\ H_a:&μ_f > μ_m\\ \end{aligned}\]

Then the question is

\[ℙ(\bar{X}_f - \bar{X}_m ≥ y| μ = 0, σ )?\]
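Both probabilities can be computed from the Normal cumulative distribution function. The following is a sketch under the assumption that \(σ\) is known; the numbers are hypothetical and `p_value` is a name chosen for this example.

```python
from statistics import NormalDist


def p_value(y, sigma, two_sided=True):
    """Probability, under H0 (mu = 0), of a difference at least as extreme as y."""
    null = NormalDist(mu=0, sigma=sigma)
    if two_sided:
        # P(|difference| >= |y|): both tails
        return 2 * (1 - null.cdf(abs(y)))
    # P(difference >= y): upper tail only, for Ha: mu_f > mu_m
    return 1 - null.cdf(y)


# Hypothetical values (assumptions for illustration)
y, sigma = 1.9, 2.6
print("two-sided:", p_value(y, sigma))
print("one-sided:", p_value(y, sigma, two_sided=False))
```

When \(y > 0\) the two-sided probability is exactly twice the one-sided one, because the Normal distribution is symmetric around 0.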