Class 8: Last bits of Linear Models

Systems Biology

Andrés Aravena, PhD

October 26, 2021

Model matrix for many factors

So far the independent variables have been

Numeric values (e.g. height)
Factors (e.g. sex, age, diet, stress, tissue)
Sum of numeric and factor (e.g. height + sex)
Often we have an Intercept, but it is optional

They are coded as matrices

The R command model.matrix transforms a formula into a matrix

Internally R uses it to prepare the linear model

The truth is that linear models only work with numbers, but we can represent other things with numbers
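As a minimal sketch of how a factor becomes numbers, the snippet below encodes a three-level factor. The `tissue` values are hypothetical and invented only for illustration; they are not part of the course data.

```r
# Hypothetical three-level factor (values invented for illustration)
tissue <- factor(c("liver", "brain", "muscle", "brain"))

# model.matrix() turns the factor into numeric 0/1 indicator columns.
# The first level in alphabetical order ("brain") is absorbed by the intercept.
X <- model.matrix(~ tissue)

print(X)
print(is.numeric(X))
```

Each non-reference level gets its own indicator column, so the model only ever sees numbers.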

Example with Intercept

We will use this example data

     sex height weight  hand
1   Male    179     67 Right
2 Female    168     55 Right
4   Male    170     74  Left
5 Female    162     68  Left

which can be encoded as this model matrix

model.matrix(~ sex + height, data = students) |> as.data.frame()

  (Intercept) sexMale height
1           1       1    179
2           1       0    168
4           1       1    170
5           1       0    162
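To make the example reproducible, the sketch below retypes the four rows of the table into a data frame (the construction of `students` is an assumption; the original data source is not shown) and checks that fitting from the formula and fitting from the explicit model matrix give the same coefficients.

```r
# The four example rows, retyped by hand (assumed construction)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left")),
  row.names = c("1", "2", "4", "5")
)

# The model matrix built from the formula
X <- model.matrix(~ sex + height, data = students)

# Fit once through the formula interface, once from the matrix directly
fit_formula <- lm(weight ~ sex + height, data = students)
fit_matrix  <- lm.fit(X, students$weight)

print(coef(fit_formula))
print(all.equal(coef(fit_formula), fit_matrix$coefficients))
```

`lm.fit()` is the lower-level fitting function in the `stats` package; it takes the numeric matrix and the response directly.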

Interpretation (reminder)

If the model is \[y_i = β_0 + β_1 s_i + e_i\] where \(s_i\) is 1 for Male and 0 for Female, then \[\begin{aligned} β_0 &= \text{mean}(Female)\\ β_1 &= \text{mean}(Male)-\text{mean}(Female) \end{aligned}\]
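A quick numerical check of this reminder, assuming the four example rows are retyped into a data frame called `students`:

```r
# The four example rows, retyped by hand (assumed construction)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

# Model with intercept and a single indicator for Male
fit <- lm(weight ~ sex, data = students)

# Group means computed directly from the data
means <- tapply(students$weight, students$sex, mean)

print(coef(fit))                        # intercept and sexMale
print(means)                            # mean weight per sex
print(means["Male"] - means["Female"])  # should match the sexMale coefficient
```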

Interpretation

If the model is \[y_i = β_0 + β_1 s_i + β_2 h_i + e_i\] where \(h_i\) is the height of person \(i\), then \[\begin{aligned} β_0 &= \text{baseline}(Female)\\ β_1 &= \text{baseline}(Male)-\text{baseline}(Female)\\ β_2 &= \text{slope}(Height) \end{aligned}\]
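One way to see what the coefficient of \(s_i\) means in this model is to predict the weight of a Female and a Male of the same height. In the sketch below, `students` is retyped from the example table and the height of 170 is an arbitrary choice.

```r
# The four example rows, retyped by hand (assumed construction)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

fit <- lm(weight ~ sex + height, data = students)

# Two hypothetical people with the same height, one of each sex
same_height <- data.frame(
  sex    = factor(c("Female", "Male"), levels = levels(students$sex)),
  height = 170
)
pred <- predict(fit, newdata = same_height)

# The Male - Female gap at equal height is the sexMale coefficient
gap <- unname(pred[2] - pred[1])
print(gap)
print(coef(fit)["sexMale"])
```

The gap does not depend on the chosen height, because both sexes share the same slope in this model.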

Example without Intercept

Now Male and Female each have their own indicator column

model.matrix(~ sex + height + 0, data = students) |> as.data.frame()

  sexFemale sexMale height
1         0       1    179
2         1       0    168
4         0       1    170
5         1       0    162

Interpretation

If the model is now \[y_i = β_1 f_i + β_2 m_i + β_3 h_i + e_i\] where \(m_i\) is 1 for Male and 0 for Female, \(f_i\) is 1 for Female and 0 for Male, and \(h_i\) is the height, then \[\begin{aligned} β_1 &= \text{baseline}(Female)\\ β_2 &= \text{baseline}(Male)\\ β_3 &= \text{slope}(Height) \end{aligned}\]
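The two codings describe the same fit with different parameters. A sketch comparing them, assuming `students` is retyped from the example table:

```r
# The four example rows, retyped by hand (assumed construction)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

with_int <- lm(weight ~ sex + height,     data = students)
no_int   <- lm(weight ~ sex + height + 0, data = students)

# Same predictions for every person
print(all.equal(fitted(with_int), fitted(no_int)))

# Without intercept, each sex has its own baseline;
# their difference is the sexMale coefficient of the first coding
b <- coef(no_int)
print(unname(b["sexMale"] - b["sexFemale"]))
print(unname(coef(with_int)["sexMale"]))
```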

Not all combinations at the same time

Notice that we have either

An intercept and an indicator for Male
An indicator for Male and another for Female

But we cannot have the three at the same time

In that case the intercept column equals the sum of the two indicator columns, so the independent variables will be 100% correlated

100% correlation is bad

If the model was \[y_i = β_0 + β_1 f_i + β_2 m_i + e_i\] then, for any value \(c,\) the coefficients \[\begin{aligned} β_0 &= c\\ β_1 &= \text{baseline}(Female) - c\\ β_2 &= \text{baseline}(Male) - c \end{aligned}\] give exactly the same predictions, because \(f_i + m_i = 1\) for every person. In other words, we cannot interpret the coefficients
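The problem can be seen numerically. In this sketch (with `students` retyped from the example table, and the shift of 10 chosen arbitrarily) the three columns have rank 2, two different coefficient vectors predict the same values, and `lm()` reports `NA` for the redundant coefficient.

```r
# The four example rows, retyped by hand (assumed construction)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

# Indicator columns built by hand
f <- as.numeric(students$sex == "Female")
m <- as.numeric(students$sex == "Male")
X <- cbind(intercept = 1, f = f, m = m)

# Three columns, but only two are linearly independent
print(qr(X)$rank)

# Moving a constant between the intercept and the indicators
# leaves the predictions unchanged
base_f <- mean(students$weight[f == 1])
base_m <- mean(students$weight[m == 1])
beta_a <- c(0,  base_f,      base_m)
beta_b <- c(10, base_f - 10, base_m - 10)
print(all.equal(drop(X %*% beta_a), drop(X %*% beta_b)))

# lm() cannot estimate all three and marks one coefficient as NA
fit <- lm(students$weight ~ f + m)
print(coef(fit))
```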

The best case is when independent variables are uncorrelated

There are other combinations

Maybe the weight also depends on handedness

model.matrix(~ sex + hand + 0, data=students) |> as.data.frame()

  sexFemale sexMale handRight
1         0       1         1
2         1       0         1
4         0       1         0
5         1       0         0

Here we assume that the effect of hand does not depend on sex
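A sketch of what this assumption implies, with `students` retyped from the example table: under `sex + hand`, the predicted Right minus Left difference is the same number for both sexes.

```r
# The four example rows, retyped by hand (assumed construction)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

fit <- lm(weight ~ sex + hand + 0, data = students)

# All four sex/hand combinations
grid <- expand.grid(sex = levels(students$sex), hand = levels(students$hand))
grid$pred <- predict(fit, newdata = grid)

# Right - Left gap within each sex
gap_female <- with(grid, pred[sex == "Female" & hand == "Right"] -
                         pred[sex == "Female" & hand == "Left"])
gap_male   <- with(grid, pred[sex == "Male" & hand == "Right"] -
                         pred[sex == "Male" & hand == "Left"])

print(c(female = gap_female, male = gap_male))
print(coef(fit)["handRight"])
```

Both gaps equal the `handRight` coefficient, which is exactly what an additive model forces.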

But what if they interact?

Interactions

Maybe left-handed males are heavier

model.matrix(~ sex:hand + 0, data=students) |> as.data.frame()

  sexFemale:handLeft sexMale:handLeft sexFemale:handRight sexMale:handRight
1                  0                0                   0                 1
2                  0                0                   1                 0
4                  0                1                   0                 0
5                  1                0                   0                 0

`:` means interaction

As we saw, the expression sex:hand creates four columns

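sexFemale:handLeft which is 1 when sex is Female and hand is Left
sexMale:handLeft which is 1 when sex is Male and hand is Left
sexFemale:handRight which is 1 when sex is Female and hand is Right
sexMale:handRight which is 1 when sex is Male and hand is Right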
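Each interaction column is the product of two indicators. A small check, assuming `students` is retyped from the example table:

```r
# The four example rows, retyped by hand (assumed construction)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

X <- model.matrix(~ sex:hand + 0, data = students)

# Indicators built by hand
is_female <- as.numeric(students$sex == "Female")
is_left   <- as.numeric(students$hand == "Left")

# The interaction column is the elementwise product of the two indicators
print(all(X[, "sexFemale:handLeft"] == is_female * is_left))

# Every person falls in exactly one of the four sex/hand cells
print(rowSums(X))
```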

Interaction and sum

A common case is

model.matrix(~ sex:hand + sex + hand + 0, data=students) |> as.data.frame()

  sexFemale sexMale handRight sexMale:handRight
1         0       1         1                 1
2         1       0         1                 0
4         0       1         0                 0
5         1       0         0                 0

which can also be written as

model.matrix(~ sex*hand + 0, data=students) |> as.data.frame()

  sexFemale sexMale handRight sexMale:handRight
1         0       1         1                 1
2         1       0         1                 0
4         0       1         0                 0
5         1       0         0                 0
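The equivalence of the two formulas can be checked directly. This sketch assumes `students` is retyped from the example table.

```r
# The four example rows, retyped by hand (assumed construction)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

X_long <- model.matrix(~ sex:hand + sex + hand + 0, data = students)
X_star <- model.matrix(~ sex*hand + 0, data = students)

# Same column names in the same order, and the same values
print(identical(colnames(X_long), colnames(X_star)))
print(all(X_long == X_star))
```

R sorts the terms so that main effects come before interactions, which is why the order in which they are written does not matter here.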

Exercise

In this model, what is the interpretation of

sexFemale
sexMale
handRight
sexMale:handRight

Combining factors and numeric

What about the interaction between sex and height?

model.matrix(~ sex:height + 0, data=students) |> as.data.frame()

  sexFemale:height sexMale:height
1                0            179
2              168              0
4                0            170
5              162              0

What is the interpretation here?

Interpretation

Now the model is \[y_i = β_1 f_i h_i + β_2 m_i h_i + e_i\] where \(m_i\) is 1 for Male and 0 for Female, \(f_i\) is 1 for Female and 0 for Male, and \(h_i\) is the height, then \[\begin{aligned} β_1 &= \text{slope}(Height|Female)\\ β_2 &= \text{slope}(Height|Male) \end{aligned}\]
It does not have an intercept, so each line is forced through the origin
It does not make much sense (unless we center the data)
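A sketch of what these two slopes are, assuming `students` is retyped from the example table. Because there is no intercept, each coefficient is the slope of a line through the origin fitted to one sex only; the helper `slope_origin` is a hypothetical name introduced for this illustration.

```r
# The four example rows, retyped by hand (assumed construction)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

fit <- lm(weight ~ sex:height + 0, data = students)

# Least-squares slope of a line through the origin: sum(x*y) / sum(x^2)
# (slope_origin is a hypothetical helper, not part of any package)
slope_origin <- function(d) sum(d$height * d$weight) / sum(d$height^2)
by_sex <- sapply(split(students, students$sex), slope_origin)

print(coef(fit))
print(by_sex)
```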

Adding intercept for each sex

model.matrix(~ sex:height + sex + 0, data = students) |> as.data.frame()

  sexFemale sexMale sexFemale:height sexMale:height
1         0       1                0            179
2         1       0              168              0
4         0       1                0            170
5         1       0              162              0

What is the interpretation here?
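One way to explore the question is to compare this model with two separate regressions, one per sex. The sketch assumes `students` is retyped from the example table. With only two people per sex the lines pass exactly through the points, so this shows the structure of the model and is not a realistic fit.

```r
# The four example rows, retyped by hand (assumed construction)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

# One model with an intercept and a slope for each sex
fit <- lm(weight ~ sex:height + sex + 0, data = students)

# Two separate straight lines, one per sex
female <- lm(weight ~ height, data = students, subset = sex == "Female")
male   <- lm(weight ~ height, data = students, subset = sex == "Male")

# Arrange the separate fits in the same order as coef(fit)
separate <- c(coef(female)[1], coef(male)[1], coef(female)[2], coef(male)[2])

print(coef(fit))
print(unname(separate))
```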

Summary

Interactions create new variables in the linear model
Chosen wisely, they will tell you what you want to know
You can compare them later, using contrasts
