Class 8: Last bits of Linear Models

Systems Biology

Andrés Aravena, PhD

October 26, 2021

Model matrix for many factors

So far the independent variables have been

  • Numeric values (e.g. height)
  • Factors (e.g. sex, age, diet, stress, tissue)
  • Sum of numeric and factor (e.g. height + sex)
  • Often we have an Intercept, but it is optional

They are coded as matrices

The R command model.matrix transforms a formula into a matrix

Internally R uses it to prepare the linear model

The truth is that linear models only work with numbers, but we can represent other things with numbers

Example with Intercept

We will use this example data

     sex height weight  hand
1   Male    179     67 Right
2 Female    168     55 Right
4   Male    170     74  Left
5 Female    162     68  Left

which can be modelled as this

model.matrix(~ sex + height, data=students) |> as.data.frame()
  (Intercept) sexMale height
1           1       1    179
2           1       0    168
4           1       1    170
5           1       0    162
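
The slides do not show how `students` was created. As a minimal sketch, the data frame below is rebuilt by hand from the table above (the construction itself is an assumption, only the values come from the slides), and then passed to `model.matrix`.

```r
# Reconstruction of the example data from the table shown in the slides.
# The original `students` object is not given, so this is an assumption.
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left")),
  row.names = c("1", "2", "4", "5")
)

# Factor levels are sorted alphabetically, so Female is the reference level
levels(students$sex)

# Design matrix with an intercept, an indicator for Male, and the height
X <- model.matrix(~ sex + height, data = students)
X
```

Because Female is the first level, it is absorbed into the intercept and only the `sexMale` indicator appears as a column.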

Interpretation (reminder)

If the model is \[y_i = β_0 + β_1 s_i + e_i\] where \(s_i\) is 1 for Male and 0 for Female, then \[\begin{aligned} β_0 &= \text{mean}(Female)\\ β_1 &= \text{mean}(Male)-\text{mean}(Female) \end{aligned}\]
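
This interpretation can be checked numerically. The sketch below assumes the response is `weight` and uses a hand-built copy of the example data (an assumption, since the original `students` object is not shown). It compares the fitted coefficients with the group means.

```r
# Hand-built copy of the example data (assumed, rebuilt from the table)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

# Model with an intercept and an indicator for Male
fit <- lm(weight ~ sex, data = students)

# Mean weight of each group:
# Female = (55 + 68) / 2 = 61.5, Male = (67 + 74) / 2 = 70.5
group_means <- tapply(students$weight, students$sex, mean)

coef(fit)      # intercept and sexMale
group_means    # compare with the coefficients above
```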

Interpretation

If the model is \[y_i = β_0 + β_1 s_i + β_2 h_i + e_i\] where \(h_i\) is the height of person \(i\), then \[\begin{aligned} β_0 &= \text{baseline}(Female)\\ β_1 &= \text{baseline}(Male)-\text{baseline}(Female)\\ β_2 &= \text{slope}(Height) \end{aligned}\]
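
One way to see what "baseline" means here is to predict the weight of a Female and a Male of the same height. The sketch below assumes `weight` is the response and uses a hand-built copy of the example data (an assumption). The height of 170 is an arbitrary illustrative value.

```r
# Hand-built copy of the example data (assumed, rebuilt from the table)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

fit <- lm(weight ~ sex + height, data = students)

# Two hypothetical people with the same height and different sex
same_height <- data.frame(
  sex    = factor(c("Female", "Male"), levels = levels(students$sex)),
  height = 170
)
pred <- predict(fit, newdata = same_height)

# The gap between the two predictions is the sexMale coefficient,
# whatever height we choose: the two lines are parallel
gap <- pred[[2]] - pred[[1]]
gap
coef(fit)[["sexMale"]]
```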

Example without Intercept

Now Male and Female each have their own indicator column

model.matrix(~ sex + height + 0, data=students) |> as.data.frame()
  sexFemale sexMale height
1         0       1    179
2         1       0    168
4         0       1    170
5         1       0    162

Interpretation

Now the model is \[y_i = β_1 f_i + β_2 m_i + β_3 h_i + e_i\] where \(m_i\) is 1 for Male and 0 for Female, \(f_i\) is 1 for Female and 0 for Male, and \(h_i\) is the height. Then \[\begin{aligned} β_1 &= \text{baseline}(Female)\\ β_2 &= \text{baseline}(Male)\\ β_3 &= \text{slope}(Height) \end{aligned}\]
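
The models with and without intercept describe the same fit, only with different coefficients. The sketch below, which assumes `weight` is the response and uses a hand-built copy of the example data (an assumption), compares both versions.

```r
# Hand-built copy of the example data (assumed, rebuilt from the table)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

fit_int   <- lm(weight ~ sex + height,     data = students)
fit_noint <- lm(weight ~ sex + height + 0, data = students)

# Same predictions from both models
cbind(with_intercept = fitted(fit_int), without = fitted(fit_noint))

# Only the meaning of the coefficients changes:
#   sexFemale (no intercept) = Intercept
#   sexMale   (no intercept) = Intercept + sexMale
coef(fit_int)
coef(fit_noint)
```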

Not all combinations at the same time

Notice that we have either

  • An intercept and an indicator for Male
  • An indicator for Male and another for Female

But we cannot have the three at the same time

In that case the independent variables will be 100% correlated: the Intercept column is always equal to the sum of the Male and Female indicators

100% correlation is bad

If the model was \[y_i = β_0 + β_1 f_i + β_2 m_i + e_i\] then, for any value of \(c,\) the coefficients \[\begin{aligned} β_0 &= c\\ β_1 &= \text{baseline}(Female) - c\\ β_2 &= \text{baseline}(Male) - c \end{aligned}\] give exactly the same predictions. There are infinitely many equally good solutions, so we cannot interpret the coefficients
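
A small numerical check of why the three columns cannot be used together. The sketch below builds the matrix by hand (the names `f`, `m` and the shift of 10 are illustrative choices, and the data is a hand-built copy of the example) and shows that a constant can be moved between the intercept and the two indicators without changing any prediction.

```r
# Hand-built copy of the example data (assumed, rebuilt from the table)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

f <- as.numeric(students$sex == "Female")
m <- as.numeric(students$sex == "Male")

# Intercept plus BOTH indicators: the first column equals f + m
X <- cbind(Intercept = 1, f = f, m = m)

# Three columns, but only two of them carry independent information
qr(X)$rank

# Two different coefficient vectors...
means  <- tapply(students$weight, students$sex, mean)
beta_a <- c(0,  means[["Female"]],      means[["Male"]])
beta_b <- c(10, means[["Female"]] - 10, means[["Male"]] - 10)

# ...that give exactly the same predictions
pred_a <- drop(X %*% beta_a)
pred_b <- drop(X %*% beta_b)
cbind(pred_a, pred_b)
```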

The best case is when independent variables are uncorrelated

There are other combinations

Maybe the weight depends also on handedness

model.matrix(~ sex + hand + 0, data=students) |> as.data.frame()
  sexFemale sexMale handRight
1         0       1         1
2         1       0         1
4         0       1         0
5         1       0         0

Here we assume that the effect of hand is the same for both sexes

But what if they interact?

Interactions

Maybe left-handed males are heavier

model.matrix(~ sex:hand + 0, data=students) |> as.data.frame()
  sexFemale:handLeft sexMale:handLeft sexFemale:handRight sexMale:handRight
1                  0                0                   0                 1
2                  0                0                   1                 0
4                  0                1                   0                 0
5                  1                0                   0                 0

: means interaction

As we saw, the expression sex:hand creates four columns

  • sexFemale:handLeft which is 1 when sex is Female and hand is Left
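  • sexMale:handLeft which is 1 when sex is Male and hand is Left
  • sexFemale:handRight which is 1 when sex is Female and hand is Right
  • sexMale:handRight which is 1 when sex is Male and hand is Right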
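
Each interaction column is the product of two indicators. The sketch below checks this for one column, using a hand-built copy of the example data (an assumption) and two helper vectors, `male` and `right`, introduced only for illustration.

```r
# Hand-built copy of the example data (assumed, rebuilt from the table)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

X <- model.matrix(~ sex:hand + 0, data = students)

# Indicators built by hand
male  <- as.numeric(students$sex == "Male")
right <- as.numeric(students$hand == "Right")

# The interaction column is the element-wise product of the indicators
manual <- male * right
cbind(from_model_matrix = X[, "sexMale:handRight"], manual)

# Every person belongs to exactly one sex-hand combination
rowSums(X)
```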

Interaction and sum

A common case is

model.matrix(~ sex:hand + sex + hand + 0, data=students) |> as.data.frame()
  sexFemale sexMale handRight sexMale:handRight
1         0       1         1                 1
2         1       0         1                 0
4         0       1         0                 0
5         1       0         0                 0

which can also be written as

model.matrix(~ sex*hand + 0, data=students) |> as.data.frame()
  sexFemale sexMale handRight sexMale:handRight
1         0       1         1                 1
2         1       0         1                 0
4         0       1         0                 0
5         1       0         0                 0
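
That both formulas produce the same matrix can be confirmed directly. A minimal sketch, using a hand-built copy of the example data (an assumption, since the original object is not shown):

```r
# Hand-built copy of the example data (assumed, rebuilt from the table)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

X_long  <- model.matrix(~ sex:hand + sex + hand + 0, data = students)
X_short <- model.matrix(~ sex*hand + 0,              data = students)

# Same column names, in the same order
same_names <- identical(colnames(X_long), colnames(X_short))

# Same values in every cell
same_values <- all(X_long == X_short)

c(same_names = same_names, same_values = same_values)
```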

Exercise

In this model, what is the interpretation of

  • sexFemale
  • sexMale
  • handRight
  • sexMale:handRight

Combining factors and numeric

What about the interaction between sex and height?

model.matrix(~ sex:height + 0, data=students) |> as.data.frame()
  sexFemale:height sexMale:height
1                0            179
2              168              0
4                0            170
5              162              0

What is the interpretation here?

Interpretation

Now the model is \[y_i = β_1 f_i h_i + β_2 m_i h_i + e_i\] where \(m_i\) is 1 for Male and 0 for Female, \(f_i\) is 1 for Female and 0 for Male, and \(h_i\) is the height. Then \[\begin{aligned} β_1 &= \text{slope}(Height|Female)\\ β_2 &= \text{slope}(Height|Male) \end{aligned}\] This model has no intercept, so both lines are forced to pass through the origin
That rarely makes sense (unless we center the data)

Adding intercept for each sex

model.matrix(~ sex:height + sex + 0, data=students) |> as.data.frame()
  sexFemale sexMale sexFemale:height sexMale:height
1         0       1                0            179
2         1       0              168              0
4         0       1                0            170
5         1       0              162              0

What is the interpretation here?
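
One way to explore this question numerically is to compare the combined model with a separate regression for each sex. The sketch below assumes `weight` is the response and uses a hand-built copy of the example data (an assumption). With only two people per sex, each line passes exactly through its two points, so this is an illustration and not a realistic fit.

```r
# Hand-built copy of the example data (assumed, rebuilt from the table)
students <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left"))
)

# One model with an intercept and a slope for each sex
fit_both <- lm(weight ~ sex:height + sex + 0, data = students)

# Two separate models, one per sex
fit_f <- lm(weight ~ height, data = subset(students, sex == "Female"))
fit_m <- lm(weight ~ height, data = subset(students, sex == "Male"))

# Compare the coefficients of the combined model with the separate fits
coef(fit_both)
coef(fit_f)
coef(fit_m)
```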

Summary

  • Interactions create new variables in the linear model
  • Chosen wisely, they will tell you what you want to know
  • You can compare them later, using contrasts