The R command model.matrix transforms a formula and a data frame into a numeric design matrix
Internally, R uses it (for example inside lm) to prepare the linear model
Linear models only work with numbers, but we can represent other things, such as categories, with numbers
We will use this example data
     sex height weight  hand
1   Male    179     67 Right
2 Female    168     55 Right
4   Male    170     74  Left
5 Female    162     68  Left
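To follow along, the table can be typed in by hand. This is only a sketch: the object name people is an assumption (the text does not name the data frame), and the row names 1, 2, 4, 5 are copied as printed above.

```r
# Example data typed in from the table above.
# The name `people` is illustrative; it does not come from the original.
people <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left")),
  row.names = c("1", "2", "4", "5")
)
str(people)
levels(people$sex)   # levels are sorted alphabetically
levels(people$hand)
```

Because factor levels are sorted alphabetically by default, Female and Left become the reference levels, which is why the matrices in this section have columns named sexMale and handRight.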
which can be encoded as this design matrix
  (Intercept) sexMale height
1           1       1    179
2           1       0    168
4           1       1    170
5           1       0    162
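The call that produced this matrix is not shown in the text. Assuming the data live in a data frame called people (a hypothetical name), a formula such as ~ sex + height should give the same columns:

```r
# `people` is a hypothetical name for the example data.
people <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  row.names = c("1", "2", "4", "5")
)

# One intercept column, one 0/1 indicator for Male, and height unchanged.
X <- model.matrix(~ sex + height, data = people)
print(X)
```

There is no sexFemale column: Female is the reference level and is absorbed into the intercept.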
If the model is \[y_i = β_0 + β_1 s_i + e_i\] where \(s_i\) is 1 for Male and 0 for Female, then \[\begin{aligned} β_0 &= \text{mean}(Female)\\ β_1 &= \text{mean}(Male)-\text{mean}(Female) \end{aligned}\]
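The relation between the coefficients and the group means can be checked numerically. The sketch below assumes that the response \(y_i\) is the weight column of the example data, which the text suggests but does not state at this point.

```r
# Assumption: the response y is weight; `people` is a hypothetical name.
people <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  weight = c(67, 55, 74, 68)
)

fit <- lm(weight ~ sex, data = people)

mean_female <- mean(people$weight[people$sex == "Female"])
mean_male   <- mean(people$weight[people$sex == "Male"])

coef(fit)                                   # intercept and sexMale
c(mean_female, mean_male - mean_female)     # should match the line above
```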
If the model is \[y_i = β_0 + β_1 s_i + β_2 h_i + e_i\] where \(h_i\) is the height of person \(i\), then \[\begin{aligned} β_0 &= \text{baseline}(Female)\\ β_1 &= \text{baseline}(Male)-\text{baseline}(Female)\\ β_2 &= \text{slope}(Height) \end{aligned}\]
Without the intercept, Male and Female each get their own indicator column
  sexFemale sexMale height
1         0       1    179
2         1       0    168
4         0       1    170
5         1       0    162
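A matrix with these columns can probably be obtained by removing the intercept from the formula. The formula below is a guess at the original call, and people is again a hypothetical name:

```r
# `people` is a hypothetical name for the example data.
people <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  row.names = c("1", "2", "4", "5")
)

# "0 +" drops the intercept, so sex keeps one indicator column per level.
X <- model.matrix(~ 0 + sex + height, data = people)
print(X)
```

Writing ~ sex + height - 1 is an equivalent way to drop the intercept.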
Now the model is \[y_i = β_1 f_i + β_2 m_i + β_3 h_i + e_i\] where \(f_i\) is 1 for Female and 0 for Male, \(m_i\) is 1 for Male and 0 for Female, and \(h_i\) is the height of person \(i\). Then \[\begin{aligned} β_1 &= \text{baseline}(Female)\\ β_2 &= \text{baseline}(Male)\\ β_3 &= \text{slope}(Height) \end{aligned}\]
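The two parametrisations describe the same fit, only with different coefficient names. A minimal check, assuming the response is weight and using the hypothetical data frame name people:

```r
# Assumption: the response y is weight; `people` is a hypothetical name.
people <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68)
)

with_int    <- coef(lm(weight ~ sex + height, data = people))
without_int <- coef(lm(weight ~ 0 + sex + height, data = people))

# baseline(Female) is the intercept of the first model;
# baseline(Male) is the intercept plus the sexMale coefficient.
rbind(
  from_intercept_model = c(with_int[["(Intercept)"]],
                           with_int[["(Intercept)"]] + with_int[["sexMale"]],
                           with_int[["height"]]),
  no_intercept_model   = unname(without_int)
)
```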
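Notice that we can have either an intercept together with sexMale, or sexFemale together with sexMale
But we cannot have the three at the same time
In that case the columns would be perfectly collinear, because sexFemale + sexMale always equals the intercept column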
If the model was \[y_i = β_0 + β_1 f_i + β_2 m_i + e_i\] then, for any constant \(c\), the coefficients \[\begin{aligned} β_0 &= c\\ β_1 &= \text{baseline}(Female) - c\\ β_2 &= \text{baseline}(Male) - c \end{aligned}\] all give exactly the same fitted values. In other words, we cannot interpret the coefficients
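The redundancy can be seen directly by building the three columns by hand. This is a sketch that assumes weight is the response; the names f, m and X are illustrative.

```r
sex    <- factor(c("Male", "Female", "Male", "Female"))
weight <- c(67, 55, 74, 68)

f <- as.numeric(sex == "Female")   # 1 for Female, 0 for Male
m <- as.numeric(sex == "Male")     # 1 for Male, 0 for Female

# Intercept, f and m together: the intercept column equals f + m.
X <- cbind(intercept = 1, f = f, m = m)
qr(X)$rank        # numerical rank, smaller than the number of columns

# lm detects the redundant column and reports NA for its coefficient.
fit <- lm(weight ~ f + m)
coef(fit)
```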
Maybe the weight also depends on handedness
  sexFemale sexMale handRight
1         0       1         1
2         1       0         1
4         0       1         0
5         1       0         0
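A formula with two additive factors and no intercept should reproduce these columns. As before, the exact call is not shown in the text, so the formula and the name people are assumptions:

```r
# `people` is a hypothetical name for the example data.
people <- data.frame(
  sex  = factor(c("Male", "Female", "Male", "Female")),
  hand = factor(c("Right", "Right", "Left", "Left")),
  row.names = c("1", "2", "4", "5")
)

# sex gets one column per level; hand gets a single column, because
# its reference level (Left) is already covered by the sex columns.
X <- model.matrix(~ 0 + sex + hand, data = people)
print(X)
```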
Here we assume that the effect of hand is the same for both sexes
But what if they interact?
Maybe left-handed males are heavier
  sexFemale:handLeft sexMale:handLeft sexFemale:handRight sexMale:handRight
1                  0                0                   0                 1
2                  0                0                   1                 0
4                  0                1                   0                 0
5                  1                0                   0                 0
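The four indicator columns above correspond to the four combinations of sex and hand. A sketch that should reproduce them, assuming a data frame named people (hypothetical) and an interaction-only formula without intercept:

```r
# `people` is a hypothetical name for the example data.
people <- data.frame(
  sex  = factor(c("Male", "Female", "Male", "Female")),
  hand = factor(c("Right", "Right", "Left", "Left")),
  row.names = c("1", "2", "4", "5")
)

# Only the interaction term, no intercept: one column per combination.
X <- model.matrix(~ 0 + sex:hand, data = people)
print(X)
rowSums(X)   # each person falls in exactly one sex-by-hand cell
```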
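The symbol : means interaction
As we saw, the expression sex:hand creates four columns
sexFemale:handLeft, which is 1 when sex is Female and hand is Left
sexMale:handLeft, which is 1 when sex is Male and hand is Left
sexFemale:handRight, which is 1 when sex is Female and hand is Right
sexMale:handRight, which is 1 when sex is Male and hand is Right
A common case is to include the main effects together with the interaction, sex + hand + sex:hand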
  sexFemale sexMale handRight sexMale:handRight
1         0       1         1                 1
2         1       0         1                 0
4         0       1         0                 0
5         1       0         0                 0
which can also be written with the shorthand sex * hand
  sexFemale sexMale handRight sexMale:handRight
1         0       1         1                 1
2         1       0         1                 0
4         0       1         0                 0
5         1       0         0                 0
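The two identical tables above are consistent with R's formula shorthand, where a * b expands to a + b + a:b. The formulas below are a reconstruction, not the original calls, and people is a hypothetical name:

```r
# `people` is a hypothetical name for the example data.
people <- data.frame(
  sex  = factor(c("Male", "Female", "Male", "Female")),
  hand = factor(c("Right", "Right", "Left", "Left")),
  row.names = c("1", "2", "4", "5")
)

# Main effects plus interaction, written out and with the * shorthand.
long  <- model.matrix(~ 0 + sex + hand + sex:hand, data = people)
short <- model.matrix(~ 0 + sex * hand, data = people)

print(short)
all.equal(long, short)   # the two spellings give the same matrix
```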
In this model, what is the interpretation of each coefficient?
sexFemale
sexMale
handRight
sexMale:handRight
What about the interaction between sex and height?
  sexFemale:height sexMale:height
1                0            179
2              168              0
4                0            170
5              162              0
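When a factor interacts with a numeric variable, each column holds the numeric value for one level and 0 elsewhere. A sketch that should reproduce the table, with an assumed formula and the hypothetical name people:

```r
# `people` is a hypothetical name for the example data.
people <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  row.names = c("1", "2", "4", "5")
)

# Interaction only, no intercept: height is split into one column per sex.
X <- model.matrix(~ 0 + sex:height, data = people)
print(X)
rowSums(X)   # recovers each person's height
```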
What is the interpretation here?
Now the model is \[y_i = β_1 f_i h_i + β_2 m_i h_i + e_i\] where \(f_i\) is 1 for Female and 0 for Male, \(m_i\) is 1 for Male and 0 for Female, and \(h_i\) is the height of person \(i\). Then \[\begin{aligned}
β_1 &= \text{slope}(Height|Female)\\
β_2 &= \text{slope}(Height|Male)
\end{aligned}\] This model has no intercept, so both lines are forced through the origin
That does not make much sense (unless we center the data)
Adding one intercept column per sex gives
  sexFemale sexMale sexFemale:height sexMale:height
1         0       1                0            179
2         1       0              168              0
4         0       1                0            170
5         1       0              162              0
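These columns combine one indicator per sex with one height column per sex. A formula that should produce them is sketched below; it is an assumption about the original call, and people is a hypothetical name:

```r
# `people` is a hypothetical name for the example data.
people <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  row.names = c("1", "2", "4", "5")
)

# One indicator column and one height column for each sex.
X <- model.matrix(~ 0 + sex + sex:height, data = people)
print(X)
```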
What is the interpretation here?