- Numeric values (e.g. height)
- Factors (e.g. sex, age, diet, stress, tissue)
- Sum of numeric and factor (e.g. height + sex)
- Often we have an *Intercept*, but it is optional

The R command `model.matrix` transforms a formula into a matrix

Internally R uses it to prepare the linear model

The truth is that linear models only work with numbers, but we can *represent* other things with numbers

We will use this example data

```
     sex height weight  hand
1   Male    179     67 Right
2 Female    168     55 Right
4   Male    170     74  Left
5 Female    162     68  Left
```

which can be modelled as this

```
  (Intercept) sexMale height
1           1       1    179
2           1       0    168
4           1       1    170
5           1       0    162
```
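The matrix above can be reproduced with a short sketch like the following. The data frame name `people` is an assumption made for illustration, and the sketch relies on R's default alphabetical ordering of factor levels, which makes *Female* the reference level.

```r
# Example data from the table above; row names 1, 2, 4, 5 are kept as shown.
# The name `people` is hypothetical.
people <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68),
  hand   = factor(c("Right", "Right", "Left", "Left")),
  row.names = c("1", "2", "4", "5")
)

# Only the right-hand side of the formula is needed to build the matrix.
X <- model.matrix(~ sex + height, data = people)
print(X)
```

The factor `sex` becomes the single 0/1 column `sexMale`, while the numeric variable `height` is copied unchanged.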

If the model is \[y_i = β_0 + β_1 s_i + e_i\] where \(s_i\) is 1 for *Male* and 0 for *Female*, then \[\begin{aligned}
β_0 &= \text{mean}(Female)\\
β_1 &= \text{mean}(Male)-\text{mean}(Female)
\end{aligned}\]
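As a quick check of this interpretation, the sketch below fits `weight ~ sex` on the example data and compares the coefficients with the group means. The data frame name `people` is an assumption, and *Female* is the reference level only because R sorts factor levels alphabetically by default.

```r
# Example data (only the columns needed here); `people` is a hypothetical name.
people <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  weight = c(67, 55, 74, 68)
)

fit <- lm(weight ~ sex, data = people)

# Group means computed directly from the data.
mean_female <- mean(people$weight[people$sex == "Female"])
mean_male   <- mean(people$weight[people$sex == "Male"])

# The intercept should match mean(Female),
# and sexMale should match mean(Male) - mean(Female).
print(coef(fit))
print(c(mean_female, mean_male - mean_female))
```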

If the model is \[y_i = β_0 + β_1 s_i + β_2 h_i + e_i\] where \(h_i\) is the height of person \(i\), then \[\begin{aligned} β_0 &= \text{baseline}(Female)\\ β_1 &= \text{baseline}(Male)-\text{baseline}(Female)\\ β_2 &= \text{slope}(Height) \end{aligned}\]

Now we have separate indicators for *Male* and *Female*, and no intercept

```
  sexFemale sexMale height
1         0       1    179
2         1       0    168
4         0       1    170
5         1       0    162
```

Now the model is \[y_i = β_1 f_i + β_2 m_i + β_3 h_i + e_i\] where \(m_i\) is 1 for *Male* and 0 for *Female*, \(f_i\) is 1 for *Female* and 0 for *Male*, and \(h_i\) is the height. Then \[\begin{aligned}
β_1 &= \text{baseline}(Female)\\
β_2 &= \text{baseline}(Male)\\
β_3 &= \text{slope}(Height)
\end{aligned}\]
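The two parametrizations describe the same fit, only with different coefficients. The sketch below compares them, assuming the no-intercept matrix comes from the formula `~ 0 + sex + height`. The data frame name `people` and the variable names are illustrative.

```r
# Example data; `people` is a hypothetical name.
people <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68)
)

# With intercept: columns (Intercept), sexMale, height.
fit_ref  <- lm(weight ~ sex + height, data = people)
# Without intercept: columns sexFemale, sexMale, height.
fit_cell <- lm(weight ~ 0 + sex + height, data = people)

b_ref  <- coef(fit_ref)
b_cell <- coef(fit_cell)

# Same fitted values in both models.
same_fit <- isTRUE(all.equal(fitted(fit_ref), fitted(fit_cell)))

# baseline(Male) - baseline(Female) in the second model
# should equal the sexMale coefficient of the first one.
diff_cell <- unname(b_cell["sexMale"] - b_cell["sexFemale"])
print(same_fit)
print(c(diff_cell, unname(b_ref["sexMale"])))
```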

Notice that we have either

- An intercept and an indicator for *Male*
- An indicator for *Male* and another for *Female*

But we cannot have the three at the same time

In that case the columns are linearly dependent: the intercept column is exactly the sum of the *Male* and *Female* indicators

If the model was \[y_i = β_0 + β_1 f_i + β_2 m_i + e_i\] then, for any value \(c\), the coefficients \[\begin{aligned} β_0 &= c\\ β_1 &= \text{baseline}(Female) - c\\ β_2 &= \text{baseline}(Male) - c \end{aligned}\] give exactly the same fitted values. In other words, we cannot interpret the coefficients
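The redundancy can also be seen numerically. The sketch below builds the three columns by hand (the names `f`, `m` and `X` are illustrative) and checks the rank of the matrix. It also shows how `lm` reacts: it cannot estimate all three coefficients and reports one of them as `NA`.

```r
# Indicator columns for the four people in the example data.
f <- c(0, 1, 0, 1)   # 1 for Female
m <- c(1, 0, 1, 0)   # 1 for Male
weight <- c(67, 55, 74, 68)

# Intercept plus both indicators: three columns, but intercept = f + m.
X <- cbind(intercept = 1, f = f, m = m)
rank_X <- qr(X)$rank
print(rank_X)   # fewer than the 3 columns

# lm() drops the redundant column and returns NA for its coefficient.
fit <- lm(weight ~ f + m)
print(coef(fit))
```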

Maybe the weight depends also on *handedness*

```
  sexFemale sexMale handRight
1         0       1         1
2         1       0         1
4         0       1         0
5         1       0         0
```

Here we assume that the effect of *hand* is the same for both sexes, that is, *hand* and *sex* do not interact

But what if they interact?

Maybe *left-handed males* are heavier

```
  sexFemale:handLeft sexMale:handLeft sexFemale:handRight sexMale:handRight
1                  0                0                   0                 1
2                  0                0                   1                 0
4                  0                1                   0                 0
5                  1                0                   0                 0
```

`:` means *interaction*. As we saw, the expression `sex:hand` creates four columns

- `sexFemale:handLeft`, which is 1 when `sex` is *Female* **and** `hand` is *Left*
- `sexMale:handLeft`, with the same idea
- `sexFemale:handRight`, idem
- `sexMale:handRight`, idem
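A minimal sketch of how these four columns can be obtained, assuming the matrix comes from the formula `~ 0 + sex:hand` (the original formula is not shown, and the data frame name `people` is hypothetical):

```r
# Example data; `people` is a hypothetical name.
people <- data.frame(
  sex  = factor(c("Male", "Female", "Male", "Female")),
  hand = factor(c("Right", "Right", "Left", "Left")),
  row.names = c("1", "2", "4", "5")
)

# One indicator column per combination of sex and hand, no intercept.
X <- model.matrix(~ 0 + sex:hand, data = people)
print(X)

# Each person belongs to exactly one combination.
print(rowSums(X))
```

Each column is the product of one `sex` indicator and one `hand` indicator, so every row contains a single 1.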

A common case is

```
  sexFemale sexMale handRight sexMale:handRight
1         0       1         1                 1
2         1       0         1                 0
4         0       1         0                 0
5         1       0         0                 0
```

which can also be written as

```
  sexFemale sexMale handRight sexMale:handRight
1         0       1         1                 1
2         1       0         1                 0
4         0       1         0                 0
5         1       0         0                 0
```
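The formulas that produced these two identical matrices are not shown in the text. One plausible pair, assumed here, is the explicit form `sex + hand + sex:hand` and its shorthand `sex * hand`. The sketch below checks that both give the same matrix. The data frame name `people` is hypothetical.

```r
# Example data; `people` is a hypothetical name.
people <- data.frame(
  sex  = factor(c("Male", "Female", "Male", "Female")),
  hand = factor(c("Right", "Right", "Left", "Left")),
  row.names = c("1", "2", "4", "5")
)

# Explicit form: main effects plus their interaction, no intercept.
X_long  <- model.matrix(~ 0 + sex + hand + sex:hand, data = people)
# Shorthand: sex * hand expands to sex + hand + sex:hand.
X_short <- model.matrix(~ 0 + sex * hand, data = people)

print(X_long)
print(all(X_long == X_short))
```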

In this model, what is the interpretation of

- `sexFemale`
- `sexMale`
- `handRight`
- `sexMale:handRight`

What about the interaction between sex and height?

```
  sexFemale:height sexMale:height
1                0            179
2              168              0
4                0            170
5              162              0
```

What is the interpretation here?

Now the model is \[y_i = β_1 f_i h_i + β_2 m_i h_i + e_i\] where \(m_i\) is 1 for *Male* and 0 for *Female*, \(f_i\) is 1 for *Female* and 0 for *Male*, and \(h_i\) is the height. Then \[\begin{aligned}
β_1 &= \text{slope}(Height|Female)\\
β_2 &= \text{slope}(Height|Male)
\end{aligned}\] This model has no *intercept*, so both lines are forced through the origin

That does not make much sense (unless we *center* the data)

```
  sexFemale sexMale sexFemale:height sexMale:height
1         0       1                0            179
2         1       0              168              0
4         0       1                0            170
5         1       0              162              0
```

What is the interpretation here?
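One way to explore this question is to fit the model and look at the coefficients. The sketch below assumes the matrix comes from the formula `~ 0 + sex + sex:height`, which gives each sex its own baseline and its own slope. The data frame name `people` is hypothetical. With only two people per sex, each line passes exactly through its two points, so this is an illustration and not a meaningful estimate.

```r
# Example data; `people` is a hypothetical name.
people <- data.frame(
  sex    = factor(c("Male", "Female", "Male", "Female")),
  height = c(179, 168, 170, 162),
  weight = c(67, 55, 74, 68)
)

# Separate baseline and separate height slope for each sex.
fit <- lm(weight ~ 0 + sex + sex:height, data = people)
b <- coef(fit)
print(b)

# Slopes computed by hand from the two points of each sex.
slope_female <- (55 - 68) / (168 - 162)
slope_male   <- (67 - 74) / (179 - 170)
print(c(slope_female, slope_male))
```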

- Interactions create new variables in the linear model
- Chosen wisely, they will tell you what you want to know
- You can *compare* them later, using *contrasts*