To improve our understanding of t-tests and ANOVA in linear models, we can use simulation.

First, we need a function to create a data frame of random values. We can call it `create_random_data(n)`. The input `n` indicates the number of rows. It must return a data frame with three columns called `x1`, `x2`, and `y`. The values should be drawn at random from a Normal distribution with mean 0 and variance 1.

Then we need a function that takes a data frame as input and returns a vector with values taken from
`summary(lm(y ~ x1 + x2, data))`. In particular, we want:

- The coefficients estimated by the linear model. These are the first column of the `coefficients` field of the output of `summary()`, and their row names are `(Intercept)`, `x1` and `x2`. I would like to call them \(\beta_0, \beta_1, \beta_2\), or at least `B0`, `B1`, `B2`.
- The *t-values* computed for the linear model. These are the third column of the `coefficients` field of the output of `summary()`. Let’s call them `t0`, `t1`, `t2`.
- The *p-values* computed for the linear model. These are the fourth column of the `coefficients` field of the output of `summary()`. Let’s call them `p0`, `p1`, `p2`.
- The F statistic and its degrees of freedom, taken from the `fstatistic` field of the output of `summary()`. We call them `f`, `df1`, `df2`.
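
A minimal sketch of the first function, `create_random_data(n)`, in base R. It assumes the three columns are drawn independently with `rnorm()` (with `sd = 1`, so the variance is 1); independence is our reading of the text rather than something it states. Because `y` is generated without reference to `x1` or `x2`, every true coefficient is zero in each simulated data set.

```r
# Minimal sketch: each column is drawn independently from N(0, 1).
# rnorm() takes the standard deviation; sd = 1 means variance 1.
create_random_data <- function(n) {
  data.frame(
    x1 = rnorm(n, mean = 0, sd = 1),
    x2 = rnorm(n, mean = 0, sd = 1),
    y  = rnorm(n, mean = 0, sd = 1)  # y is unrelated to x1 and x2
  )
}

set.seed(1)                           # make the example reproducible
example_data <- create_random_data(5)
print(example_data)                   # 5 rows, columns x1, x2, y
```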

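One way to write the second function is sketched below; the name `extract_lm_stats()` is hypothetical, since the text does not name it. It reads the `coefficients` matrix of `summary()` by column position (1 = `Estimate`, 3 = `t value`, 4 = `Pr(>|t|)`) and appends the three entries of `fstatistic` (`value`, `numdf`, `dendf`), then renames everything to the 12 names listed above. The example builds its data frame inline with `rnorm()` so the block runs on its own.

```r
# Hypothetical helper name: the text does not name this function.
# Fits y ~ x1 + x2 and returns the 12 values listed above as a named vector.
extract_lm_stats <- function(data) {
  s  <- summary(lm(y ~ x1 + x2, data = data))
  cf <- s$coefficients  # rows: (Intercept), x1, x2
  out <- c(
    cf[, 1],            # column 1: Estimate   -> B0, B1, B2
    cf[, 3],            # column 3: t value    -> t0, t1, t2
    cf[, 4],            # column 4: Pr(>|t|)   -> p0, p1, p2
    s$fstatistic        # value, numdf, dendf  -> f, df1, df2
  )
  names(out) <- c("B0", "B1", "B2", "t0", "t1", "t2",
                  "p0", "p1", "p2", "f", "df1", "df2")
  out
}

# Example on a small random data frame built inline.
set.seed(2)
d <- data.frame(x1 = rnorm(10), x2 = rnorm(10), y = rnorm(10))
round(extract_lm_stats(d), 3)
```

Reading the columns by position mirrors the description above; indexing by name (for example `cf[, "t value"]`) would work equally well and is more robust if the layout of `summary()` ever changes.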
Now we want to make several hundred replicas of the full process: generating a random data frame, building a linear model on it, and extracting the relevant parameters from the model. We can use `n = 3` initially. We collect all results in a data frame, with one row for each simulation and one column for each of the 12 parameters.

We add an extra column, `pval`, with the p-value for the F statistic. We need this calculation because `summary()` does not provide it for us.

Finally, we plot `B1` versus `B2` three times, coloring the points according to whether `p1`, `p2`, or `pval` is significant. We can draw similar plots using `t1` and `t2` as the \(x\) and \(y\) coordinates.
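
A sketch of the replication step, reusing compact versions of the two helpers (all function names are hypothetical) so the block runs on its own. The p-value of the F test is the upper tail of the \(F_{\mathit{df1},\,\mathit{df2}}\) distribution at `f`, which `pf(..., lower.tail = FALSE)` computes. One caveat about `n = 3`: with three observations and three coefficients the residual degrees of freedom are zero, so the fit is exact and `summary()` returns `NaN` for the t-values, p-values and F statistic. The example call therefore uses `n = 10`.

```r
# Sketch of the replication step. Function names are hypothetical;
# the two helpers are repeated here so the block runs on its own.
create_random_data <- function(n) {
  data.frame(x1 = rnorm(n), x2 = rnorm(n), y = rnorm(n))
}

extract_lm_stats <- function(data) {
  s  <- summary(lm(y ~ x1 + x2, data = data))
  cf <- s$coefficients
  out <- c(cf[, 1], cf[, 3], cf[, 4], s$fstatistic)
  names(out) <- c("B0", "B1", "B2", "t0", "t1", "t2",
                  "p0", "p1", "p2", "f", "df1", "df2")
  out
}

# Run the whole process n_sims times: one row per simulation,
# one column per extracted value, plus the F-test p-value.
run_simulation <- function(n_sims, n) {
  # replicate() returns a 12 x n_sims matrix; transpose to n_sims x 12
  rows <- replicate(n_sims, extract_lm_stats(create_random_data(n)))
  sims <- as.data.frame(t(rows))
  # summary() gives F, df1 and df2 but not the F-test p-value,
  # so take the upper tail of the F(df1, df2) distribution.
  sims$pval <- pf(sims$f, sims$df1, sims$df2, lower.tail = FALSE)
  sims
}

set.seed(3)
sims <- run_simulation(500, n = 10)
head(sims)
```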

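A base-R sketch of the plots. The function names, the significance threshold `alpha = 0.05` and the red/grey color scheme are assumptions. The functions expect a data frame with columns `B1`, `B2`, `t1`, `t2`, `p1`, `p2` and `pval`, as described in the text.

```r
# Hypothetical plotting helper using base graphics.
# Draws sims[[xcol]] against sims[[ycol]], coloring each point red when
# the p-value in sims[[pcol]] is below alpha and grey otherwise.
plot_by_significance <- function(sims, xcol, ycol, pcol, alpha = 0.05) {
  significant <- sims[[pcol]] < alpha
  cols <- ifelse(significant, "red", "grey60")
  plot(sims[[xcol]], sims[[ycol]], col = cols, pch = 20,
       xlab = xcol, ylab = ycol,
       main = paste0("Red: ", pcol, " < ", alpha))
  invisible(significant)  # logical vector, returned for inspection
}

# Six panels: B1 vs B2 and t1 vs t2, each colored by p1, p2 and pval.
plot_all_panels <- function(sims, alpha = 0.05) {
  old <- par(mfrow = c(2, 3))
  on.exit(par(old))       # restore the previous layout afterwards
  for (xy in list(c("B1", "B2"), c("t1", "t2"))) {
    for (pcol in c("p1", "p2", "pval")) {
      plot_by_significance(sims, xy[1], xy[2], pcol, alpha)
    }
  }
}
```

For a simulation data frame `sims` with those columns, `plot_all_panels(sims)` draws all six panels, and a single panel such as `plot_by_significance(sims, "B1", "B2", "p1")` can be drawn on its own.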
We will discuss the results in class.