December 10, 2018

## Why we did this experiment

• We need real data to produce science
• Physics experiments are faster and cheaper than biology experiments
• The logic is the same anyway
• We care about the way of thinking

• Nobody uploaded their video. No raw data

• 16 people sent a text file as requested:
• Ahmet Bahçeci, Arda Keleş, Aslıhan Gizem Bilgin, Aziz Erden, Berfin Tunç, Betül Budak, Evrim Özdemir, Gulsum Mikayilova, Meral Gök, Oguzhan Yeni, Samet Güler, Sude Dinçer, Sümeyye Onat, Zehra Kezer, Özlem Bakangil, Şule İnan
• There are 14 different text files. There are two cheating groups
• One with 2 people
• One with 3 people

## The data

fall <- read.table("free-fall.txt", header=TRUE)
fall
   msec position replica
1  2355       10       A
2  2457       20       A
3  2533       30       A
4  2558       40       A
5  2639       50       A
6  2690       60       A
7  2742       70       A
8  2742       80       A
9  2797       90       A
10 1837       10       B
11 1970       20       B
12 2022       30       B
13 2104       40       B
14 2129       50       B
15 2210       60       B
16 2237       70       B
17 2262       80       B
18 2313       90       B

## First view

plot(position~msec,
data=fall)

• Initial time is not zero
• position is upside down
• We can change these values for the analysis

## Time should start at zero

• We started the timer before dropping the ball
• The first data is when the ball crosses 10
• We can choose this as our time reference
• The ball is already moving
• Initial speed is not zero
• The column name is msec. It means milliseconds

## Normalizing time data

Let’s add a new column, called time, with the correct value of seconds

fall$time <- fall$msec/1000

Now, for each replica, we change time to start at 0

old <- fall$time[fall$replica=="A"]
fall$time[fall$replica=="A"] <-  old - min(old)

old <- fall$time[fall$replica=="B"]
fall$time[fall$replica=="B"] <-  old - min(old)

## Normalizing position data

We should also transform position to meters. The first data should be at 0

Each black/white strip was a little more than 19 centimeters

fall$height <- fall$position/10 * 0.192
old <- fall$height[fall$replica=="A"]
fall$height[fall$replica=="A"] <-  min(old) - old

old <- fall$height[fall$replica=="B"]
fall$height[fall$replica=="B"] <-  min(old) - old

fall[fall$replica=="A",]  msec position replica time height 1 2355 10 A 0.000 0.000 2 2457 20 A 0.102 -0.192 3 2533 30 A 0.178 -0.384 4 2558 40 A 0.203 -0.576 5 2639 50 A 0.284 -0.768 6 2690 60 A 0.335 -0.960 7 2742 70 A 0.387 -1.152 8 2742 80 A 0.387 -1.344 9 2797 90 A 0.442 -1.536 ## The tidy data fall[fall$replica=="B",]
   msec position replica  time height
10 1837       10       B 0.000  0.000
11 1970       20       B 0.133 -0.192
12 2022       30       B 0.185 -0.384
13 2104       40       B 0.267 -0.576
14 2129       50       B 0.292 -0.768
15 2210       60       B 0.373 -0.960
16 2237       70       B 0.400 -1.152
17 2262       80       B 0.425 -1.344
18 2313       90       B 0.476 -1.536

## Normalized data

plot(height~time, data=fall, pch=as.character(replica))

## We will use a polinomial model

This is a parabola, which can be modeled with a second degree polynomial $y=a + b\cdot x + c\cdot x^2$ When we use a polynomial model, we need to add new columns to our data frame

fall$time_sq <- fall$time^2

This is used only for the model

## Fitting the model (replica “A”)

model_A <- lm(height ~ time + time_sq,
data=fall, subset=replica=="A")
model_A
Call:
lm(formula = height ~ time + time_sq, data = fall, subset = replica ==
"A")

Coefficients:
(Intercept)         time      time_sq
-0.001309    -1.557507    -4.266640  

## Fitting the model (replica “B”)

model_B <- lm(height ~ time + time_sq,
data=fall, subset=replica=="B")
model_B
Call:
lm(formula = height ~ time + time_sq, data = fall, subset = replica ==
"B")

Coefficients:
(Intercept)         time      time_sq
0.003772    -1.083734    -4.570787  

## (Intercept) should be 0

Since we choose that at time=0 we have height=0, we know that (Intercept) is 0

In other words, our model here is $y= b\cdot x + c\cdot x^2$ The old $$a$$ has a fixed value of 0

How do we do that in R?

## Fixing the intercept to 0

new_model_A <- lm(height ~ time + time_sq + 0,
data=fall, subset=replica=="A")
new_model_A
Call:
lm(formula = height ~ time + time_sq + 0, data = fall, subset = replica ==
"A")

Coefficients:
time  time_sq
-1.568   -4.249  

## Fixing the intercept to 0 (B)

new_model_B <- lm(height ~ time + time_sq + 0,
data=fall, subset=replica=="B")
new_model_B
Call:
lm(formula = height ~ time + time_sq + 0, data = fall, subset = replica ==
"B")

Coefficients:
time  time_sq
-1.057   -4.612  

## Meaning of the coefficients

If you remember your physics lessons, you can recognize that in the formula $y= b\cdot x + c\cdot x^2$ the value $$b$$ is the initial speed, and $$c=-g/2$$

Thus, we can find the gravitational acceleration in the second coefficient

## Combining both replicas

The gravitational acceleration is constant and independent of the replica

But the initial speed depends on the replica

We want to combine both replicas to get

• initial speed of each replica
• gravitational constant

## Models with factors

combi <- lm(height ~ time:replica + time_sq + 0,
data=fall)
combi
Call:
lm(formula = height ~ time:replica + time_sq + 0, data = fall)

Coefficients:
time_sq  time:replicaA  time:replicaB
-4.456         -1.493         -1.117  

The symbol : means interaction

## Modeling each replica separately

    time time_sq
A -0.460   -5.19
B -0.318   -4.74
C -1.417   -4.65
D -0.761   -4.25
E -1.640   -3.07
F -1.545   -4.03
G -0.354   -5.67
H -0.568   -4.27
I -1.519   -4.62
    time time_sq
J -0.880   -5.07
K -1.433   -3.31
L -0.822   -4.04
M -0.460   -5.19
N -0.281   -4.81
O -1.292   -4.97
P -1.110   -5.25
Q -2.248   -3.60
R -2.290   -3.40

## Combining everything

combi <- lm(position ~ time:replica + time_sq + 0,
data=fall)
coef(combi)
      time_sq time:replicaA time:replicaB
-4.421        -0.778        -0.457
-1.501        -0.688        -1.058
time:replicaF time:replicaG time:replicaH
-1.376        -0.941        -0.488
time:replicaI time:replicaJ time:replicaK
-1.590        -1.129        -0.979
time:replicaL time:replicaM time:replicaN
-0.657        -0.778        -0.452
time:replicaO time:replicaP time:replicaQ
-1.490        -1.411        -1.974
time:replicaR
-1.947 

## Summary

• Linear models are useful to extract scientific information from experimental data
• Can handle linear, exponential, power law and polinomial models
• Using +0 in the formula forces the intercept to 0
• Using : allows us to combine two variables and measure their interaction

## With survey data

survey <- read.table("survey1-tidy.txt")
colnames(survey)
[1] "Gender"       "birth_day"    "birth_month"
[4] "birth_year"   "height_cm"    "weight_kg"
[7] "handness"     "hand_span_cm"

## Boys weight_kg more than girls

plot(weight_kg~Gender,
data=survey)

lm(weight_kg~Gender,
data=survey)
Call:
lm(formula = weight_kg ~ Gender, data = survey)

Coefficients:
(Intercept)   GenderMale
57.8         18.7  

## Models with factors

model <- lm(weight_kg ~ Gender, data = survey)
Call:
lm(formula = weight_kg ~ Gender, data = survey)

Coefficients:
(Intercept)   GenderMale
57.8         18.7  

Here (Intercept) is “normal weight”, and GenderMale is the extra weight when Gender=='Male' (on average)

$\text{weight_kg} = coef(model)[1] + coef(model)[2]\cdot\text{[Gender is 'Male']}$

## Model with factor, without intercept

lm(weight_kg ~ Gender + 0, data = survey)
Call:
lm(formula = weight_kg ~ Gender + 0, data = survey)

Coefficients:
GenderFemale    GenderMale
57.8          76.6  

Now the formula includes +0. This means “do not use intercept”. Now

• GenderMale is the average weight when Gender=='Male'
• GenderFemale is the average weight when Gender=='Female'

## Weight v/s Height

plot(weight_kg ~ height_cm, data=survey, col=Gender)
model1 <- lm(weight_kg ~ height_cm, data=survey)
points(predict(model1) ~ height_cm, data=survey, col="green3", pch=16)

## Interpretation

model1
Call:
lm(formula = weight_kg ~ height_cm, data = survey)

Coefficients:
(Intercept)    height_cm
-81.504        0.862  

## Including the Gender factor

Obviously the weight also depends on the gender

plot(weight_kg ~ height_cm, data=survey, col=Gender)
model2 <- lm(weight_kg ~ height_cm + Gender+0, data = survey)
points(predict(model2) ~ height_cm, data=survey, col="green3", pch=16)

## Interpretation

model2
Call:
lm(formula = weight_kg ~ height_cm + Gender + 0, data = survey)

Coefficients:
height_cm  GenderFemale    GenderMale
0.279        11.688        26.896  

There are two lines with the same slope: is the coefficient for height_cm

The “intercept” of each line are the coefficients GenderFemale and GenderMale

## Summary

• Factors can be used in formulas
• The intercept can be taken out using +0 in the formula
• Variables can have interactions (synergy)
• Use : to indicate interaction
• Homework: Learn about * in formulas