December 10, 2018

Modeling free fall

Why we did this experiment

  • We need real data to produce science
  • Physics experiments are faster and cheaper than biology experiments
  • The logic is the same anyway
  • We care about the way of thinking

Received data

  • Nobody uploaded their video. No raw data

  • 16 people sent a text file as requested:
    • Ahmet Bahçeci, Arda Keleş, Aslıhan Gizem Bilgin, Aziz Erden, Berfin Tunç, Betül Budak, Evrim Özdemir, Gulsum Mikayilova, Meral Gök, Oguzhan Yeni, Samet Güler, Sude Dinçer, Sümeyye Onat, Zehra Kezer, Özlem Bakangil, Şule İnan
  • There are 14 different text files. There are two cheating groups
    • One with 2 people
    • One with 3 people

Summary of data is online

Summary of the video

The data

fall <- read.table("free-fall.txt", header=TRUE)
fall
   msec position replica
1  2355       10       A
2  2457       20       A
3  2533       30       A
4  2558       40       A
5  2639       50       A
6  2690       60       A
7  2742       70       A
8  2742       80       A
9  2797       90       A
10 1837       10       B
11 1970       20       B
12 2022       30       B
13 2104       40       B
14 2129       50       B
15 2210       60       B
16 2237       70       B
17 2262       80       B
18 2313       90       B

First view

plot(position~msec,
     data=fall)

  • Initial time is not zero
  • position is upside down
  • We can change these values for the analysis

Time should start at zero

  • We started the timer before dropping the ball
  • The first data is when the ball crosses 10
  • We can choose this as our time reference
  • The ball is already moving
    • Initial speed is not zero
  • The column name is msec. It means milliseconds

Normalizing time data

Let’s add a new column, called time, with the correct value of seconds

fall$time <- fall$msec/1000

Now, for each replica, we change time to start at 0

old <- fall$time[fall$replica=="A"]
fall$time[fall$replica=="A"] <-  old - min(old)

old <- fall$time[fall$replica=="B"]
fall$time[fall$replica=="B"] <-  old - min(old)

Normalizing position data

We should also transform position to meters. The first data should be at 0

Each black/white strip was a little more than 19 centimeters

fall$height <- fall$position/10 * 0.192
old <- fall$height[fall$replica=="A"]
fall$height[fall$replica=="A"] <-  min(old) - old

old <- fall$height[fall$replica=="B"]
fall$height[fall$replica=="B"] <-  min(old) - old

The tidy data

fall[fall$replica=="A",]
  msec position replica  time height
1 2355       10       A 0.000  0.000
2 2457       20       A 0.102 -0.192
3 2533       30       A 0.178 -0.384
4 2558       40       A 0.203 -0.576
5 2639       50       A 0.284 -0.768
6 2690       60       A 0.335 -0.960
7 2742       70       A 0.387 -1.152
8 2742       80       A 0.387 -1.344
9 2797       90       A 0.442 -1.536

The tidy data

fall[fall$replica=="B",]
   msec position replica  time height
10 1837       10       B 0.000  0.000
11 1970       20       B 0.133 -0.192
12 2022       30       B 0.185 -0.384
13 2104       40       B 0.267 -0.576
14 2129       50       B 0.292 -0.768
15 2210       60       B 0.373 -0.960
16 2237       70       B 0.400 -1.152
17 2262       80       B 0.425 -1.344
18 2313       90       B 0.476 -1.536

Normalized data

plot(height~time, data=fall, pch=as.character(replica))

We will use a polinomial model

This is a parabola, which can be modeled with a second degree polynomial \[y=a + b\cdot x + c\cdot x^2\] When we use a polynomial model, we need to add new columns to our data frame

fall$time_sq <- fall$time^2

This is used only for the model

Fitting the model (replica “A”)

model_A <- lm(height ~ time + time_sq, 
            data=fall, subset=replica=="A")
model_A
Call:
lm(formula = height ~ time + time_sq, data = fall, subset = replica == 
    "A")

Coefficients:
(Intercept)         time      time_sq  
  -0.001309    -1.557507    -4.266640  

Fitting the model (replica “B”)

model_B <- lm(height ~ time + time_sq, 
            data=fall, subset=replica=="B")
model_B
Call:
lm(formula = height ~ time + time_sq, data = fall, subset = replica == 
    "B")

Coefficients:
(Intercept)         time      time_sq  
   0.003772    -1.083734    -4.570787  

(Intercept) should be 0

Since we choose that at time=0 we have height=0, we know that (Intercept) is 0

In other words, our model here is \[y= b\cdot x + c\cdot x^2\] The old \(a\) has a fixed value of 0

How do we do that in R?

Fixing the intercept to 0

new_model_A <- lm(height ~ time + time_sq + 0, 
            data=fall, subset=replica=="A")
new_model_A
Call:
lm(formula = height ~ time + time_sq + 0, data = fall, subset = replica == 
    "A")

Coefficients:
   time  time_sq  
 -1.568   -4.249  

Fixing the intercept to 0 (B)

new_model_B <- lm(height ~ time + time_sq + 0, 
            data=fall, subset=replica=="B")
new_model_B
Call:
lm(formula = height ~ time + time_sq + 0, data = fall, subset = replica == 
    "B")

Coefficients:
   time  time_sq  
 -1.057   -4.612  

Meaning of the coefficients

If you remember your physics lessons, you can recognize that in the formula \[y= b\cdot x + c\cdot x^2\] the value \(b\) is the initial speed, and \(c=-g/2\)

Thus, we can find the gravitational acceleration in the second coefficient

Combining both replicas

The gravitational acceleration is constant and independent of the replica

But the initial speed depends on the replica

We want to combine both replicas to get

  • initial speed of each replica
  • gravitational constant

Models with factors

combi <- lm(height ~ time:replica + time_sq + 0, 
            data=fall)
combi
Call:
lm(formula = height ~ time:replica + time_sq + 0, data = fall)

Coefficients:
      time_sq  time:replicaA  time:replicaB  
       -4.456         -1.493         -1.117  

The symbol : means interaction

Your data

fall-raw.txt in pictures

18 replicas in total

Normalized data

Modeling each replica separately

    time time_sq
A -0.460   -5.19
B -0.318   -4.74
C -1.417   -4.65
D -0.761   -4.25
E -1.640   -3.07
F -1.545   -4.03
G -0.354   -5.67
H -0.568   -4.27
I -1.519   -4.62
    time time_sq
J -0.880   -5.07
K -1.433   -3.31
L -0.822   -4.04
M -0.460   -5.19
N -0.281   -4.81
O -1.292   -4.97
P -1.110   -5.25
Q -2.248   -3.60
R -2.290   -3.40

How different are the coefficients?

Combining everything

combi <- lm(position ~ time:replica + time_sq + 0, 
            data=fall)
coef(combi)
      time_sq time:replicaA time:replicaB 
       -4.421        -0.778        -0.457 
time:replicaC time:replicaD time:replicaE 
       -1.501        -0.688        -1.058 
time:replicaF time:replicaG time:replicaH 
       -1.376        -0.941        -0.488 
time:replicaI time:replicaJ time:replicaK 
       -1.590        -1.129        -0.979 
time:replicaL time:replicaM time:replicaN 
       -0.657        -0.778        -0.452 
time:replicaO time:replicaP time:replicaQ 
       -1.490        -1.411        -1.974 
time:replicaR 
       -1.947 

Summary

  • Linear models are useful to extract scientific information from experimental data
  • Can handle linear, exponential, power law and polinomial models
  • Using +0 in the formula forces the intercept to 0
  • Using : allows us to combine two variables and measure their interaction

Application

With survey data

survey <- read.table("survey1-tidy.txt")
colnames(survey)
[1] "Gender"       "birth_day"    "birth_month" 
[4] "birth_year"   "height_cm"    "weight_kg"   
[7] "handness"     "hand_span_cm"

Boys weight_kg more than girls

plot(weight_kg~Gender,
    data=survey)

lm(weight_kg~Gender,
    data=survey)
Call:
lm(formula = weight_kg ~ Gender, data = survey)

Coefficients:
(Intercept)   GenderMale  
       57.8         18.7  

Models with factors

model <- lm(weight_kg ~ Gender, data = survey)
Call:
lm(formula = weight_kg ~ Gender, data = survey)

Coefficients:
(Intercept)   GenderMale  
       57.8         18.7  

Here (Intercept) is “normal weight”, and GenderMale is the extra weight when Gender=='Male' (on average)

\[\text{weight_kg} = coef(model)[1] + coef(model)[2]\cdot\text{[Gender is 'Male']}\]

Model with factor, without intercept

lm(weight_kg ~ Gender + 0, data = survey)
Call:
lm(formula = weight_kg ~ Gender + 0, data = survey)

Coefficients:
GenderFemale    GenderMale  
        57.8          76.6  

Now the formula includes +0. This means “do not use intercept”. Now

  • GenderMale is the average weight when Gender=='Male'
  • GenderFemale is the average weight when Gender=='Female'

Weight v/s Height

Let’s start with the simplest model

plot(weight_kg ~ height_cm, data=survey, col=Gender)
model1 <- lm(weight_kg ~ height_cm, data=survey)
points(predict(model1) ~ height_cm, data=survey, col="green3", pch=16)

Interpretation

model1
Call:
lm(formula = weight_kg ~ height_cm, data = survey)

Coefficients:
(Intercept)    height_cm  
    -81.504        0.862  

Including the Gender factor

Obviously the weight also depends on the gender

plot(weight_kg ~ height_cm, data=survey, col=Gender)
model2 <- lm(weight_kg ~ height_cm + Gender+0, data = survey)
points(predict(model2) ~ height_cm, data=survey, col="green3", pch=16)

Interpretation

model2
Call:
lm(formula = weight_kg ~ height_cm + Gender + 0, data = survey)

Coefficients:
   height_cm  GenderFemale    GenderMale  
       0.279        11.688        26.896  

There are two lines with the same slope: is the coefficient for height_cm

The “intercept” of each line are the coefficients GenderFemale and GenderMale

Summary

  • Factors can be used in formulas
  • The intercept can be taken out using +0 in the formula
  • Variables can have interactions (synergy)
    • Use : to indicate interaction
  • Homework: Learn about * in formulas