December 10, 2018

- We need real data to produce science
- Physics experiments are faster and cheaper than biology experiments
- The logic is the same anyway
- We care about the
*way of thinking*

Nobody uploaded their video. No raw data

- 16 people sent a text file as requested:
- Ahmet Bahçeci, Arda Keleş, Aslıhan Gizem Bilgin, Aziz Erden, Berfin Tunç, Betül Budak, Evrim Özdemir, Gulsum Mikayilova, Meral Gök, Oguzhan Yeni, Samet Güler, Sude Dinçer, Sümeyye Onat, Zehra Kezer, Özlem Bakangil, Şule İnan

- There are 14 different text files. There are two cheating groups
- One with 2 people
- One with 3 people

The summary of all your data is at

https://anaraven.bitbucket.io/static/2018/cmb1/fall-raw.txt

I also analyzed my video. My data is at

https://anaraven.bitbucket.io/static/2018/cmb1/free-fall.txt

fall <- read.table("free-fall.txt", header=TRUE) fall

msec position replica 1 2355 10 A 2 2457 20 A 3 2533 30 A 4 2558 40 A 5 2639 50 A 6 2690 60 A 7 2742 70 A 8 2742 80 A 9 2797 90 A 10 1837 10 B 11 1970 20 B 12 2022 30 B 13 2104 40 B 14 2129 50 B 15 2210 60 B 16 2237 70 B 17 2262 80 B 18 2313 90 B

plot(position~msec, data=fall)

- Initial time is not zero
*position*is upside down- We can change these values for the analysis

- We started the timer before dropping the ball
- The first data is when the ball crosses
**10** - We can choose this as our time reference
- The ball is already moving
- Initial speed is not zero

- The column name is
`msec`

. It means**milliseconds**

Let’s add a new column, called `time`

, with the correct value of *seconds*

fall$time <- fall$msec/1000

Now, for each replica, we change `time`

to start at 0

old <- fall$time[fall$replica=="A"] fall$time[fall$replica=="A"] <- old - min(old) old <- fall$time[fall$replica=="B"] fall$time[fall$replica=="B"] <- old - min(old)

We should also transform position to meters. The first data should be at 0

Each black/white strip was a little more than 19 centimeters

fall$height <- fall$position/10 * 0.192 old <- fall$height[fall$replica=="A"] fall$height[fall$replica=="A"] <- min(old) - old old <- fall$height[fall$replica=="B"] fall$height[fall$replica=="B"] <- min(old) - old

fall[fall$replica=="A",]

msec position replica time height 1 2355 10 A 0.000 0.000 2 2457 20 A 0.102 -0.192 3 2533 30 A 0.178 -0.384 4 2558 40 A 0.203 -0.576 5 2639 50 A 0.284 -0.768 6 2690 60 A 0.335 -0.960 7 2742 70 A 0.387 -1.152 8 2742 80 A 0.387 -1.344 9 2797 90 A 0.442 -1.536

fall[fall$replica=="B",]

msec position replica time height 10 1837 10 B 0.000 0.000 11 1970 20 B 0.133 -0.192 12 2022 30 B 0.185 -0.384 13 2104 40 B 0.267 -0.576 14 2129 50 B 0.292 -0.768 15 2210 60 B 0.373 -0.960 16 2237 70 B 0.400 -1.152 17 2262 80 B 0.425 -1.344 18 2313 90 B 0.476 -1.536

plot(height~time, data=fall, pch=as.character(replica))

This is a parabola, which can be modeled with a second degree polynomial \[y=a + b\cdot x + c\cdot x^2\] When we use a polynomial model, we need to add new columns to our data frame

fall$time_sq <- fall$time^2

This is used **only** for the model

model_A <- lm(height ~ time + time_sq, data=fall, subset=replica=="A") model_A

Call: lm(formula = height ~ time + time_sq, data = fall, subset = replica == "A") Coefficients: (Intercept) time time_sq -0.001309 -1.557507 -4.266640

model_B <- lm(height ~ time + time_sq, data=fall, subset=replica=="B") model_B

Call: lm(formula = height ~ time + time_sq, data = fall, subset = replica == "B") Coefficients: (Intercept) time time_sq 0.003772 -1.083734 -4.570787

Since we choose that at `time=0`

we have `height=0`

, we **know** that `(Intercept)`

is 0

In other words, our model here is \[y= b\cdot x + c\cdot x^2\] The old \(a\) has a fixed value of 0

How do we do that in R?

new_model_A <- lm(height ~ time + time_sq + 0, data=fall, subset=replica=="A") new_model_A

Call: lm(formula = height ~ time + time_sq + 0, data = fall, subset = replica == "A") Coefficients: time time_sq -1.568 -4.249

new_model_B <- lm(height ~ time + time_sq + 0, data=fall, subset=replica=="B") new_model_B

Call: lm(formula = height ~ time + time_sq + 0, data = fall, subset = replica == "B") Coefficients: time time_sq -1.057 -4.612

If you remember your physics lessons, you can recognize that in the formula \[y= b\cdot x + c\cdot x^2\] the value \(b\) is the *initial speed*, and \(c=-g/2\)

Thus, we can find the *gravitational acceleration* in the second coefficient

The *gravitational acceleration* is constant and independent of the replica

But the initial speed **depends** on the replica

We want to combine both replicas to get

- initial speed of each replica
- gravitational constant

combi <- lm(height ~ time:replica + time_sq + 0, data=fall) combi

Call: lm(formula = height ~ time:replica + time_sq + 0, data = fall) Coefficients: time_sq time:replicaA time:replicaB -4.456 -1.493 -1.117

The symbol `:`

means *interaction*

`fall-raw.txt`

in picturestime time_sq A -0.460 -5.19 B -0.318 -4.74 C -1.417 -4.65 D -0.761 -4.25 E -1.640 -3.07 F -1.545 -4.03 G -0.354 -5.67 H -0.568 -4.27 I -1.519 -4.62

time time_sq J -0.880 -5.07 K -1.433 -3.31 L -0.822 -4.04 M -0.460 -5.19 N -0.281 -4.81 O -1.292 -4.97 P -1.110 -5.25 Q -2.248 -3.60 R -2.290 -3.40

combi <- lm(position ~ time:replica + time_sq + 0, data=fall) coef(combi)

time_sq time:replicaA time:replicaB -4.421 -0.778 -0.457 time:replicaC time:replicaD time:replicaE -1.501 -0.688 -1.058 time:replicaF time:replicaG time:replicaH -1.376 -0.941 -0.488 time:replicaI time:replicaJ time:replicaK -1.590 -1.129 -0.979 time:replicaL time:replicaM time:replicaN -0.657 -0.778 -0.452 time:replicaO time:replicaP time:replicaQ -1.490 -1.411 -1.974 time:replicaR -1.947

- Linear models are useful to extract scientific information from experimental data
- Can handle linear, exponential, power law and polinomial models
- Using
`+0`

in the formula forces the intercept to 0 - Using
`:`

allows us to combine two variables and measure their interaction

`survey`

datasurvey <- read.table("survey1-tidy.txt") colnames(survey)

[1] "Gender" "birth_day" "birth_month" [4] "birth_year" "height_cm" "weight_kg" [7] "handness" "hand_span_cm"

plot(weight_kg~Gender, data=survey)

lm(weight_kg~Gender, data=survey)

Call: lm(formula = weight_kg ~ Gender, data = survey) Coefficients: (Intercept) GenderMale 57.8 18.7

model <- lm(weight_kg ~ Gender, data = survey)

Call: lm(formula = weight_kg ~ Gender, data = survey) Coefficients: (Intercept) GenderMale 57.8 18.7

Here `(Intercept)`

is “normal weight”, and `GenderMale`

is the extra weight when `Gender=='Male'`

(on average)

\[\text{weight_kg} = coef(model)[1] + coef(model)[2]\cdot\text{[Gender is 'Male']}\]

lm(weight_kg ~ Gender + 0, data = survey)

Call: lm(formula = weight_kg ~ Gender + 0, data = survey) Coefficients: GenderFemale GenderMale 57.8 76.6

Now the formula includes `+0`

. This means “do not use intercept”. Now

`GenderMale`

is the average weight when`Gender=='Male'`

`GenderFemale`

is the average weight when`Gender=='Female'`

Let’s start with the simplest model

plot(weight_kg ~ height_cm, data=survey, col=Gender) model1 <- lm(weight_kg ~ height_cm, data=survey) points(predict(model1) ~ height_cm, data=survey, col="green3", pch=16)

model1

Call: lm(formula = weight_kg ~ height_cm, data = survey) Coefficients: (Intercept) height_cm -81.504 0.862

Obviously the weight also depends on the gender

plot(weight_kg ~ height_cm, data=survey, col=Gender) model2 <- lm(weight_kg ~ height_cm + Gender+0, data = survey) points(predict(model2) ~ height_cm, data=survey, col="green3", pch=16)

model2

Call: lm(formula = weight_kg ~ height_cm + Gender + 0, data = survey) Coefficients: height_cm GenderFemale GenderMale 0.279 11.688 26.896

There are two lines with the same slope: is the coefficient for `height_cm`

The “intercept” of each line are the coefficients `GenderFemale`

and `GenderMale`

- Factors can be used in formulas
- The
*intercept*can be taken out using`+0`

in the formula - Variables can have interactions (synergy)
- Use
`:`

to indicate interaction

- Use
**Homework:**Learn about`*`

in formulas