For this part we will use the following data

https://dry-lab.org/static/students.txt

These are self-reported values from former students

Load in your spreadsheet and plot

**weight_kg**versus**height_cm****weight_kg**versus**hand_span**

We want two models for **weight_kg**

- One based in
**height_cm**\[\text{weight_kg} = β_0 + β_1⋅\text{height_cm}\] - One based in
**hand_span**\[\text{weight_kg} = β_0 + β_1⋅\text{hand_span}\]

Obviously, \(β_0\) and \(β_1\) on each model are independent. This
time we can use `SLOPE()`

and `INTERCEPT()`

.
Please do it now

We have tools to evaluate the quality of the model

Please use them to evaluate both models

Also, evaluate the sum of squared errors for each model

Besides `SLOPE()`

and `INTERCEPT()`

, there is
another function

We can get both at the same time using `LINEST()`

This is an advanced function that returns several values

It has some optional parameters that we will study later

Please use it and compare with the previous results

We have two ways to predict **weight_kg**

We want to combine them

\[\text{weight_kg} = β_0 + β_1⋅\text{hand_span} + β_2⋅\text{height_cm}\]

Now we have to find three parameters

**What are the conditions for these parameters?How can we
find them?**

The formula for \((β_0, β_1, β_2)\) is complex. We will not teach it

Instead, we will use the `LINEST()`

function

This is the main advantage of `LINEST()`

`LINEST()`

can model using several independent
variables

Evaluate Mean Squared Error and \(R^2\) for the large model

Which one is the better now?

YouTube has an option to show the video at 0.25 speed. Try it

We got time and position from the video

You can get the data at

https://dry-lab.org/static/2020/cmb1/free-fall.csv

There is another version using comma as decimal separator

https://dry-lab.org/static/2020/cmb1/free-fall-eu.csv

Read them in your spreadsheet

Plot the data directly

Then in semi–log scale

Then in log–log scale

Which one would you choose?

There are many possible functions that connect \(x\) and \(y\)

We will try a *polynomial* formula

A polynomial is a formula like \[y=β_0 + β_1\cdot x + β_2\cdot x^2 +\cdots+ β_n\cdot x^n\]

The highest exponent is the **degree** of the
polynomial

We will try a 2-degree formula \[y = β_0 + β_1\cdot x + β_2\cdot x^2\]

\[\text{distance}=β_0 + β_1\cdot\text{time}+β_2\cdot\text{time}^2\] This is a parabola, which can be modeled with a second degree polynomial

When we use a polynomial model, we need to add new columns

In this case we need to add **time_sq** containing
**time ^{2}**

This is used **only** for the model

- In the video the timer starts before dropping the ball
- The first data is when the ball crosses
**10** - We choose this as our time reference
- The ball is already moving. Speed is not zero
- Initial speed is not zero

Since we choose that at **time=0** we have
**distance=0**, we **know** that \(β_0\) is 0

In other words, our model here is \[y =β_1\cdot x + β_2\cdot x^2\]

The old \(β_0\) has a fixed value of 0

How do we do that in Excel?

If you remember your physics lessons, you can recognize that in the
formula \[y =β_1\cdot x + β_2\cdot
x^2\] the value \(β\) is the
*initial speed*, and \(β_2=-g/2\)

Thus, we can measure the *gravitational acceleration* in the
second coefficient