For this part we will use the following data
https://dry-lab.org/static/students.txt
These are self-reported values from former students
Load in your spreadsheet and plot
We want two models for weight_kg
Obviously, \(β_0\) and \(β_1\) on each model are independent. This
time we can use SLOPE()
and INTERCEPT()
.
Please do it now
We have tools to evaluate the quality of the model
Please use them to evaluate both models
Also, evaluate the sum of squared errors for each model
Besides SLOPE()
and INTERCEPT()
, there is
another function
We can get both at the same time using LINEST()
This is an advanced function that returns several values
It has some optional parameters that we will study later
Please use it and compare with the previous results
We have two ways to predict weight_kg
We want to combine them
\[\text{weight_kg} = β_0 + β_1⋅\text{hand_span} + β_2⋅\text{height_cm}\]
Now we have to find three parameters
What are the conditions for these parameters?
How can we
find them?
The formula for \((β_0, β_1, β_2)\) is complex. We will not teach it
Instead, we will use the LINEST()
function
This is the main advantage of LINEST()
LINEST()
can model using several independent
variables
Evaluate Mean Squared Error and \(R^2\) for the large model
Which one is the better now?
YouTube has an option to show the video at 0.25 speed. Try it
We got time and position from the video
You can get the data at
https://dry-lab.org/static/2020/cmb1/free-fall.csv
There is another version using comma as decimal separator
https://dry-lab.org/static/2020/cmb1/free-fall-eu.csv
Read them in your spreadsheet
Plot the data directly
Then in semi–log scale
Then in log–log scale
Which one would you choose?
There are many possible functions that connect \(x\) and \(y\)
We will try a polynomial formula
A polynomial is a formula like \[y=β_0 + β_1\cdot x + β_2\cdot x^2 +\cdots+ β_n\cdot x^n\]
The highest exponent is the degree of the polynomial
We will try a 2-degree formula \[y = β_0 + β_1\cdot x + β_2\cdot x^2\]
\[\text{distance}=β_0 + β_1\cdot\text{time}+β_2\cdot\text{time}^2\] This is a parabola, which can be modeled with a second degree polynomial
When we use a polynomial model, we need to add new columns
In this case we need to add time_sq containing time2
This is used only for the model
Since we choose that at time=0 we have distance=0, we know that \(β_0\) is 0
In other words, our model here is \[y =β_1\cdot x + β_2\cdot x^2\]
The old \(β_0\) has a fixed value of 0
How do we do that in Excel?
If you remember your physics lessons, you can recognize that in the formula \[y =β_1\cdot x + β_2\cdot x^2\] the value \(B\) is the initial speed, and \(β_2=-g/2\)
Thus, we can measure the gravitational acceleration in the second coefficient