Methodology of Scientific Research

Summary of last class

Choosing “the best” representative depends on how we measure “how bad it is”

Once we choose an error function, we look for the value that gives the smallest error

(we say it minimizes the error function)

Median minimizes the absolute error

Mean minimizes the squared error
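As a numerical sanity check (a sketch with made-up data, not from the slides), we can scan candidate values of $$β$$ on a grid and confirm which one minimizes each error function:

```python
# Sketch with made-up data: scan candidate values of beta on a grid and
# check which one minimizes each error function.
from statistics import mean, median

y = [1, 2, 2, 3, 10]

def abs_error(beta):          # sum of absolute errors
    return sum(abs(v - beta) for v in y)

def sq_error(beta):           # sum of squared errors
    return sum((v - beta) ** 2 for v in y)

candidates = [b / 100 for b in range(0, 1201)]   # grid 0.00 .. 12.00
best_abs = min(candidates, key=abs_error)
best_sq = min(candidates, key=sq_error)

print(best_abs, median(y))    # 2.0 2 -- the median minimizes absolute error
print(best_sq, mean(y))       # 3.6 3.6 -- the mean minimizes squared error
```

Note how the outlier 10 pulls the mean up to 3.6 while the median stays at 2.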

Dispersion

How good is the average?

We found that the average $$\bar{𝐲}$$ is the value $$β$$ that minimizes the squared error $\mathrm{SE}_𝐲 (β)=\sum_i (y_i-β)^2$ This is our initial measure of “quality of representative”

Larger values of squared error are bad

What makes the squared error large?

SE can grow for two reasons

• Data values $$y_i$$ become more spread out
• There are more values $$y_i$$ in the set

The first reason is good: spread is exactly what we want to measure

But the second is unfortunate: the error grows just because there are more values

How can we correct it?

Mean Squared Error

To compensate, we divide by the number of values $\mathrm{MSE}_𝐲 (β)=\frac 1 n \sum_i (y_i-β)^2$

The smallest MSE is achieved when $$β$$ is the mean $$\bar{𝐲}$$ $\text{Smallest } \mathrm{MSE}_𝐲 (\bar{𝐲})=\frac 1 n \sum_i (y_i-\bar{𝐲})^2$

This value is called the variance of $$𝐲$$, written $$\mathrm{var}(𝐲)$$
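A quick check on invented data (a sketch): Python's `statistics.pvariance` computes exactly this $$n$$-divided variance, and any $$β$$ other than the mean gives a larger MSE.

```python
# Sketch: MSE evaluated at the mean equals the (population) variance;
# any other beta gives a strictly larger MSE.
from statistics import mean, pvariance

y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # invented data, mean 5.0

def mse(beta):
    return sum((v - beta) ** 2 for v in y) / len(y)

print(mse(mean(y)))       # 4.0 -- the variance of y
print(pvariance(y))       # 4.0 -- same value, divided by n
print(mse(mean(y) + 1))   # 5.0 -- moving away from the mean increases MSE
```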

Alternative variance formula

\begin{aligned} \mathrm{var}(𝐲)&=\frac 1 n \sum_i (y_i-\bar{𝐲})^2=\frac 1 n \sum_i (y_i^2-2\bar{𝐲}y_i+ \bar{𝐲}^2)\\ &=\frac 1 n \sum_i y_i^2-2\bar{𝐲}\frac 1 n \sum_i y_i+ \bar{𝐲}^2\frac 1 n \sum_i 1\\ &=\frac 1 n \sum_i y_i^2-2\bar{𝐲}\bar{𝐲}+ \bar{𝐲}^2\frac 1 n n\\ &=\frac 1 n \sum_i y_i^2-2\bar{𝐲}^2+ \bar{𝐲}^2\\ &=\frac 1 n \sum_i y_i^2-\bar{𝐲}^2\\ \end{aligned}

To remember

$\mathrm{var}(𝐲)=\frac 1 n \sum_i (y_i-\bar{𝐲})^2=\frac 1 n \sum_i y_i^2-\bar{𝐲}^2$

“The average of the squares minus the square of the average”
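The two formulas can be checked against each other numerically (a minimal sketch with invented data):

```python
# Sketch: both variance formulas give the same number.
from statistics import mean

y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
n = len(y)
ybar = mean(y)

var_direct = sum((v - ybar) ** 2 for v in y) / n       # definition
var_shortcut = sum(v * v for v in y) / n - ybar ** 2   # avg of squares - square of avg
print(var_direct, var_shortcut)   # 4.0 4.0
```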

From last class

We saw that $\frac 1 n \sum_i y_i^2≥\bar{𝐲}^2$ Therefore we always have $\frac 1 n \sum_i y_i^2-\bar{𝐲}^2≥0$

Standard deviation

The units of the variance are squared

If $$𝐲$$ is in meters, then $$\mathrm{var}(𝐲)$$ is in squared meters

Often it is better to use the original units

In that case we use the standard deviation

$\mathrm{sdev}(𝐲)=\sqrt{\mathrm{var}(𝐲)}$

Change of units

Values change when we change units

All values $$y_i$$ are multiplied by a fixed constant $$k$$

\begin{aligned} \mathrm{var}(k⋅𝐲) &= k^2⋅\mathrm{var}(𝐲)\\ \mathrm{sdev}(k⋅𝐲) &= |k|⋅\mathrm{sdev}(𝐲) \end{aligned}

Multiplicative constants scale the variance quadratically

The standard deviation scales in direct proportion
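These scaling rules are easy to verify numerically (a sketch; the data and the unit conversion are made up):

```python
# Sketch: converting meters to centimeters (k = 100) scales the variance
# by k^2 and the standard deviation by k.
from statistics import pstdev, pvariance

y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]   # say, meters
k = 100.0                                       # meters -> centimeters
ky = [k * v for v in y]

print(pvariance(ky), k ** 2 * pvariance(y))   # 40000.0 40000.0
print(pstdev(ky), k * pstdev(y))              # 200.0 200.0
```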

Sum of two vectors

\begin{aligned} \mathrm{var}(𝐱+𝐲)&=\frac 1 n \sum_i (x_i+ y_i-\bar{𝐱}-\bar{𝐲})^2\\ &=\frac 1 n \sum_i ((x_i-\bar{𝐱})+ (y_i-\bar{𝐲}))^2\\ &=\frac 1 n \sum_i \left((x_i-\bar{𝐱})^2 +(y_i-\bar{𝐲})^2+ 2(x_i-\bar{𝐱})(y_i-\bar{𝐲})\right)\\ &=\frac 1 n \sum_i (x_i-\bar{𝐱})^2 +\frac 1 n \sum_i (y_i-\bar{𝐲})^2+ 2\frac 1 n \sum_i (x_i-\bar{𝐱})(y_i-\bar{𝐲})\\ &=\mathrm{var}(𝐱) +\mathrm{var}(𝐲)+ 2\frac 1 n \sum_i (x_i-\bar{𝐱})(y_i-\bar{𝐲}) \end{aligned}

What is this extra term?

The expression $\frac 1 n \sum_i (x_i-\bar{𝐱})(y_i-\bar{𝐲})$ is called covariance of $$𝐱$$ and $$𝐲$$

We write it as $\mathrm{cov}(𝐱,𝐲)$

Then the variance of the sum is

$\mathrm{var}(𝐱+𝐲)=\mathrm{var}(𝐱) +\mathrm{var}(𝐲)+ 2\mathrm{cov}(𝐱,𝐲)$

The variance of the sum is the sum of the variances plus twice the covariance
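The identity can be confirmed on a small example (made-up vectors; a sketch, not part of the slides):

```python
# Sketch: var(x + y) equals var(x) + var(y) + 2 cov(x, y) on made-up data.
from statistics import mean, pvariance

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]
n = len(x)

xbar, ybar = mean(x), mean(y)
cov = sum((a - xbar) * (b - ybar) for a, b in zip(x, y)) / n

s = [a + b for a, b in zip(x, y)]             # the vector x + y
print(pvariance(s))                           # 4.0
print(pvariance(x) + pvariance(y) + 2 * cov)  # 4.0
```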

Alternative expression

\begin{aligned} \frac 1 n \sum_i (x_i-\bar{𝐱})(y_i-\bar{𝐲})&=\frac 1 n \sum_i (x_i y_i-\bar{𝐱}y_i-x_i\bar{𝐲}+\bar{𝐱}\bar{𝐲})\\ &=\frac 1 n \sum_i x_i y_i-\frac 1 n \sum_i\bar{𝐱}y_i-\frac 1 n \sum_i x_i\bar{𝐲}+\frac 1 n \sum_i\bar{𝐱}\bar{𝐲}\\ &=\frac 1 n \sum_i x_i y_i-\bar{𝐱}\frac 1 n \sum_i y_i - \bar{𝐲}\frac 1 n \sum_i x_i + \bar{𝐱}\bar{𝐲}\frac 1 n \sum_i 1\\ &=\frac 1 n \sum_i x_i y_i-\bar{𝐱}\bar{𝐲}- \bar{𝐱}\bar{𝐲}+\bar{𝐱}\bar{𝐲}\\ &=\frac 1 n \sum_i x_i y_i-\bar{𝐱}\bar{𝐲}\\ \end{aligned}

Covariance

$\mathrm{cov}(𝐱,𝐲)=\frac 1 n \sum_i (x_i-\bar{𝐱})(y_i-\bar{𝐲})=\frac 1 n \sum_i x_i y_i-\bar{𝐱}\bar{𝐲}$

The second formula is easier to calculate

“The average of the products minus the product of the averages”

Interpretation of Covariance

If $$𝐱$$ and $$𝐲$$ go in the same direction,
then the covariance is positive

If $$𝐱$$ and $$𝐲$$ go in opposite directions,
then the covariance is negative
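A small illustration with made-up vectors, one moving with $$𝐱$$ and one moving against it:

```python
# Sketch with made-up vectors: one that moves with x and one that moves
# against it, giving covariances of opposite sign.
x = [1.0, 2.0, 3.0, 4.0]
up = [10.0, 20.0, 30.0, 40.0]     # same direction as x
down = [40.0, 30.0, 20.0, 10.0]   # opposite direction

def cov(a, b):
    n = len(a)
    abar, bbar = sum(a) / n, sum(b) / n
    return sum((u - abar) * (v - bbar) for u, v in zip(a, b)) / n

print(cov(x, up))     # 12.5 (positive)
print(cov(x, down))   # -12.5 (negative)
```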

Covariance under change of scale

It is easy to see that, for any constants $$a$$ and $$b$$, we have \begin{aligned} \mathrm{cov}(a\, 𝐱,𝐲)&=a\, \mathrm{cov}(𝐱,𝐲)\\ \mathrm{cov}(𝐱, b\,𝐲)&=b\, \mathrm{cov}(𝐱,𝐲)\\ \mathrm{cov}(a\, 𝐱, b\,𝐲)&=ab\, \mathrm{cov}(𝐱,𝐲)\\ \end{aligned} It would be nice to have a “covariance” value that is independent of the scale

Correlation

One way to be independent of the scale is to use $\mathrm{corr}(𝐱,𝐲)=\frac{\mathrm{cov}(𝐱,𝐲)}{\mathrm{sdev}(𝐱)\mathrm{sdev}(𝐲)}$ This is the correlation between $$𝐱$$ and $$𝐲$$

It is always a value between $$-1$$ and $$1$$

(The proof is long and we do not need it in this course)
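We can still check both properties numerically on invented data (a sketch): the correlation lies in $$[-1, 1]$$ and does not change when one vector is rescaled.

```python
# Sketch: correlation on made-up data lies in [-1, 1] and is unchanged
# when one of the vectors is rescaled.
from statistics import mean, pstdev

def corr(a, b):
    n = len(a)
    abar, bbar = mean(a), mean(b)
    cov = sum((u - abar) * (v - bbar) for u, v in zip(a, b)) / n
    return cov / (pstdev(a) * pstdev(b))

x = [1.0, 2.0, 3.0, 4.0]
y = [2.0, 1.0, 4.0, 3.0]

print(corr(x, y))                     # 0.6 (up to rounding)
print(corr([10 * v for v in x], y))   # same value: the scale does not matter
```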