November 8th, 2016

## Location

If you have to describe the vector v with a single number x, which would it be?

If we have to replace every v[i] with the same single number, which number is “the best”?

It is better to choose the one that is the “least wrong”

How can x be wrong?

## Measuring error

Many alternatives

• Number of errors sum(v != x)
• Absolute error sum(abs(v-x))
• Squared error sum((v-x)^2)
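The three measures can be compared side by side. This is only a sketch: it assumes that v is the built-in rivers dataset used later in these slides, and the candidate x = 500 is an arbitrary value chosen for illustration.

```r
# Assumption: v is the built-in `rivers` dataset (lengths of major rivers)
v <- rivers
x <- 500  # arbitrary candidate, only for illustration

n_errors  <- sum(v != x)      # how many elements differ from x
abs_error <- sum(abs(v - x))  # total absolute error
sq_error  <- sum((v - x)^2)   # total squared error

c(n_errors = n_errors, abs_error = abs_error, sq_error = sq_error)
```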

## Absolute error

The absolute error when $$x$$ represents $$\mathbf v$$ is $\mathrm{AE}(x, \mathbf{v})=\sum_i |v_i-x|$ or, in R code

sum(abs(v-x))

Which $$x$$ minimizes absolute error?

## Absolute error

We get the minimum absolute error when $$x=425$$
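One way to find this value is to try many candidates and keep the one with the smallest error. The sketch below assumes v is the rivers dataset; the names abs_error and candidates are illustrative, and the grid of whole numbers is an arbitrary choice.

```r
# Assumption: v is the built-in `rivers` dataset
v <- rivers

# Absolute error of representing v by the single number x
abs_error <- function(x, v) sum(abs(v - x))

# Try every whole number between the smallest and the largest value
candidates <- seq(min(v), max(v), by = 1)
errors <- sapply(candidates, abs_error, v = v)

# Candidate with the smallest absolute error
best_x <- candidates[which.min(errors)]
best_x
```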

## Median

If x is the median of v, then

• half of the values in v are smaller than x
• half of the values in v are bigger than x

The median minimizes the absolute error
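The definition suggests a direct way to compute the median: sort the values and take the one in the middle. The sketch below assumes v is the rivers dataset. When the length is even there is no single middle element, and the usual convention (also used by R's median()) is to average the two central values.

```r
# Assumption: v is the built-in `rivers` dataset
v <- rivers
n <- length(v)
s <- sort(v)

if (n %% 2 == 1) {
  med <- s[(n + 1) / 2]                 # odd length: the middle element
} else {
  med <- mean(s[c(n / 2, n / 2 + 1)])   # even length: average of the two central ones
}

med == median(v)  # agrees with the built-in function
```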

## Squared error

The squared error when $$x$$ represents $$\mathbf v$$ is $\mathrm{SE}(x, \mathbf{v})=\sum_i (v_i-x)^2$ or, in R code

sum((v-x)^2)

Which $$x$$ minimizes the squared error?

## Squared error

We get the minimum squared error when $$x=591.1843972$$
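The minimum can also be located numerically. The sketch below uses optimize() from base R, which searches an interval for the minimum of a function of one variable. It assumes v is the rivers dataset; sq_error is an illustrative name.

```r
# Assumption: v is the built-in `rivers` dataset
v <- rivers

# Squared error of representing v by the single number x
sq_error <- function(x, v) sum((v - x)^2)

# Search between the smallest and the largest value of v
best <- optimize(sq_error, interval = range(v), v = v)

best$minimum    # the x with the smallest squared error (approximate)
best$objective  # the squared error at that x
```

The result is approximate, since optimize() stops once it is within a small tolerance of the minimum.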

## Minimizing SE using math

The error is $\mathrm{SE}(x, \mathbf{v})=\sum_i (v_i-x)^2$

To find the minimum we take the derivative of $$\mathrm{SE}$$ with respect to $$x$$

$\frac{d}{dx} \mathrm{SE}(x, \mathbf{v})= -2\sum_i (v_i - x)= 2nx - 2\sum_i v_i$

where $$n$$ is the number of elements of $$\mathbf v$$. The minimum of a differentiable function is located where its derivative is zero. Here the second derivative is $$2n > 0$$, so that point is indeed a minimum

## Minimizing SE using math

Now we find the value of $$x$$ that makes the derivative equal to zero.

$\frac{d}{dx} \mathrm{SE}(x, \mathbf{v})= 2nx - 2\sum_i v_i$

Setting this last formula equal to zero and solving for $$x$$, we find that the best value is

$x = \frac{1}{n} \sum_i v_i$
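The condition "derivative equal to zero" is the same as saying that the differences $$v_i - x$$ add up to zero. A quick check in R, assuming v is the rivers dataset (x_star and residual_sum are illustrative names):

```r
# Assumption: v is the built-in `rivers` dataset
v <- rivers
n <- length(v)

# The solution of the equation: the sum of the values divided by n
x_star <- sum(v) / n

# At x_star the differences cancel out, so the derivative of SE is zero
residual_sum <- sum(v - x_star)

x_star        # same value as mean(v)
residual_sum  # zero, up to rounding error
```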

## Arithmetic Mean

The mean value of $$\mathbf v$$ is $\text{mean}(\mathbf v) = \frac{1}{n}\sum_{i=1}^n v_i$ where $$n$$ is the length of the vector $$\mathbf v$$.

It is sometimes written as $$\bar{\mathbf v}$$

This value is called the arithmetic mean, or simply the mean

In R we write mean(v)

## In summary

summary(rivers)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  135.0   310.0   425.0   591.2   680.0  3710.0 

What are these values?

## Minimum, Maximum and Range

The easiest to understand are the minimum and the maximum

min(rivers)
[1] 135
max(rivers)
[1] 3710

Sometimes it is useful to get both together

range(rivers)
[1]  135 3710

## Quartiles

The word quartile comes from the Latin quartus, which means “fourth”.

If we sort the values and split them into four subsets of the same size, what are the limits of these subsets?

$$Q_0$$: Zero elements are smaller than this one
$$Q_1$$: One quarter of the elements are smaller
$$Q_2$$: Two quarters (half) of the elements are smaller
$$Q_3$$: Three quarters of the elements are smaller
$$Q_4$$: Four quarters (all) of the elements are smaller

It is easy to see that $$Q_0$$ is the minimum, $$Q_2$$ is the median, and $$Q_4$$ is the maximum
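This can be checked with R's quantile() function, which returns the five quartiles when called without extra arguments. The sketch uses the rivers dataset; the name q is illustrative.

```r
# Quartiles Q0..Q4 of the built-in `rivers` dataset
q <- quantile(rivers)

q[["0%"]]   == min(rivers)     # Q0 is the minimum
q[["50%"]]  == median(rivers)  # Q2 is the median
q[["100%"]] == max(rivers)     # Q4 is the maximum
```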

## Quartiles and Quantiles

Generalizing, we can ask, for each percentage $$p$$, which value in the vector v is greater than $$p$$% of the other values.

The function in R for that is called quantile()

By default it gives us the quartiles

quantile(rivers)
  0%  25%  50%  75% 100%
 135  310  425  680 3710 
quantile(rivers, seq(0, 1, by=0.1))
  0%  10%  20%  30%  40%  50%  60%  70%  80%  90% 100%
 135  255  291  330  375  425  505  610  735 1054 3710 
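To see what these numbers mean, we can count which fraction of the values is at or below each quantile. This is a sketch with illustrative names (p, q, frac_below). The fractions are close to p but not exactly equal to it, because the vector is finite and some values are repeated.

```r
# Probabilities to examine (an arbitrary choice for illustration)
p <- c(0.25, 0.5, 0.9)
q <- quantile(rivers, p)

# Fraction of the values that are at or below each quantile
frac_below <- sapply(q, function(limit) mean(rivers <= limit))

rbind(p = p, frac_below = frac_below)
```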