There are two kinds of measurement uncertainties:

- Type A, which can be handled with statistical tools.
- Type B, all the others. This includes discretization uncertainty, calibration, non-linearity, hysteresis, etc.

Today we will talk only about Type A uncertainties. We assume that our instrument has a very high resolution and gives us several significant figures about the measured attribute. For the sake of the argument, we can imagine that all measurements have infinite precision.

The problem is that every time we measure, we get a different number. There are variations, in the object being measured and in the measuring device, that induce variabilities in the values. How can we extract meaningful information from noisy data? How much information can we extract?

# Theoretical Framework

We assume that there is a real fixed value \(\mu\) that we want to measure, but every measurement gives us \(\mu+\varepsilon_i.\) Here \(i\) is the sample id, that is, an index to identify each sample.

Ideally the values \(\varepsilon_i\) are *small*, but maybe not. More importantly, they are *random*. In other words, we do not know their exact values before the measurement. This is our second assumption.

Notice that if we knew the values of \(\varepsilon_i\) before measuring, then we could be prepared and correct for them. That would be a Type B uncertainty.

Let us give them a name. The \(\varepsilon_i\) are called *statistical
errors*. Here “error” does not mean “mistake”. We did nothing wrong.
The meaning of statistical error is literally “the difference between
the real value and the measured value”.

If the errors are random, the best we can do is to describe their probability distribution. (If you need a refresher on probabilities, we will have one soon.) The third assumption we make is that the errors somehow compensate each other. They are sometimes positive, sometimes negative. Specifically, we assume that their *expected value* is zero: \[\mathbb E(\varepsilon_i)=0\qquad\text{for all }i.\]

Also, we assume that they have finite variance. What do we mean by that? Well, since the errors *follow* a probability distribution, we can in theory calculate the variance *of the distribution* as \[\mathbb V(\varepsilon_i)=\mathbb E\left((\varepsilon_i - \mathbb E(\varepsilon_i))^2\right)=\sigma^2\qquad\text{for all }i.\] In some cases the value \(\sigma^2\) is infinite. We will assume that this is not the case.

The last assumption that we make is that all errors are *independent*. In other words, we cannot predict the value of \(\varepsilon_i\) any better even if we know all the values \(\varepsilon_1\) to \(\varepsilon_{i-1}.\) We do not learn from past errors.

## What does this mean?

First, we assume that all \(\varepsilon_i\) are created equal. The
errors do not change with time. They do not get better or worse. All the
errors are *“produced”* by a *“machine”* with fixed
characteristics. This “machine” is usually called *noise*. We can
have a good idea of how the “error generating machine” works by looking
at its products. That is, looking at the \(\varepsilon_i\) values.

These assumptions are valid for *most* of the measurements we do, so they are not a crazy idea. Some of these assumptions can be relaxed without changing the results, but then the results are harder to prove.

In some cases (which are more frequent in Biology than in Physics) these assumptions are not true. So it is a good idea to verify whether these assumptions hold in our experiments.
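The assumptions above can be illustrated with a small simulation in R. This is only a sketch: the values of `mu`, `sigma`, and `N` are invented, and `rnorm()` plays the role of the "error generating machine".

```r
set.seed(42)              # make the simulation reproducible
mu    <- 20.0             # the real (usually unknown) value
sigma <- 0.5              # standard deviation of the noise
N     <- 100              # number of measurements

eps <- rnorm(N, mean = 0, sd = sigma)  # independent errors with expected value 0
x   <- mu + eps                        # each measurement is mu + epsilon_i

# The histogram of x should be centered near mu; plotting it is one way
# to check visually whether the data look like "noise around a value".
hist(x, main = "Simulated measurements", xlab = "x")
```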

# Measuring the errors

Here is the dilemma. To understand what happens we need to know the values \(\mu\) and \(\sigma.\) But we do not know them. How can we solve this?

We will use the classical strategy. First, we will assume that we know \(\mu\) and \(\sigma,\) and we will see what we can expect from our experiment. Then, in the second part, we will go in the opposite direction, and find what we can know about \(\mu\) and \(\sigma,\) given the results of our experiments.

## Doing the experiment

We measure \(N\) times, and we get the values \(x_i\) for \(i\) in \(1,\ldots,N.\) What can we expect?

- Each \(x_i\) has the value \(\mu+\varepsilon_i.\) They are random, because the \(\varepsilon_i\) are random.
- The experimental results \(x_i\) should not be “far away” from the real value \(\mu.\)
- Without any further hypothesis, the *Tchebyshev inequality* guarantees that (notice this is different from the 68-95-99 rule, which we will explain below):
  - at least 3/4 of the results lie in the interval with endpoints \(\mu\pm 2\sigma\);
  - at least 8/9 of the results lie in the interval with endpoints \(\mu\pm 3\sigma\);
  - at least \(1-1/k^2\) of the results lie in the interval with endpoints \(\mu\pm k\sigma\), where \(\sigma=\sqrt{\sigma^2}=\sqrt{\mathbb V(\varepsilon_i)}\) and \(k\) is any number greater than 1.
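We can check the Tchebyshev bounds empirically with a short R simulation. The noise distribution here (a shifted exponential) is an arbitrary choice, picked only because it is clearly non-Normal but still has mean 0 and standard deviation 1:

```r
set.seed(1)
mu    <- 0
sigma <- 1
x <- mu + (rexp(10000, rate = 1) - 1)  # non-Normal errors with mean 0, sd 1

# Fraction of results within 2 and 3 standard deviations of mu:
mean(abs(x - mu) <= 2 * sigma)  # Tchebyshev guarantees at least 3/4
mean(abs(x - mu) <= 3 * sigma)  # Tchebyshev guarantees at least 8/9
```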

That is not much. Let us see if we can do better by combining all the results.

## Combining the results

It is convenient to sum all results. We get a new value, which we can write as \(\Sigma x_i.\) This sum takes all values of \(i\) between \(1\) and \(N\), but we should not care very much about the sum limits for now.

The value \(\Sigma x_i\) is also *random*, because it depends on the values \(x_i\), which are random. Therefore we can ask what its expected value and variance are. This is easy, because the expected value of a sum is the sum of the expected values: \[\mathbb E(\Sigma x_i)=\sum \mathbb E(x_i)=\sum\mathbb E(\mu +\varepsilon_i)=\sum\mu +\sum\mathbb E(\varepsilon_i)=N\mu.\] We get \(N\mu\) because we added \(N\) values and all the noise cancelled. Since \(\mu\) is not random, it is its own expected value.

Calculating the variance of \(\Sigma x_i\) is not hard if we use the assumption that all errors are independent, because the variance of a sum is the sum of the variances **if the variables are independent**: \[\mathbb V(\Sigma x_i)=\sum \mathbb V(x_i)=\sum\mathbb V(\mu +\varepsilon_i)=\sum 0 +\sum\mathbb V(\varepsilon_i)=N\sigma^2.\] This time we get the variance from the errors, and nothing from \(\mu\): since \(\mu\) is not random, its variance is 0.

What have we gained? Well, there is another famous theorem that helps us. The Central Limit Theorem says that, when \(N\) is large, \(\Sigma x_i\) approximately follows a *Normal* distribution. And this is crucial. Before, we had no idea about the distribution of each \(x_i.\) It could have any shape, and we have very few tools in that case. Now we know better. We know that \(\Sigma x_i\) has a distribution with a bell shape. We can use better tools.
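We can see the Central Limit Theorem in action with a small simulation. Here the individual errors are uniform (clearly not bell-shaped), yet the sums look Normal; the numbers 5000 and 30 are arbitrary choices for illustration:

```r
set.seed(7)
# 5000 repetitions of the experiment "sum N = 30 uniform errors"
sums <- replicate(5000, sum(runif(30, min = -1, max = 1)))

# Each uniform error is flat, but the histogram of the sums is bell-shaped
hist(sums, breaks = 40, main = "Sums of uniform errors look Normal")
```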

Let us do one more thing before deploying all the tools. We saw that \(\Sigma x_i\) grows with \(N,\) and that is not practical. It is better to work with the *sample average* \(\bar x=\Sigma x_i/N.\) This is still a random variable, and it still follows a Normal distribution. Since, for a fixed value \(\alpha\), the expected value of \(\alpha x\) is \(\alpha\) times the expected value of \(x,\) the expected value of \(\bar x\) is \[\mathbb E(\bar x)= \mathbb E(\Sigma x_i/N)= \frac{1}{N}\mathbb E(\Sigma x_i)= \frac{1}{N}N\mu=\mu,\] which is nice, because now we can connect our experiment with the real value.

It gets better. The variance of \(\bar x\) is also easy to calculate. Since, for a fixed value \(\alpha\), the variance of \(\alpha x\) is \(\alpha^2\) times the variance of \(x,\) we get \[\mathbb V(\bar x)= \mathbb V(\Sigma x_i/N)= \frac{1}{N^2}\mathbb V(\Sigma x_i)= \frac{1}{N^2}N\sigma^2=\frac{\sigma^2}{N}.\]

This means that, when the sample size \(N\) is bigger, the variance of the average \(\mathbb V(\bar x)\) is smaller. In other words, the average \(\bar x\) gets closer and closer to the real value \(\mu\) (this is guaranteed by the *Law of Large Numbers*). Moreover, since \(\bar x\) follows a Normal distribution:

- around 68% of the times, \(\bar x\) is in the interval with endpoints \(μ±σ/\sqrt{N}\)
- around 95% of the times, \(\bar x\) is in the interval with endpoints \(μ±2σ/\sqrt{N}\)
- around 99% of the times, \(\bar x\) is in the interval with endpoints \(μ±3σ/\sqrt{N}\)

This is what people call “the 68-95-99 rule”. It means that averages are good. If we want to know the exact probabilities, we need to look in the table of the Normal distribution, or use the computer. In R we use the functions `pnorm()` and `qnorm()`.
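For example, the exact Normal probabilities behind the 68-95-99 rule can be computed directly:

```r
# Probability that a Normal variable falls within k standard deviations:
pnorm(1) - pnorm(-1)   # about 0.683 (the "68")
pnorm(2) - pnorm(-2)   # about 0.954 (the "95")
pnorm(3) - pnorm(-3)   # about 0.997 (the "99", really 99.7)
qnorm(0.975)           # about 1.96: the k that gives exactly 95%
```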

The average \(\bar x\) of the sample is a good predictor of the real value \(μ\). The uncertainty \(σ/\sqrt{N}\) is reduced when the sample size \(N\) increases.

But there is a small problem. We do not know \(\sigma,\) so we do not know the uncertainty. It was all a fantasy. Let’s get real.

## Estimating the uncertainty

What can we do? We can get a reasonable approximation to the variance of the errors \(\sigma^2\) by calculating \[s^2_{N-1}=\frac{1}{N-1}\sum (x_i-\bar x)^2.\] This value is still random, because it depends on the \(x_i\) values, but it should be close to \(\sigma^2,\) according to the rules of probabilities. We call this the “estimated population variance” (some people call it the “sample variance”, but I don’t like that name for technical reasons).

Likewise, we can estimate the standard deviation of the errors as \[s_{N-1}=\sqrt{s^2_{N-1}}.\] We call this the “estimated population standard deviation”. Now we can estimate the uncertainty of the average \(\bar x\) as \[\mathrm{stderr}(\bar x)=\frac{s_{N-1}}{\sqrt{N}}=\sqrt{\frac{s^2_{N-1}}{N}}.\]

This value \(\mathrm{stderr}(\bar
x)\) is called *standard error of the average* and is what
we use as the value of type A uncertainty.
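In R, the whole computation takes a few lines. The data vector below is invented, just to show the recipe:

```r
x <- c(20.1, 19.8, 20.3, 20.0, 19.9)   # five hypothetical measurements
N <- length(x)

x_bar  <- mean(x)          # sample average
s      <- sd(x)            # s_{N-1}: R's sd() already divides by N - 1
stderr <- s / sqrt(N)      # standard error of the average

c(average = x_bar, std.error = stderr)
```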

So we have everything to connect our experimental values \(x_i\) to the real value \(\mu.\) We know that

- The sample average \(\bar x\) is close to \(\mu\).
- The distance between \(\bar x\) and \(\mu\) depends on the standard error \(s_{N-1}/\sqrt{N}\).
- The value \(\bar x\) is probably in the interval with endpoints \(\mu\pm k s_{N-1}/\sqrt{N}.\) That is, \[\mu-k s_{N-1}/\sqrt{N}<\bar x < \mu+k s_{N-1}/\sqrt{N}.\]
- The last expression can be rewritten as \[\bar x-k s_{N-1}/\sqrt{N}< \mu< \bar x+k s_{N-1}/\sqrt{N},\] so we have an interval that contains the real value \(\mu.\)

This is called a *confidence interval*. The confidence level
is the probability that the real value \(\mu\) is in this interval. It depends on
the number \(k\) of standard
deviations. If we put more standard deviations, we get more
confidence.

But we have a new problem (again! are we just trading one problem for another?). The price to pay for replacing the real standard deviation \(\sigma\) by the estimation \(s_{N-1}\) is that we cannot use the Normal table. This time the average follows a Student’s t-distribution, so we have to use the corresponding tables. The correct table depends on the number of *degrees of freedom*. In this case, the degrees of freedom are \(N-1.\)

We have freedom to change any value \(x_i\) except one, since we are bound by the restriction \(\Sigma x_i=N\bar x.\) Once \(\bar x\) is fixed, we lose one degree of freedom.

The intervals are going to be wider. For example, when \(N\) is 3, we find that the interval with endpoints \(\bar x\pm k s_{N-1}/\sqrt{N}\) has a confidence level of 57.7% (when \(k=1\)), 81.6% (when \(k=2\)), or 90.5% (when \(k=3\)). (We calculate these confidence levels in R using the function `pt()`.) To get a confidence level of 99% in this case we need \(k=9.9.\)
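These numbers can be reproduced in R: `pt()` gives probabilities and `qt()` gives quantiles of the Student’s t-distribution. With \(N=3\) we have \(N-1=2\) degrees of freedom:

```r
df <- 2                  # degrees of freedom for N = 3

2 * pt(1, df) - 1        # about 0.577: confidence level for k = 1
2 * pt(2, df) - 1        # about 0.816: confidence level for k = 2
2 * pt(3, df) - 1        # about 0.905: confidence level for k = 3
qt(0.995, df)            # about 9.92: the k needed for 99% confidence
```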

The good news (finally, good news!) is that, when \(N\) is around 30 or more, the Student’s t-distribution and the Normal distribution are very similar. In practical terms, if we get 30 samples, the average follows a Normal distribution.

# Application

What is this good for?

We want to measure a value. A temperature, an altitude, a concentration. And we want to do it with a given precision. With the tools described here we should be able to follow this plan:

- Do a test experiment. Collect some data to have an idea of the data variability. This can be replaced by a good bibliographical search.
- Draw histograms. See if the data seems to be noisy.
- Estimate the standard deviation of the population
- Calculate what will be the size of a confidence interval for different values of N.
- Find the N that gives the chosen precision.
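The planning steps above can be sketched in R. The pilot estimate `s_pilot` and the target precision are invented values; the idea is to compute the 95% half-width of the confidence interval for each candidate \(N\) and take the first one below the target:

```r
s_pilot <- 0.5     # standard deviation estimated from a pilot experiment
target  <- 0.1     # desired half-width of the 95% confidence interval

# Half-width of the 95% confidence interval for a sample of size N
half_width <- function(N) qt(0.975, df = N - 1) * s_pilot / sqrt(N)

candidates <- 2:200
N_needed <- min(candidates[sapply(candidates, half_width) < target])
N_needed
```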

Then we can go in the opposite direction:

- We get N samples
- We calculate the average
- We calculate the standard error of the average
- We present our result as \(x±\Delta x\) with an explicit confidence level and the correct number of significant figures.
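The final recipe, end to end, might look like this in R (the data vector is invented for illustration):

```r
x <- c(20.1, 19.8, 20.3, 20.0, 19.9, 20.2, 19.7, 20.1)  # the N samples
N <- length(x)

x_bar  <- mean(x)                 # the average
stderr <- sd(x) / sqrt(N)         # standard error of the average
k      <- qt(0.975, df = N - 1)   # multiplier for a 95% confidence level

cat(sprintf("result: %.2f +/- %.2f (95%% confidence)\n",
            x_bar, k * stderr))
```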