Blog of Andrés Aravena
Methodology of Scientific Research:

Estimating Statistical Uncertainty

25 March 2020

There are two kinds of measurement uncertainties

Today we will talk only about Type A uncertainties. We assume that our instrument has a very high resolution and gives us several significant figures about the measured attribute. For the sake of the argument, we can imagine that all measurements have infinite precision.

The problem is that every time we measure, we get a different number. There are variations in the object being measured and in the measuring device that induce variability in the values. How can we extract meaningful information from noisy data? How much information can we extract?

Theoretical Framework

We assume that there is a real fixed value \(\mu\) that we want to measure, but every measurement gives us \(\mu+\varepsilon_i.\) Here \(i\) is the sample id, that is, an index to identify each sample.

Ideally, the values \(\varepsilon_i\) are small, but maybe they are not. More importantly, they are random. In other words, we do not know their exact values before the measurement. This is our second assumption.

Notice that if we knew the values of \(\varepsilon_i\) before measuring, then we could be prepared and correct for them. That would be a Type B uncertainty.

Let us give them a name. The \(\varepsilon_i\) are called statistical errors. Here “error” does not mean “mistake”. We did nothing wrong. The meaning of statistical error is literally “the difference between the real value and the measured value”.

If the errors are random, the best we can do is to describe their probability distribution. (If you need a refresher on probabilities, we will have one soon.) The third assumption we make is that the errors somehow compensate each other. They are sometimes positive, sometimes negative. Specifically, we assume that their expected value is zero: \[\mathbb E(\varepsilon_i)=0\qquad\text{for all }i.\]

Also, we assume that they have finite variance. What do we mean by that? Well, since the errors follow a probability distribution, we can in theory calculate the variance of that distribution as \[\mathbb V(\varepsilon_i)=\mathbb E\left((\varepsilon_i - \mathbb E(\varepsilon_i))^2\right)=\sigma^2\qquad\text{for all }i.\] In some cases the value \(\sigma^2\) is infinite; we will assume that this is not the case.

The last assumption we make is that all errors are independent. In other words, we cannot predict the value of \(\varepsilon_i\) any better even if we know all the values \(\varepsilon_1\) to \(\varepsilon_{i-1}.\) We do not learn from past errors.

What does this mean?

First, we assume that all \(\varepsilon_i\) are created equal. The errors do not change with time. They do not get better or worse. All the errors are “produced” by a “machine” with fixed characteristics. This “machine” is usually called noise. We can have a good idea of how the “error generating machine” works by looking at its products. That is, looking at the \(\varepsilon_i\) values.
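To make this concrete, here is a minimal sketch in R of such a “machine”, assuming (purely for illustration) Normal noise around a real value \(\mu=10\) with \(\sigma=0.5\):

    # A hypothetical "error generating machine": mu and sigma are
    # assumed values for this demo, not part of the method itself
    set.seed(42)                  # make the random numbers reproducible
    mu    <- 10                   # the real value (unknown in practice)
    sigma <- 0.5                  # the noise standard deviation
    x <- mu + rnorm(1000, mean = 0, sd = sigma)  # 1000 noisy measurements
    mean(x)                       # close to mu: the errors compensate
    sd(x)                         # close to sigma: fixed characteristics

By looking at many products of the machine, we recover its fixed characteristics, which is exactly the strategy of the rest of this post.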

These assumptions are valid for most of the measurements we do, so they are not a crazy idea. Some of these assumptions can be relaxed and the results do not change, but they become harder to prove.

In some cases (which are more frequent in Biology than in Physics) these assumptions are not true. So it is a good idea to verify whether these assumptions hold in our experiments.

Measuring the errors

Here is the dilemma. To understand what happens we need to know the values \(\mu\) and \(\sigma.\) But we do not know them. How can we solve this?

We will use the classical strategy. First, we will assume that we know \(\mu\) and \(\sigma,\) and we will see what we can expect from our experiment. Then, in the second part, we will go in the opposite direction and find what we can know about \(\mu\) and \(\sigma,\) given the results of our experiments.

Doing the experiment

We measure \(N\) times, and we get the values \(x_i\) for \(i\) in \(1,\ldots,N.\) What can we expect?

  • Each \(x_i\) has the value \(\mu+\varepsilon_i.\) They are random, because the \(\varepsilon_i\) are random.
  • The experimental results \(x_i\) should not be “far away” from the real value \(\mu.\)
  • Without any further hypothesis, we can say the following. (We can say this because it is guaranteed by the Tchebyshev inequality; notice that this is different from the 68-95-99 rule, which we will explain below. See also the sketch after this list.)
    • at least 3/4 of the results lie in the interval with endpoints \(\mu\pm 2\sigma\);
    • at least 8/9 of the results lie in the interval with endpoints \(\mu\pm 3\sigma\);
    • at least \(1-1/k^2\) of the results lie in the interval with endpoints \(\mu\pm k\sigma,\) where \(k\) is any number greater than 1. (Here \(\sigma=\sqrt{\sigma^2}=\sqrt{\mathbb V(\varepsilon_i)}.\))
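We can check Tchebyshev's guarantee by simulation. A minimal sketch, reusing the hypothetical noise machine from above (any error distribution would work):

    set.seed(1)
    mu <- 10; sigma <- 0.5
    x <- mu + rnorm(10000, sd = sigma)           # simulated measurements
    mean(abs(x - mu) < 2 * sigma)   # guaranteed >= 3/4; here about 0.95
    mean(abs(x - mu) < 3 * sigma)   # guaranteed >= 8/9; here about 0.997

The observed fractions are larger than the guaranteed ones because Tchebyshev's inequality holds for any distribution, so it has to be pessimistic.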

These guarantees are not much. Let us see if we can do better by combining all the results.

Combining the results

It is convenient to sum all results. We get a new value, which we can write as \(\Sigma x_i.\) This sum takes all values of \(i\) between \(1\) and \(N\), but we should not care very much about the sum limits for now.

The value \(\Sigma x_i\) is also random, because it depends on the values \(x_i,\) which are random. Therefore we can ask what its expected value and variance are. This is easy, because the expected value of a sum is the sum of the expected values: \[\mathbb E(\Sigma x_i)=\sum \mathbb E(x_i)=\sum\mathbb E(\mu +\varepsilon_i)=\sum\mu +\sum\mathbb E(\varepsilon_i)=N\mu.\] We get \(N\mu\) because we added \(N\) values and all the noise cancelled. Since \(\mu\) is not random, it is its own expected value.

Calculating the variance of \(\Sigma x_i\) is not hard if we use the assumption that all errors are independent, because the variance of a sum of independent variables is the sum of the variances: \[\mathbb V(\Sigma x_i)=\sum \mathbb V(x_i)=\sum\mathbb V(\mu +\varepsilon_i)=\sum 0 +\sum\mathbb V(\varepsilon_i)=N\sigma^2.\] This time we get the variance from the errors, and nothing from \(\mu.\) Since \(\mu\) is not random, its variance is 0.
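A quick simulation can confirm both formulas. A sketch with hypothetical values \(\mu=10,\) \(\sigma=0.5\) and \(N=5\):

    set.seed(7)
    mu <- 10; sigma <- 0.5; N <- 5
    # 10000 repetitions of the whole experiment, keeping only the sum
    sums <- replicate(10000, sum(mu + rnorm(N, sd = sigma)))
    mean(sums)   # close to N * mu      = 50
    var(sums)    # close to N * sigma^2 = 1.25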

What have we gained? Well, there is another famous theorem that helps us. The Central Limit Theorem says that \(\Sigma x_i\) approximately follows a Normal distribution when \(N\) is large enough. And this is crucial. Before, we had no idea about the distribution of each \(x_i.\) It could have any shape, and we have very few tools in that case. Now we know better. We know that \(\Sigma x_i\) has a distribution with a bell shape. We can use better tools.
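We can see the Central Limit Theorem at work with a sketch where each error is deliberately non-Normal (uniform on \([-1,1]\), an assumption chosen just for this demo); the sums are still bell-shaped:

    set.seed(7)
    # sums of 30 measurements contaminated with flat (uniform) noise
    sums <- replicate(10000, sum(10 + runif(30, min = -1, max = 1)))
    hist(sums, breaks = 50)   # a bell shape, even though the noise is flat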

Let us do one more thing before deploying all the tools. We saw that \(\Sigma x_i\) grows with \(N,\) and that is not practical. It will be better to work with the sample average \(\bar x=\Sigma x_i/N.\) This is still a random variable, and it still follows a Normal distribution. Since, when \(\alpha\) is a fixed value, the expected value of \(\alpha x\) is \(\alpha\) times the expected value of \(x,\) the expected value of \(\bar x\) is \[\mathbb E(\bar x)= \mathbb E(\Sigma x_i/N)= \frac{1}{N}\mathbb E(\Sigma x_i)= \frac{1}{N}N\mu=\mu,\] which is nice, because now we can connect our experiment with the real value.

It gets better. Since, when \(\alpha\) is a fixed value, the variance of \(\alpha x\) is \(\alpha^2\) times the variance of \(x,\) the variance of \(\bar x\) is also easy to calculate: \[\mathbb V(\bar x)= \mathbb V(\Sigma x_i/N)= \frac{1}{N^2}\mathbb V(\Sigma x_i)= \frac{1}{N^2}N\sigma^2=\frac{\sigma^2}{N}.\]

This means that, when the sample size \(N\) is bigger, the variance of the average \(\mathbb V(\bar x)\) is smaller. In other words, the average \(\bar x\) gets closer and closer to the real value \(\mu\) (we can say this because it is guaranteed by the Law of Large Numbers). Specifically:

  • around 68% of the times, \(\bar x\) is in the interval with endpoints \(μ±σ/\sqrt{N}\)
  • around 95% of the times, \(\bar x\) is in the interval with endpoints \(μ±2σ/\sqrt{N}\)
  • around 99% of the times, \(\bar x\) is in the interval with endpoints \(μ±3σ/\sqrt{N}\)

This is what people call “the 68-95-99 rule”. It means that averages are good. If we want to know the exact probabilities, we need to look in the table of the Normal distribution, or use the computer. In R we use the functions pnorm() and qnorm(), as shown below.
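For example, the exact numbers behind the rule come out of pnorm(), and qnorm() answers the inverse question:

    pnorm(1) - pnorm(-1)   # 0.6826895: about 68% within one sigma
    pnorm(2) - pnorm(-2)   # 0.9544997: about 95% within two sigmas
    pnorm(3) - pnorm(-3)   # 0.9973002: about 99.7% within three sigmas
    qnorm(0.975)           # 1.959964: the k that gives exactly 95%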

The average \(\bar x\) of the sample is a good predictor of the real value \(μ\). The uncertainty \(σ/\sqrt{N}\) is reduced when the sample size \(N\) increases.

But there is a small problem. We do not know \(\sigma,\) so we do not know the uncertainty. It was all a fantasy. Let’s get real.

Estimating the uncertainty

What can we do? We can get a reasonable approximation to the variance of the errors \(\sigma^2\) by calculating \[s^2_{N-1}=\frac{1}{N-1}\sum (x_i-\bar x)^2.\] This value is still random, because it depends on the \(x_i\) values, but it should be close to \(\sigma^2,\) according to the rules of probabilities. We call this the “estimated population variance”. (Some people call it “sample variance”, but I don’t like that name, for technical reasons.)

Likewise, we can estimate the standard deviation of the errors as \[s_{N-1}=\sqrt{s^2_{N-1}}.\] We call this the “estimated standard deviation of the population”. Now we can estimate the uncertainty of the average \(\bar x\) as \[\mathrm{stderr}(\bar x)=\frac{s_{N-1}}{\sqrt{N}}=\sqrt{\frac{s^2_{N-1}}{N}}.\]

This value \(\mathrm{stderr}(\bar x)\) is called the standard error of the average, and it is what we use as the value of the Type A uncertainty.
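In R, the whole computation is short. A sketch with a small hypothetical sample; note that var() and sd() already use the \(N-1\) formula:

    x <- c(10.3, 9.8, 10.1, 10.4, 9.9)      # hypothetical measurements
    N <- length(x)
    s2 <- sum((x - mean(x))^2) / (N - 1)    # estimated population variance
    all.equal(s2, var(x))                   # TRUE: var() divides by N - 1
    stderr <- sqrt(s2 / N)                  # standard error of the average
    stderr                                  # same as sd(x) / sqrt(N)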

So we have everything to connect our experimental values \(x_i\) to the real value \(\mu.\) We know that

  • The sample average \(\bar x\) is close to \(\mu\)
  • The distance between \(\bar x\) and \(\mu\) depends on the standard error \(s_{N-1}/\sqrt{N}\)
  • The value \(\bar x\) is probably in the interval with endpoints \(\mu\pm k\,s_{N-1}/\sqrt{N}.\) That is, \[\mu-k\frac{s_{N-1}}{\sqrt{N}}<\bar x < \mu+k\frac{s_{N-1}}{\sqrt{N}}.\]
  • The last expression can be rewritten as \[\bar x-k\frac{s_{N-1}}{\sqrt{N}}< \mu< \bar x+k\frac{s_{N-1}}{\sqrt{N}},\] so we have an interval that contains the real value \(\mu.\)

This is called a confidence interval. The confidence level is the probability that the real value \(\mu\) is in this interval. It depends on the number \(k\) of standard deviations. If we put more standard deviations, we get more confidence.

But we have a new problem. (Again! Are we just exchanging one problem for another?) The price to pay for replacing the real standard deviation \(\sigma\) by the estimation \(s_{N-1}\) is that we cannot use the Normal table. This time the average follows a Student’s t-distribution, so we have to use the corresponding tables. The correct table depends on the number of degrees of freedom. In this case, the degrees of freedom are \(N-1.\)

We have the freedom to change any value \(x_i,\) except one, since we are bound by the restriction \(\Sigma x_i=N\bar x.\) Once \(\bar x\) is fixed, we lose one degree of freedom.

The intervals are going to be wider. For example, when \(N\) is 3, we can find that the interval with endpoints \(\bar x\pm k\,s_{N-1}/\sqrt{N}\) has a confidence level of 57.7% (when \(k=1\)), 81.6% (when \(k=2\)), or 90.5% (when \(k=3\)). We calculate these confidence levels in R using the function pt(), as sketched below. To get a confidence level of 99% in this case we need \(k=9.9.\)
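A sketch of that calculation with pt(), for \(N=3,\) that is, 2 degrees of freedom; qt() answers the inverse question:

    k <- c(1, 2, 3, 9.9)
    2 * pt(k, df = 2) - 1      # 0.577, 0.816, 0.905, 0.990
    qt(1 - 0.01 / 2, df = 2)   # about 9.9: the k needed for 99%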

The good news (finally, good news!) is that, when \(N\) is near 30 or more, the Student’s t-distribution and the Normal distribution are very similar. In practical terms, if we get 30 samples, the average approximately follows a Normal distribution.
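We can see how similar they are by comparing quantiles. For 95% confidence with \(N=30,\) that is, 29 degrees of freedom:

    qt(0.975, df = 29)   # 2.045, using Student's t
    qnorm(0.975)         # 1.960, using the Normal approximation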

Application

What is this good for?

We want to measure a value. A temperature, an altitude, a concentration. And we want to do it with a given precision. With the tools described here we should be able to follow this plan:

  1. Do a test experiment. Collect some data to have an idea of the data variability. This can be replaced by a good bibliographical search.
  2. Draw histograms. See if the data seems to be noisy.
  3. Estimate the standard deviation of the population.
  4. Calculate what the size of a confidence interval will be for different values of N.
  5. Find the N that gives the chosen precision, as in the sketch below.
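Here is a minimal sketch of steps 3 to 5, assuming a pilot standard deviation of 0.5 and a target precision of ±0.1 with 95% confidence (all three numbers are hypothetical):

    s <- 0.5                        # estimated from the test experiment
    # half-width of the 95% confidence interval for a given sample size
    half_width <- function(N) qt(0.975, df = N - 1) * s / sqrt(N)
    N <- 2:200                      # candidate sample sizes
    min(N[half_width(N) <= 0.1])    # smallest N with the chosen precision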

Then we can go in the opposite direction:

  1. We get N samples.
  2. We calculate the average.
  3. We calculate the standard error of the average.
  4. We present our result as \(\bar x±\Delta x\) with an explicit confidence level and the correct number of significant figures, as in the sketch below.
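A sketch of these four steps in R, with a hypothetical sample and a 95% confidence level:

    x <- c(10.3, 9.8, 10.1, 10.4, 9.9)   # step 1: the N samples (hypothetical)
    N <- length(x)
    avg    <- mean(x)                    # step 2: the average
    stderr <- sd(x) / sqrt(N)            # step 3: the standard error
    k <- qt(0.975, df = N - 1)           # step 4: k for 95% confidence
    cat(sprintf("%.1f ± %.1f (95%% confidence)\n", avg, k * stderr))

With these numbers the result would be reported as 10.1 ± 0.3, at a 95% confidence level.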