Blog of Andrés Aravena
Methodology of Scientific Research:

Application: Evaluating Statistical Uncertainty

26 March 2020

To answer the question “does coffee get cold faster than tea?” we need to measure how temperature changes with time. As an initial reference, I made a simple experiment with just boiling water.

A first approach can be seen in the following time-lapse video:

We can watch the video frame by frame and write down temperature and time. An easier approach is to write them down only when the temperature changes. The advantage of this approach is that we only need a cellphone, a thermometer and a timer. In fact, we can even omit the timer if we know the frame rate of the camera. Most cameras are very precise in this timing. I included a timer just to have a reference. I’m not even sure if the timer advances exactly one second per tick.
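A minimal sketch of the frame-rate idea, written in Python for illustration. The function name and the numbers are hypothetical, and for a time-lapse recording the rate that matters is the one at which frames were captured, not the playback rate.

```python
def frame_to_seconds(frame_index: int, frames_per_second: float) -> float:
    """Elapsed time of a frame, counted from frame 0, at a constant capture rate."""
    if frames_per_second <= 0:
        raise ValueError("frames_per_second must be positive")
    return frame_index / frames_per_second


# Hypothetical numbers: frame 300 of a recording captured at 30 frames per second.
elapsed = frame_to_seconds(300, 30.0)
print(elapsed)
```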

From this first experiment we can learn a couple of things. First, a cup of water takes more than ten thousand seconds (about 2:46 hours) to cool down to room temperature: in the first frames the temperature was 93°C and the last temperature was 27°C. Second, the temperature does not change very fast, except at the beginning.

This means that we need to record the temperature for several hours, but we only need a sample every few seconds, or even once every minute. The cooling time will also depend on the volume of water. When I use a small hot-water bottle, my feet get cold during the night, but when I use a big bottle, I’m fine.

Another thing that we notice is that ambient temperature will have an effect on the cooling. So we need a second thermometer to monitor the air temperature.

The disadvantage of this approach is that it requires boring manual work to extract the data. We can do better. We have robots that work for us.

I prepared a second experiment with a data logger. The device registered water temperature, air temperature, air pressure and altitude, every 10 seconds (nominally). I prepared the experiment in the late evening and let it run all night, near a closed window.

The air pressure is probably unnecessary for the current experiment, but the device was already measuring it, so keeping these values will help us to prepare for when we measure the building height.

All the data is on Google Sheets and in a text file, which is easier to process with R. The analysis can be done in several ways: the easy parts are easier in the spreadsheet, and the advanced parts are easier in R. Choose your tools wisely; they will serve you all your life.

The first thing to do is to familiarize oneself with the data. It is a good idea to add a column with the row number, which we usually call row id or just id. Please notice that the starting value of seconds is arbitrary; it only reflects how long the machine was turned on before the first measurement.

id seconds temp_water temp_air pressure altitude
1 6893 25.0625 26.3804 101279 3.62597
2 6903 25.0000 26.3927 101301 1.91472
3 6913 25.0000 26.3191 101300 2.00618
4 6922 27.6250 26.2700 101308 1.33930
5 6932 32.3125 26.1964 101301 1.91281
6 6942 38.8125 26.1534 101308 1.33930
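As a rough sketch of this first step in Python (used here only for illustration; the same can be done in R or in the spreadsheet), the rows above can be parsed and given an id column. The whitespace-separated file layout, the function name `read_rows` and the extra `elapsed` column are assumptions made for the example. The `elapsed` column subtracts the arbitrary starting value of seconds.

```python
# The six rows printed above, as they might appear in a whitespace-separated
# text file (the exact file layout is an assumption).
SAMPLE = """\
seconds temp_water temp_air pressure altitude
6893 25.0625 26.3804 101279 3.62597
6903 25.0000 26.3927 101301 1.91472
6913 25.0000 26.3191 101300 2.00618
6922 27.6250 26.2700 101308 1.33930
6932 32.3125 26.1964 101301 1.91281
6942 38.8125 26.1534 101308 1.33930
"""


def read_rows(text):
    """Parse the table and add 'id' (row number) and 'elapsed' (seconds since row 1)."""
    lines = text.strip().splitlines()
    header = lines[0].split()
    rows = []
    for row_id, line in enumerate(lines[1:], start=1):
        row = dict(zip(header, (float(v) for v in line.split())))
        row["id"] = row_id
        rows.append(row)
    start = rows[0]["seconds"]  # arbitrary offset of the device clock
    for row in rows:
        row["elapsed"] = row["seconds"] - start
    return rows


for row in read_rows(SAMPLE):
    print(row["id"], row["elapsed"], row["temp_water"], row["temp_air"])
```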

After looking at the first few lines, it is always good to plot the data and see how it looks. This is easy in Google Sheets or Excel, and also in R. For now let us just focus on temperature.

Figure 1. Temperature measurements.

We see that the air temperature remains more or less constant during all the measurements, so it will be easier to analyze.

Figure 2. Air temperature measurements.

Well, it is not so constant. There is a sharp temperature reduction in the first seconds, and then it goes down more or less linearly. My guess is that the initial temperature was the one from the room where I prepared the device, and then it cooled to the experiment room temperature. Then it cools as the night cools, until sunrise. The window faces east, so it gets warmer in the morning. In retrospect, it would have been wise to let the device cool to room temperature before starting, and to record the real time from a real clock.

A sharp eye may also notice that there are times when the temperature rises. My guess is that these are the times when the fridge motor was working. The experiment room was my kitchen. Refrigerators keep their interior at low temperature by transferring heat to the exterior, so the room gets warmer.

Evaluating uncertainty

We will use air temperature as our main variable. We can choose any point in time and take several values, let’s say 𝑁. For each position we want to evaluate the average, the standard deviation, and the standard error (the standard deviation divided by √𝑁). And we will do it for several values of 𝑁.

Here we evaluated 𝑁 equal to 3, 10, 20 and 30. You can try other values. The first rows of results look like the following tables.

Result for N = 3
id avg stdev stderr
3 26.36 0.03942 0.02276
4 26.33 0.06176 0.03566
5 26.26 0.06176 0.03566
6 26.21 0.05897 0.03404
7 26.15 0.05247 0.03029
8 26.09 0.06764 0.03905
Result for N = 10
id avg stdev stderr
10 26.18 0.1608 0.05085
11 26.13 0.163 0.05154
12 26.08 0.1512 0.04781
13 26.03 0.1416 0.04479
14 25.99 0.1302 0.04118
15 25.95 0.1203 0.03804
Result for N = 20
id avg stdev stderr
20 25.97 0.246 0.055
21 25.93 0.2432 0.05438
22 25.89 0.2334 0.05218
23 25.85 0.2251 0.05034
24 25.81 0.2183 0.0488
25 25.77 0.2129 0.04761
Result for N = 30
id avg stdev stderr
30 25.79 0.338 0.06171
31 25.75 0.3324 0.06068
32 25.71 0.3225 0.05887
33 25.67 0.3131 0.05717
34 25.64 0.3038 0.05547
35 25.61 0.296 0.05405
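
One way to compute such tables is sketched below in Python (chosen only for illustration). It uses the six air temperatures listed earlier and assumes that each window holds the N values ending at the row whose id labels the result. With N = 3 that assumption reproduces the first rows of the N = 3 table. The function name `rolling_stats` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Air temperatures of rows 1 to 6, copied from the data listing.
TEMP_AIR = [26.3804, 26.3927, 26.3191, 26.2700, 26.1964, 26.1534]


def rolling_stats(values, n):
    """Return (id, average, sample stdev, standard error) for each window of n values.

    The id is the 1-based row number of the last value in the window.
    """
    results = []
    for end in range(n, len(values) + 1):
        window = values[end - n:end]
        sd = stdev(window)  # sample standard deviation (divides by n - 1)
        results.append((end, mean(window), sd, sd / sqrt(n)))
    return results


for row_id, avg, sd, se in rolling_stats(TEMP_AIR, 3):
    print(row_id, f"{avg:.4g}", f"{sd:.4g}", f"{se:.4g}")
```

Changing the second argument of `rolling_stats` gives the other sample sizes, provided the list holds at least N values.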

It seems that the standard error gets worse with bigger N, but it is just a transient phenomenon. Looking at the full picture, we observe the following plots.

Figure 3. Standard error with different sample sizes. All data.

In the first seconds the standard error is large. This is due to the fast change in the value we are measuring. In this case there is a transient period before the temperature stabilizes. Things become clearer if we focus on the values after the transient.

Figure 4. Standard error with different sample sizes, omitting the first 150 samples (transient).

We observe that the standard error is random, since we evaluate it from random data, but it follows some patterns. Being pessimistic, we can take the maximum value for each sample size. Then we can look at the Student’s t-distribution table to find the factor for 95% confidence. Finally, we calculate the uncertainty in each case.

Final Result
N   Max Std. Error  k (95%)  Interval width  Interval (1 sig. fig.)
3   0.01449         4.303    0.06236         0.06
10  0.006826        2.262    0.01544         0.02
20  0.005027        2.093    0.01052         0.01
30  0.004675        2.045    0.009562        0.01
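
The arithmetic behind this table can be sketched in a few lines of Python (used only for illustration). The maximum standard errors and the t factors are copied from the table; the factors correspond to the two-sided 95% values with N - 1 degrees of freedom. The function names are hypothetical.

```python
# Maximum standard error after the transient, per sample size N (from the table).
MAX_STD_ERROR = {3: 0.01449, 10: 0.006826, 20: 0.005027, 30: 0.004675}
# Student's t factor for 95% confidence with N - 1 degrees of freedom (from the table).
T_FACTOR_95 = {3: 4.303, 10: 2.262, 20: 2.093, 30: 2.045}


def interval_width(n):
    """Uncertainty for sample size n: maximum standard error times the t factor."""
    return MAX_STD_ERROR[n] * T_FACTOR_95[n]


def one_significant_figure(x):
    """Round x to one significant figure using the 'g' format."""
    return float(f"{x:.1g}")


for n in sorted(MAX_STD_ERROR):
    width = interval_width(n)
    print(n, f"{width:.4g}", one_significant_figure(width))
```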

Therefore, we can have at most one decimal when we average 3 samples, two decimals when we average 10 samples, and three decimals with 20 or 30 samples. We can see this in the following figure.

Figure 5. Rounded average temperatures for different sample sizes.

There seems to be no significant difference between averaging 20 and 30 samples.

Can you replicate these results?

Can you repeat this analysis for pressure?

About Time

According to the original design, there should be one sample every 10 seconds. But if we look at the difference in seconds between consecutive rows, we find that in over 34% of the cases the timer advanced only 9 seconds.

Figure 6. Distribution of time differences.
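
A distribution like this can be obtained by counting the differences between consecutive values of seconds. The Python sketch below (an illustration only, with a hypothetical function name) uses just the six rows listed earlier, so its fractions are not the ones of the full data set.

```python
from collections import Counter

# Values of the seconds column for rows 1 to 6, copied from the data listing.
SECONDS = [6893, 6903, 6913, 6922, 6932, 6942]


def step_distribution(seconds):
    """Fraction of consecutive-row differences for each observed step size."""
    steps = [later - earlier for earlier, later in zip(seconds, seconds[1:])]
    counts = Counter(steps)
    return {step: count / len(steps) for step, count in sorted(counts.items())}


print(step_distribution(SECONDS))
```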

So the measuring device was sampling faster than intended. How fast was it running?

Figure 7. Comparison of expected time versus real time of sampling.

We see that instead of “10 seconds every 10 seconds” we have a little less.
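
One simple way to estimate the real spacing is to divide the total elapsed time by the number of intervals. The Python sketch below is an illustration under that assumption, again using only the six rows listed earlier; the full data set will give a different number. The names `mean_step` and `NOMINAL_STEP` are hypothetical.

```python
# Values of the seconds column for rows 1 to 6, copied from the data listing.
SECONDS = [6893, 6903, 6913, 6922, 6932, 6942]
NOMINAL_STEP = 10  # seconds between samples according to the original design


def mean_step(seconds):
    """Average time between consecutive samples: total span over number of intervals."""
    return (seconds[-1] - seconds[0]) / (len(seconds) - 1)


observed = mean_step(SECONDS)
# Second value: observed spacing as a fraction of the nominal spacing.
print(observed, observed / NOMINAL_STEP)
```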

How much time really passes between samples?

I look forward to your comments.