Today, according to “Our World in Data”
Why?
Only 29.7% of people in low-income countries have been vaccinated
In rich countries we observe vaccine hesitance
Political
Belief
Distrust of Science
Besides the Anti-vaccine people, we have
Climate change denial
Flat Earthers
and several others
In 2009, 2% of scientists admitted to falsifying studies at least once
14% admitted to personally knowing someone who did
The problem is not dishonesty
Most people do not lie
A 2016 poll of 1,500 scientists reported that 70% of them had failed to reproduce at least one other scientist’s experiment
50% had failed to reproduce one of their own experiments
There are problems with the experiments and their analysis
Repeated the top 100 studies in psychology
Journal of the Royal Statistical Society and American Statistical Association
“the ritualistic miming of statistics rather than conscientious practice”
“practitioners go through the motions of fitting models, computing p-values or confidence intervals, or simulating posterior distributions”
“They invoke statistical terms and procedures as incantations, with scant understanding of the assumptions or relevance of the calculations, or even the meaning of the terminology”
“We believe that poor statistical education and practice are symptoms of and contributors to problems in science as a whole”
As someone who teaches maths and statistics, I say
We are bad at teaching maths
We need to find better ways to teach math
(in particular to math teachers)
Math allows us to travel in time and see the invisible
That is why people who know do not want to teach math:
It gives power to the people
“They” do not want you to know math
Imagine that you are randomly chosen for a test of COVID-19
The result is “positive”. It says that you have the virus
But this test fails 2% of the time, giving a false positive or a false negative
Then the question:
What is the probability that you have COVID-19 given that the test said “positive”?
The test is correct 98% of the time
That is, the probability of a positive test given that you have COVID is 98%
But we really want to know the probability of COVID given that the test is positive
They are not the same
To answer this question we must know the prevalence of COVID
That is, what is the proportion of the population with COVID
There were 728,692 active cases in Turkey last year
(only 1,128 were serious)
Population of Turkey is 85,828,516
Dividing the active cases by the population gives a prevalence of about 0.85%; the example below works with 0.86%
Let’s assume that:
| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | . | . | . |
| COVID+ | . | . | . |
| Total | . | . | . |
COVID reality in the rows and test results in the columns
| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | . | . | . |
| COVID+ | . | . | . |
| Total | . | . | 1000000 |
We will fill this matrix in the following slides
Assuming one million people makes the math easier
| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | . | . | 991400 |
| COVID+ | . | . | 8600 |
| Total | . | . | 1000000 |
Prevalence is the percentage of the population that has COVID.
In other words, it is the probability of (COVID+)
| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | 971572 | . | 991400 |
| COVID+ | . | 8428 | 8600 |
| Total | . | . | 1000000 |
Precision, meaning here the accuracy of the test, is the probability of a correct diagnosis
(Here we assume that the error rate is the same for positive and negative results. That may not always be the case)
| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | 971572 | 19828 | 991400 |
| COVID+ | 172 | 8428 | 8600 |
| Total | . | . | 1000000 |
(this error rate is only an example. Real tests are usually better)
| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | 971572 | 19828 | 991400 |
| COVID+ | 172 | 8428 | 8600 |
| Total | 971744 | 28256 | 1000000 |
We sum and fill the empty boxes
28256 people got a positive test, but only 8428 of them have COVID
Probability of having COVID if the test is positive: 8428 / 28256 = 29.83%
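The table-filling steps above can be sketched in a few lines of Python. This is only an illustration: the function name `diagnostic_table` and its parameter names are made up here, and `accuracy` stands for what the slides call precision, assumed equal for sick and healthy people.

```python
def diagnostic_table(population, prevalence, accuracy):
    """Return the four inner cells of the table, keyed by (reality, test).

    Assumes the same accuracy for people with and without COVID,
    as in the slides.
    """
    sick = round(population * prevalence)      # row total for COVID+
    healthy = population - sick                # row total for COVID-
    true_pos = round(sick * accuracy)          # sick and correctly detected
    false_neg = sick - true_pos                # sick but test says negative
    true_neg = round(healthy * accuracy)       # healthy and correctly cleared
    false_pos = healthy - true_neg             # healthy but test says positive
    return {
        ("COVID-", "Test-"): true_neg,
        ("COVID-", "Test+"): false_pos,
        ("COVID+", "Test-"): false_neg,
        ("COVID+", "Test+"): true_pos,
    }


table = diagnostic_table(1_000_000, 0.0086, 0.98)

# Column total for Test+ and the share of those who really have COVID
positives = table[("COVID-", "Test+")] + table[("COVID+", "Test+")]
share = table[("COVID+", "Test+")] / positives

print(table)
print(f"{positives} positive tests, {share:.2%} of them have COVID")
```

Working with counts of people rather than probabilities mirrors the one-million-person table, so each cell can be checked by eye against the slides.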
Erika Check Hayden. Weak statistical standards implicated in scientific irreproducibility. Nature, 11 November 2013
Open Science Collaboration. Estimating the reproducibility of psychological science. Science, 28 Aug 2015: Vol. 349, Issue 6251, aac4716
Replications can cause distorted belief in scientific progress. Behavioral and Brain Sciences, Volume 41, 2018, e122. DOI: https://doi.org/10.1017/S0140525X18000584. Published online by Cambridge University Press: 27 July 2018
Reproducibility of Scientific Results. Stanford Encyclopedia of Philosophy
T.D. Stanley, Evan C. Carter and Hristos Doucouliagos. What Meta-Analyses Reveal about the Replicability of Psychological Research. Deakin Laboratory for the Meta-Analysis of Research, Working Paper, November 2017
Silas Boye Nissen, Tali Magidson, Kevin Gross, Carl T. Bergstrom. Publication bias and the canonization of false facts. eLife, 20 December 2016; 5:e21451
“An Essay towards solving a Problem in the Doctrine of Chances” is a work on the mathematical theory of probability by the Reverend Thomas Bayes, published in 1763, two years after its author’s death
The use of the Bayes theorem has been extended in science and in other fields
From Wikipedia, the free encyclopedia
Since we have \[ℙ(A, B) = ℙ(A)⋅ ℙ(B|A)\] and, by symmetry, we also have \[ℙ(A, B) = ℙ(B)⋅ℙ(A|B)\] then we conclude that \[ℙ(A)⋅ ℙ(B|A) = ℙ(B)⋅ℙ(A|B)\]
We can reorganize everything as \[ℙ(B|A) = \frac{ℙ(B)⋅ℙ(A|B)}{ℙ(A)}\]
This is known as Bayes’ Rule or Bayes’ Theorem
It can be understood as \[ℙ(B|A) = ℙ(A|B)⋅\frac{ℙ(B)}{ℙ(A)}\] which is a rule to reverse the order of the conditional probability
It shows us how to go from \(ℙ(A|B)\) to \(ℙ(B|A)\)
This is the view we will use now
Another point of view is \[ℙ(B|A) = \frac{ℙ(A|B)}{ℙ(A)}⋅ℙ(B)\] which is a rule to update our opinions
Bayes says how to change \(ℙ(B)\) when we learn \(A\)
“When the facts change, I change my opinion. What do you do, sir?”
John Maynard Keynes (1883 – 1946), English economist, “father” of macroeconomics
We can write \(ℙ(A)\) as \(ℙ(A,B) + ℙ(A,\text{not } B)\), which can be rewritten as \[ℙ(A) = ℙ(B)⋅ℙ(A|B) + ℙ(\text{not } B)⋅ℙ(A|\text{not } B)\]
Therefore we have a second formula for Bayes’ rule: \[ℙ(B|A) = \frac{ℙ(B)⋅ℙ(A|B)}{ℙ(B)⋅ℙ(A|B) + ℙ(\text{not } B)⋅ℙ(A|\text{not } B)}\]
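As a quick check of this second formula, here is a minimal Python sketch. The function name `bayes` and its argument names are illustrative choices, not part of the slides, and the numbers are the ones from the COVID test example (prevalence 0.0086, 98% correct results).

```python
def bayes(prior_b, p_a_given_b, p_a_given_not_b):
    """Return P(B|A), computing P(A) with the law of total probability."""
    joint_a_b = prior_b * p_a_given_b                # P(B) * P(A|B)
    joint_a_not_b = (1 - prior_b) * p_a_given_not_b  # P(not B) * P(A|not B)
    p_a = joint_a_b + joint_a_not_b                  # P(A)
    if p_a == 0:
        raise ValueError("P(A) is zero, so P(B|A) is undefined")
    return joint_a_b / p_a


# Here A stands for "test positive" and B for "having COVID"
posterior = bayes(prior_b=0.0086, p_a_given_b=0.98, p_a_given_not_b=0.02)
print(round(posterior, 4))  # 0.2983
```

The advantage of this form is that it never needs \(ℙ(A)\) as an input: only the prior and the two conditional probabilities are required.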
When we filled the “diagnostic” matrix, we did this
| | not A | A | Total |
|---|---|---|---|
| not B | \(ℙ(\text{not }A,\text{not }B)\) | \(ℙ(A,\text{not } B)\) | \(ℙ(\text{not } B)\) |
| B | \(ℙ(\text{not } A, B)\) | \(ℙ(A, B)\) | \(ℙ(B)\) |
| Total | \(ℙ(\text{not } A)\) | \(ℙ(A)\) | \(1\) |
To get \(ℙ(A, B)\) we used \(ℙ(B)⋅ℙ(A|B)\)
Here \(A\) is “test positive” and \(B\) is “having COVID”
We want to evaluate the probability of having COVID, given that the test is positive, \[ℙ(\text{COVID}_+ | \text{test}_+)\] knowing the prevalence \(ℙ(\text{COVID}_+)=0.0086\) and the precision \[ \begin{aligned} ℙ(\text{test}_+ | \text{COVID}_+)&=0.98\\ ℙ(\text{test}_- | \text{COVID}_-)&=0.98 \end{aligned} \]
\[ℙ(\text{COVID}_+ | \text{test}_+)=\frac{ℙ(\text{test}_+ | \text{COVID}_+)⋅ℙ(\text{COVID}_+)}{ℙ(\text{test}_+)}\]
We need to know \(ℙ(\text{test}_+),\) which we can get as
\[ℙ(\text{test}_+)= ℙ(\text{test}_+, \text{COVID}_+)+ ℙ(\text{test}_+, \text{COVID}_-)\]
\[\begin{aligned} ℙ(\text{test}_+, \text{COVID}_+)& =ℙ(\text{test}_+ \vert \text{COVID}_+)\cdotℙ(\text{COVID}_+)\\ & =0.98⋅ 0.0086 = 0.008428 \end{aligned} \] and \[ \begin{aligned} ℙ(\text{test}_+, \text{COVID}_-)& =ℙ(\text{test}_+ \vert \text{COVID}_-)\cdotℙ(\text{COVID}_-) \\ & =(1-0.98)⋅ (1-0.0086) \\ & =0.02⋅0.9914 = 0.019828 \end{aligned} \]
\[ \begin{aligned} ℙ(\text{COVID}_+ | \text{test}_+) & = \frac{ℙ(\text{test}_+ | \text{COVID}_+)⋅ℙ(\text{COVID}_+)}{ℙ(\text{test}_+)}\\ & = \frac{0.98⋅0.0086}{0.98⋅0.0086 + 0.02⋅0.9914} \\ & = \frac{0.008428}{0.008428 + 0.019828} \\ & = 0.2983 \end{aligned} \]
The probability of having COVID given a positive test is close to 30%
(assuming there are no other symptoms, or other information)
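When other information is available, Bayes' rule can be applied again, with the previous answer as the new starting opinion. The sketch below is a hypothetical illustration: it imagines a second positive test and assumes the two test results are independent given the person's true status, which real tests may not satisfy. The function name `update` and its parameters are made up for this example.

```python
def update(prior, sensitivity, specificity, positive=True):
    """One Bayes update of P(COVID+) after a single test result.

    sensitivity = P(test+ | COVID+), specificity = P(test- | COVID-).
    """
    if positive:
        like_sick, like_healthy = sensitivity, 1 - specificity
    else:
        like_sick, like_healthy = 1 - sensitivity, specificity
    # P(result), by the law of total probability
    evidence = like_sick * prior + like_healthy * (1 - prior)
    return like_sick * prior / evidence


belief = 0.0086  # the prevalence is the starting opinion
for result in (True, True):  # hypothetical: two positive tests in a row
    belief = update(belief, 0.98, 0.98, positive=result)
    print(f"after a positive test: {belief:.4f}")
```

Under these assumptions the first update reproduces the value of about 0.30 found above, and the second positive result raises the probability to roughly 0.95.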
We would like to have no errors, but errors are usually unavoidable. The technology used will limit the precision of diagnostics.
In practice, diagnostic test designers know that telling you that you are healthy when you are sick is much worse than telling you that you are sick when you are healthy. In other words, the cost of a false positive is usually lower than the cost of a false negative.
Of course, the costs depend on the disease and on the technique.
\[\begin{aligned} ℙ(\text{test}_+ \vert \text{COVID}_+)=a\\ ℙ(\text{test}_- \vert \text{COVID}_-)=b \end{aligned} \]
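With separate rates \(a\) and \(b\), the same steps as before lead to a general expression. The following is a sketch of that derivation; the symbol \(p\) for the prevalence \(ℙ(\text{COVID}_+)\) is introduced here for brevity and does not appear in the slides.

```latex
\[
\begin{aligned}
ℙ(\text{COVID}_+ \vert \text{test}_+)
 & = \frac{ℙ(\text{test}_+ \vert \text{COVID}_+)⋅ℙ(\text{COVID}_+)}
          {ℙ(\text{test}_+ \vert \text{COVID}_+)⋅ℙ(\text{COVID}_+)
           + ℙ(\text{test}_+ \vert \text{COVID}_-)⋅ℙ(\text{COVID}_-)} \\
 & = \frac{a⋅p}{a⋅p + (1-b)⋅(1-p)}
\end{aligned}
\]
```

Setting \(a = b = 0.98\) and \(p = 0.0086\) recovers the value 0.2983 computed earlier.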