Today, according to “Our World in Data”

Why?

Only 29.7% of people in low-income countries have been vaccinated

In rich countries we observe

*vaccine hesitancy*

Political

Belief

Distrust of Science

Besides the Anti-vaccine people, we have

Climate change denial

Flat Earthers

and several others

In 2009, 2% of scientists admitted to falsifying studies at least once

14% admitted to personally knowing someone who did

The problem is not dishonesty

**Most people do not lie**

A 2016 poll of 1,500 scientists reported that 70% of them had failed to reproduce at least one other scientist’s experiment

50% had failed to reproduce one of their own experiments

**There are problems with the experiments and their
analysis**

Repeated the top 100 studies in psychology

- Only 36% of the replications gave significant findings
- compared to 97% of the original studies

- The effect size in the replications was half of the effect size in the original studies, on average

Journal of the Royal Statistical Society and American Statistical Association

- Physicist
- Excellent professor
- Worked in the Manhattan Project at 25 years old
- Nobel Prize in Physics in 1965
- He was talking about the USA in the 1970s

“the ritualistic miming of statistics rather than conscientious practice”

“practitioners go through the motions of fitting models, computing p-values or confidence intervals, or simulating posterior distributions”

“They invoke statistical terms and procedures as incantations, with scant understanding of the assumptions or relevance of the calculations, or even the meaning of the terminology”

“We believe that poor statistical education and practice are symptoms of and contributors to problems in science as a whole”

As someone who teaches maths and statistics, I say

**We are bad at teaching
maths**

We need to find better ways to teach math

(in particular to math teachers)

Math allows us to travel in time, and see the invisible

That is why people who know do not want to teach math:

It gives power to the people

“They” do not want you to know math

Imagine that you are randomly chosen for a test of COVID-19

The result is “positive”. It says that you have the virus

But this test fails 2% of the time, giving a *false positive* or
a *false negative*

Then the question:

**What is the probability that you have COVID-19 given that the
test said “positive”?**

The test is correct 98% of the time

That is, the probability of a positive test *given that you have
COVID*

But we really want to know the probability of COVID *given that
the test is positive*

They are not the same

To answer this question we must know the *prevalence* of
COVID

That is, the proportion of the population that has COVID

There were 728,692 active cases in Turkey last year

(only 1,128 were serious)

The population of Turkey is 85,828,516

Dividing the number of active cases by the population, we find a prevalence of about 0.85%. The example below uses 0.86%

Let’s assume that:

- There are one million people (this will make it easy to calculate)
- The test has a *precision* of 98%
- The *prevalence* of COVID in the population is 0.86%
- The person to test is chosen randomly from the population

| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | . | . | . |
| COVID+ | . | . | . |
| Total | . | . | . |

COVID reality in the rows and test results in the columns

| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | . | . | . |
| COVID+ | . | . | . |
| Total | . | . | 1000000 |

We will fill this matrix in the following slides

Assuming one million people makes the math easier

| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | . | . | 991400 |
| COVID+ | . | . | 8600 |
| Total | . | . | 1000000 |

Prevalence is the percentage of the population that has COVID.

In other words, it is the probability \(ℙ(\text{COVID}_+)\)

| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | 971572 | . | 991400 |
| COVID+ | . | 8428 | 8600 |
| Total | . | . | 1000000 |

*Precision* is the probability of a correct diagnosis

(Here we assumed that the error rates are the same for positive and negative cases. That may not always be the case)

| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | 971572 | 19828 | 991400 |
| COVID+ | 172 | 8428 | 8600 |
| Total | . | . | 1000000 |

(this error rate is only an example. Real tests are usually better)

| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | 971572 | 19828 | 991400 |
| COVID+ | 172 | 8428 | 8600 |
| Total | 971744 | 28256 | 1000000 |

We sum and fill the empty boxes

28256 people got a positive test, but only 8428 of them have COVID

Probability of having COVID if the test is positive: 29.83%
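The table-filling steps above can be reproduced in a few lines of Python. This is a minimal sketch under the assumptions of the example (one million people, prevalence of 0.86%, and the same 98% rate of correct results for people with and without COVID). The variable names are illustrative and not part of the original material.

```python
# Assumptions taken from the example above; all names are illustrative.
population = 1_000_000
prevalence = 0.0086   # fraction of the population with COVID
accuracy = 0.98       # what the slides call "precision": P(correct result)

# Row totals: people with and without COVID
covid_pos = round(population * prevalence)
covid_neg = population - covid_pos

# Correct results in each row
true_pos = round(covid_pos * accuracy)
true_neg = round(covid_neg * accuracy)

# Errors are whatever remains in each row
false_neg = covid_pos - true_pos
false_pos = covid_neg - true_neg

# Column totals
test_pos = true_pos + false_pos
test_neg = true_neg + false_neg

# Share of positive tests that belong to people who really have COVID
p_covid_given_pos = true_pos / test_pos
print(f"{p_covid_given_pos:.2%}")  # prints 29.83%
```

Working with counts instead of probabilities keeps every intermediate number an integer that can be compared directly with the cells of the table.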

Erika Check Hayden. Weak statistical standards implicated in scientific irreproducibility. Nature, 11 November 2013.

Open Science Collaboration. Estimating the reproducibility of psychological science. Science, 28 Aug 2015, Vol. 349, Issue 6251, aac4716.

Replications can cause distorted belief in scientific progress. Behavioral and Brain Sciences, Volume 41, 2018, e122. DOI: https://doi.org/10.1017/S0140525X18000584. Published online by Cambridge University Press: 27 July 2018.

Reproducibility of Scientific Results. Stanford Encyclopedia of Philosophy.

T.D. Stanley, Evan C. Carter and Hristos Doucouliagos. What Meta-Analyses Reveal about the Replicability of Psychological Research. Deakin Laboratory for the Meta-Analysis of Research, Working Paper, November 2017.

Silas Boye Nissen, Tali Magidson, Kevin Gross, Carl T Bergstrom. Research: Publication bias and the canonization of false facts. eLife, Dec 20, 2016; 5:e21451.

*“An Essay towards solving a Problem in the Doctrine of
Chances”* is a work on the mathematical theory of probability by the
Reverend Thomas Bayes, published in 1763, two years after its author’s
death

The use of Bayes’ theorem has since spread throughout science and into other fields

From Wikipedia, the free encyclopedia

Since we have \[ℙ(A, B) = ℙ(A)⋅ ℙ(B|A)\] and, by symmetry, we also have \[ℙ(A, B) = ℙ(B)⋅ℙ(A|B)\] then we conclude that \[ℙ(A)⋅ ℙ(B|A) = ℙ(B)⋅ℙ(A|B)\]

We can reorganize everything as \[ℙ(B|A) = \frac{ℙ(B)⋅ℙ(A|B)}{ℙ(A)}\]

This is known as Bayes’ Rule or Bayes’ Theorem

It can be understood as \[ℙ(B|A) =
ℙ(A|B)⋅\frac{ℙ(B)}{ℙ(A)}\] which is a rule to *reverse the
order* of the conditional probability

It shows us how to go from \(ℙ(A|B)\) to \(ℙ(B|A)\)

This is the view we will use now

Another point of view is \[ℙ(B|A) = \frac{ℙ(A|B)}{ℙ(A)}⋅ℙ(B)\] which is a rule to update our opinions

- \(ℙ(B)\) is the *a priori* probability
- \(ℙ(B|A)\) is the *a posteriori* probability

Bayes says how to change \(ℙ(B)\) when we learn \(A\)

“When the facts change, I change my opinion. What do you do, sir?”

**John Maynard Keynes (1883 – 1946),** **English
economist, “father” of macroeconomics**

We can write \(ℙ(A)\) as \(ℙ(A,B) + ℙ(A,\text{not } B)\)

which can be rewritten as \[ℙ(A) =
ℙ(B)⋅ℙ(A|B) + ℙ(\text{not } B)⋅ℙ(A|\text{not } B)\] therefore we
have a second formula for Bayes’ rule \[ℙ(B|A) = \frac{ℙ(B)⋅ℙ(A|B)}{ℙ(B)⋅ℙ(A|B) +
ℙ(\text{not } B)⋅ℙ(A|\text{not } B)}\]

When we filled the “diagnostic” matrix, we did this

| | not A | A | Total |
|---|---|---|---|
| not B | \(ℙ(\text{not }A,\text{not }B)\) | \(ℙ(A,\text{not } B)\) | \(ℙ(\text{not } B)\) |
| B | \(ℙ(\text{not } A, B)\) | \(ℙ(A, B)\) | \(ℙ(B)\) |
| Total | \(ℙ(\text{not } A)\) | \(ℙ(A)\) | \(1\) |

To get \(ℙ(A, B)\) we used \(ℙ(B)⋅ℙ(A|B)\)

Here \(A\) is “test positive” and \(B\) is “having COVID”
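The second formula for Bayes’ rule translates directly into a small function. The sketch below is only an illustration: the name `bayes_posterior` and its parameter names are invented here, and the call at the end reuses the numbers of the diagnostic example, with \(A\) as “test positive” and \(B\) as “having COVID”.

```python
def bayes_posterior(prior_b: float, p_a_given_b: float, p_a_given_not_b: float) -> float:
    """Return P(B|A) from P(B), P(A|B) and P(A|not B)."""
    joint_a_b = prior_b * p_a_given_b                  # P(A, B)
    joint_a_not_b = (1.0 - prior_b) * p_a_given_not_b  # P(A, not B)
    p_a = joint_a_b + joint_a_not_b                    # P(A), the column total
    if p_a == 0.0:
        raise ValueError("P(A) is zero, so P(B|A) is undefined")
    return joint_a_b / p_a


# A = "test positive", B = "having COVID", numbers from the example:
# P(B) = 0.0086, P(A|B) = 0.98, P(A|not B) = 1 - 0.98 = 0.02
posterior = bayes_posterior(prior_b=0.0086, p_a_given_b=0.98, p_a_given_not_b=0.02)
print(f"{posterior:.4f}")  # prints 0.2983
```

The two intermediate products are exactly the two cells of the column “A” in the matrix, and their sum is the column total \(ℙ(A)\).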

We want to evaluate the probability of having COVID, given that the
test is positive, \[ℙ(\text{COVID}_+ | \text{test}_+)\] knowing the *prevalence* \(ℙ(\text{COVID}_+)=0.0086\) and the
*precision* \[
\begin{aligned}
ℙ(\text{test}_+ | \text{COVID}_+)&=0.98\\
ℙ(\text{test}_- | \text{COVID}_-)&=0.98
\end{aligned}
\]

\[ℙ(\text{COVID}_+ | \text{test}_+)=\frac{ℙ(\text{test}_+ | \text{COVID}_+)⋅ℙ(\text{COVID}_+)}{ℙ(\text{test}_+)}\]

We need to know \(ℙ(\text{test}_+),\) which we can get as

\[ℙ(\text{test}_+)= ℙ(\text{test}_+, \text{COVID}_+)+ ℙ(\text{test}_+, \text{COVID}_-)\]

\[\begin{aligned} ℙ(\text{test}_+, \text{COVID}_+)& =ℙ(\text{test}_+ \vert \text{COVID}_+)\cdotℙ(\text{COVID}_+)\\ & =0.98⋅ 0.0086 = 0.008428 \end{aligned} \] and \[ \begin{aligned} ℙ(\text{test}_+, \text{COVID}_-)& =ℙ(\text{test}_+ \vert \text{COVID}_-)\cdotℙ(\text{COVID}_-) \\ & =(1-0.98)⋅ (1-0.0086) \\ & =0.02⋅0.9914 = 0.019828 \end{aligned} \]

\[ \begin{aligned} ℙ(\text{COVID}_+ | \text{test}_+) & = \frac{ℙ(\text{test}_+ | \text{COVID}_+)⋅ℙ(\text{COVID}_+)}{ℙ(\text{test}_+)}\\ & = \frac{0.98⋅0.0086}{0.98⋅0.0086 + 0.02⋅0.9914} \\ & = \frac{0.008428}{0.008428 + 0.019828} \\ & = 0.2983 \end{aligned} \]

The probability of having COVID given a positive test is near 30%

(assuming there are no other symptoms, or other information)

We would like to have no errors, but errors are usually unavoidable. The technology used will limit the precision of diagnostics.

In practice, diagnostic test designers know that telling you that you are healthy when you are sick is much worse than telling you that you are sick when you are healthy. In other words, the cost of a false positive is usually lower than the cost of a false negative.

Of course, the costs depend on the disease and on the technique.

\[\begin{aligned} ℙ(\text{test}_+ \vert \text{COVID}_+)=a\\ ℙ(\text{test}_- \vert \text{COVID}_-)=b \end{aligned} \]
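When the two rates \(a\) and \(b\) differ, the same reasoning as before still applies: the false positives now come from \(1-b\) instead of \(1-0.98\). The sketch below assumes this reading of \(a\) and \(b\); the function name and the second set of values are hypothetical and chosen only for illustration.

```python
def posterior_given_positive(prevalence: float, a: float, b: float) -> float:
    """P(COVID+ | test+) when P(test+|COVID+) = a and P(test-|COVID-) = b."""
    true_positive = a * prevalence                   # P(test+, COVID+)
    false_positive = (1.0 - b) * (1.0 - prevalence)  # P(test+, COVID-)
    return true_positive / (true_positive + false_positive)


# Same rate in both directions, as in the example so far
print(f"{posterior_given_positive(0.0086, a=0.98, b=0.98):.4f}")  # prints 0.2983

# Hypothetical test with fewer false positives (b = 0.999 is an invented value)
print(f"{posterior_given_positive(0.0086, a=0.98, b=0.999):.4f}")
```

With a low prevalence, most positive results come from the large healthy group, so in this sketch the answer is far more sensitive to \(b\) than to \(a\).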