# Methodology of Scientific Research

## Example: diagnosis

As part of the strategy to control COVID-19, many governments carry out random sampling of the population, looking for asymptomatic cases.

Imagine that you are randomly chosen for a COVID-19 test. The result is “positive”, that is, it says that you have the virus. You also know that the test sometimes fails, giving either a false positive or a false negative. The question is: what is the probability that you have COVID-19, given that the test said “positive”?

## Context

Let’s assume that:

• $10^{5}$ people are tested
• The test gives the correct result 99% of the time, both for infected and for healthy people
• The prevalence of COVID in the population is 0.1%
• The people tested are chosen randomly from the population

Since this context is the same in all cases, we will not write it explicitly.

## Let’s fill this matrix

| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | . | . | . |
| COVID+ | . | . | . |
| Total | . | . | . |

The true COVID status is in the rows; the test results are in the columns.

| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | . | . | . |
| COVID+ | . | . | . |
| Total | . | . | 100000 |

We will fill in this matrix step by step in the following slides.

A large population size helps us see small probabilities as whole numbers of people.

## 0.1% of them are COVID positive

| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | . | . | 99900 |
| COVID+ | . | . | 100 |
| Total | . | . | 100000 |

Prevalence is the percentage of the population that has COVID.
In other words, it is the probability of being COVID+:

$$\begin{aligned} ℙ(\text{COVID}_+) & =0.1\% = 0.001\\ ℙ(\text{COVID}_-) & =99.9\%=0.999 \end{aligned}$$
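As a minimal Python sketch (the variable names are our own), the row totals are the population size multiplied by each probability; `round` turns the floating-point products into whole numbers of people.

```python
# Values assumed on the context slide
N = 100_000          # people tested
prevalence = 0.001   # P(COVID+)

# Expected number of people in each row of the matrix
covid_pos = round(N * prevalence)        # P(COVID+) * N
covid_neg = round(N * (1 - prevalence))  # P(COVID-) * N

print(covid_pos, covid_neg)  # 100 99900
```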

## 99% are correctly diagnosed

| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | . | . | 99900 |
| COVID+ | . | 99 | 100 |
| Total | . | . | 100000 |

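For infected people, the probability of a correct diagnosis (the sensitivity of the test) is

$$ℙ(\text{test}_+ \vert \text{COVID}_+)=0.99$$

We fill the box corresponding to (test+, COVID+) with the joint probability, which, multiplied by the $10^{5}$ people tested, gives the number of people:

$$ℙ(\text{test}_+, \text{COVID}_+)=ℙ(\text{test}_+ \vert \text{COVID}_+)\cdot ℙ(\text{COVID}_+) = 0.99 \cdot 0.001 = 0.00099 \;\Rightarrow\; 99 \text{ people}$$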
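The same step as a short Python sketch, assuming the context values (the names `sensitivity` and `p_tp` are our own): the joint probability is the conditional probability times the prevalence, then scaled to the number of people tested.

```python
N = 100_000          # people tested (assumed context value)
prevalence = 0.001   # P(COVID+)
sensitivity = 0.99   # P(test+ | COVID+), assumed 99%

# Joint probability P(test+, COVID+) = P(test+ | COVID+) * P(COVID+)
p_tp = sensitivity * prevalence
true_positives = round(p_tp * N)

print(true_positives)  # 99
```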

## 99% are correctly diagnosed

| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | 98901 | . | 99900 |
| COVID+ | . | 99 | 100 |
| Total | . | . | 100000 |

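For healthy people we assume the same probability of a correct diagnosis (the specificity of the test):

$$ℙ(\text{test}_- \vert \text{COVID}_-)=0.99$$

We fill the box corresponding to (test-, COVID-):

$$ℙ(\text{test}_-, \text{COVID}_-)=ℙ(\text{test}_- \vert \text{COVID}_-)\cdot ℙ(\text{COVID}_-) = 0.99 \cdot 0.999 = 0.98901 \;\Rightarrow\; 98901 \text{ people}$$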
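A matching Python sketch for the healthy row, again assuming the context values; `specificity` is our own name for $ℙ(\text{test}_- \vert \text{COVID}_-)$.

```python
N = 100_000          # people tested (assumed context value)
prevalence = 0.001   # P(COVID+)
specificity = 0.99   # P(test- | COVID-), assumed equal to the positive case

# Joint probability P(test-, COVID-) = P(test- | COVID-) * P(COVID-)
p_tn = specificity * (1 - prevalence)
true_negatives = round(p_tn * N)

print(true_negatives)  # 98901
```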

## 1% are misdiagnosed

| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | 98901 | 999 | 99900 |
| COVID+ | 1 | 99 | 100 |
| Total | . | . | 100000 |

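A misdiagnosis is the complement of a correct diagnosis:

$$ℙ(\text{test}_- \vert \text{COVID}_+)=1-ℙ(\text{test}_+ \vert \text{COVID}_+)=0.01$$

We combine the probabilities in the same way as before:

$$ℙ(\text{test}_-, \text{COVID}_+)=ℙ(\text{test}_- \vert \text{COVID}_+)\cdot ℙ(\text{COVID}_+) = 0.01 \cdot 0.001 = 0.00001 \;\Rightarrow\; 1 \text{ person}$$

and, for healthy people who test positive,

$$ℙ(\text{test}_+, \text{COVID}_-)=\left(1-ℙ(\text{test}_- \vert \text{COVID}_-)\right)\cdot ℙ(\text{COVID}_-) = 0.01 \cdot 0.999 = 0.00999 \;\Rightarrow\; 999 \text{ people}$$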
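Both misdiagnosis cells can be sketched in Python from the complements of the correct-diagnosis probabilities (context values assumed; variable names are our own):

```python
N = 100_000          # people tested (assumed context value)
prevalence = 0.001   # P(COVID+)
sensitivity = 0.99   # P(test+ | COVID+)
specificity = 0.99   # P(test- | COVID-)

# Misdiagnoses are the complements of correct diagnoses
p_fn = (1 - sensitivity) * prevalence        # P(test-, COVID+)
p_fp = (1 - specificity) * (1 - prevalence)  # P(test+, COVID-)

false_negatives = round(p_fn * N)
false_positives = round(p_fp * N)

print(false_negatives, false_positives)  # 1 999
```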

## Total people diagnosed

| | Test- | Test+ | Total |
|---|---|---|---|
| COVID- | 98901 | 999 | 99900 |
| COVID+ | 1 | 99 | 100 |
| Total | 98902 | 1098 | 100000 |

We add up each column to fill in the remaining boxes.
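A small Python sketch of the column sums, storing the filled cells of the matrix in a dictionary keyed by (COVID status, test result); the layout is our own choice.

```python
# Cell counts from the filled matrix: (COVID status, test result) -> people
matrix = {
    ("COVID-", "test-"): 98901,
    ("COVID-", "test+"): 999,
    ("COVID+", "test-"): 1,
    ("COVID+", "test+"): 99,
}

# Column totals: sum each test result over both COVID rows
col_totals = {
    test: sum(count for (_, t), count in matrix.items() if t == test)
    for test in ("test-", "test+")
}

print(col_totals)  # {'test-': 98902, 'test+': 1098}
```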

1098 people got a positive test, but only 99 of them have COVID:

$$ℙ(\text{COVID}_+ \vert \text{test}_+)=\frac{99}{1098} \approx 9.02\%$$
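The same result can be reached without the table by applying Bayes' theorem directly to the probabilities used to fill it (a worked derivation with the same assumed values):

$$\begin{aligned}
ℙ(\text{COVID}_+ \vert \text{test}_+)
 &= \frac{ℙ(\text{test}_+ \vert \text{COVID}_+)\cdot ℙ(\text{COVID}_+)}{ℙ(\text{test}_+ \vert \text{COVID}_+)\cdot ℙ(\text{COVID}_+) + ℙ(\text{test}_+ \vert \text{COVID}_-)\cdot ℙ(\text{COVID}_-)}\\
 &= \frac{0.99 \cdot 0.001}{0.99 \cdot 0.001 + 0.01 \cdot 0.999}
  = \frac{0.00099}{0.00099 + 0.00999}
  = \frac{0.00099}{0.01098} \approx 9.02\%
\end{aligned}$$

The denominator is the total probability of a positive test, which corresponds to the 1098 positive tests out of $10^{5}$.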

# Diagnostics are classifiers

## Confusion Matrix

| Reality \ Test | Yes | No | Total |
|---|---|---|---|
| True | True Positive | False Negative | All True |
| False | False Positive | True Negative | All False |
| Total | Detected | Not detected | All cases |
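Seen as a classifier, a diagnostic test turns each (reality, test result) pair into one cell of this matrix. A minimal Python sketch, where `confusion_counts` is a hypothetical helper name and the five cases are made up for illustration:

```python
def confusion_counts(reality, detected):
    """Count TP, FN, FP and TN from two parallel lists of booleans.

    reality[i]  is True if case i really has the condition (e.g. COVID+);
    detected[i] is True if the test for case i was positive.
    """
    if len(reality) != len(detected):
        raise ValueError("reality and detected must have the same length")
    cells = {"TP": 0, "FN": 0, "FP": 0, "TN": 0}
    for truth, test in zip(reality, detected):
        if truth and test:
            cells["TP"] += 1   # present and detected
        elif truth:
            cells["FN"] += 1   # present but not detected
        elif test:
            cells["FP"] += 1   # absent but detected
        else:
            cells["TN"] += 1   # absent and not detected
    return cells

# Small made-up example with five cases
reality = [True, True, False, False, False]
detected = [True, False, True, False, False]
print(confusion_counts(reality, detected))  # {'TP': 1, 'FN': 1, 'FP': 1, 'TN': 2}
```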

Other measures that can be calculated from it:

• Sensitivity, specificity
• Precision, Recall
• F-index
• Matthews correlation coefficient (MCC)
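Of these, the Matthews correlation coefficient is the one that uses all four cells at once. A sketch using its standard formula, $\textrm{MCC}=\frac{TP\cdot TN - FP\cdot FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$, applied to the COVID counts; the function name `mcc` and the choice to return 0 when a total is zero (a common convention) are our own assumptions.

```python
import math

def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient from the four confusion-matrix cells."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: return 0 when any row or column total is zero
    return numerator / denominator if denominator else 0.0

# Counts from the COVID example
print(round(mcc(tp=99, fn=1, fp=999, tn=98901), 3))
```

MCC ranges from -1 (always wrong) through 0 (no better than chance) to 1 (perfect), so a test that is 99% accurate can still score low when the condition is rare.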

## Values

“All the truth”:

$$\textrm{Sensitivity}=\frac{\textrm{True Positives}}{\textrm{All True}}$$

“Nothing but the truth”:

$$\textrm{Specificity}=\frac{\textrm{True Negatives}}{\textrm{All False}}$$

$$\textrm{Accuracy}=\frac{\textrm{True Positives}+\textrm{True Negatives}}{\textrm{All Cases}}$$
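Applied to the COVID counts, a short Python sketch of these three measures (variable names are our own) shows that the assumed 99% holds for all of them:

```python
# Counts from the COVID example
TP, FN, FP, TN = 99, 1, 999, 98901

sensitivity = TP / (TP + FN)                # "all the truth": TP / All True
specificity = TN / (TN + FP)                # "nothing but the truth": TN / All False
accuracy = (TP + TN) / (TP + FN + FP + TN)  # correct results / All Cases

print(f"{sensitivity:.2%} {specificity:.2%} {accuracy:.2%}")  # 99.00% 99.00% 99.00%
```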

## Values

$$\textrm{Precision}=\frac{\textrm{True Positives}}{\textrm{Detected}}$$

$$\textrm{Recall}=\frac{\textrm{True Positives}}{\textrm{All True}}$$

$$\frac{1}{\textrm{F-index}}=\frac{1}{2}\left(\frac{1}{\textrm{Precision}}+\frac{1}{\textrm{Recall}}\right)$$
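A Python sketch of these measures on the same COVID counts (names are our own); the F-index is computed as the harmonic mean of precision and recall, exactly as in the formula above.

```python
# Counts from the COVID example
TP, FN, FP, TN = 99, 1, 999, 98901

precision = TP / (TP + FP)  # TP / Detected
recall = TP / (TP + FN)     # TP / All True (same as sensitivity)

# F-index: harmonic mean of precision and recall
f_index = 2 / (1 / precision + 1 / recall)

print(f"precision={precision:.2%} recall={recall:.2%} F={f_index:.3f}")
# precision=9.02% recall=99.00% F=0.165
```

Here the precision equals the 9.02% found earlier for $ℙ(\text{COVID}_+ \vert \text{test}_+)$: both are the fraction of positive tests that are real cases.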