# Methodology of Scientific Research

## An event is a set of outcomes

The set of all possible outcomes is often called Ω

An event 𝐴 can be seen as the set of all outcomes that make the event true

For example, $Fever=\{Temp>37.5°C\}$
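A minimal Python sketch of the Fever example, treating an event as a subset of Ω. The range of readings (35.0 to 42.0 °C in steps of 0.1) and the names `omega` and `fever` are assumptions made only for illustration.

```python
# Assumed sample space: thermometer readings from 35.0 to 42.0 °C in 0.1 steps,
# stored as integer tenths of a degree to avoid floating-point comparisons.
omega = set(range(350, 421))

# The event "Fever" is the subset of outcomes that make the statement true.
fever = {t for t in omega if t > 375}

print(fever <= omega)              # an event is always a subset of omega
print(376 in fever, 375 in fever)  # 37.6 makes the event true, 37.5 does not
```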

## Evaluating rational beliefs

An event will become either true or false after an experiment

For example, a die can show either 4 or not

We want to give a value to our rational belief that the event will become true after the experiment

The numeric value is called Probability

## Probabilities as Areas

It is useful to think that the probability of an event is the area in the drawing

The total area of Ω is 1

Usually we do not know the shape of 𝐴

## Probabilities depend on our knowledge

Our rational beliefs depend on our knowledge

If we represent our knowledge (or hypothesis) by 𝑍, then the probability of an event 𝐴 is written as $ℙ(A|Z)$

We read “the probability of event 𝐴, given that we know 𝑍”

For example, “the probability that we get a 4, given that the die is symmetrical”

## Important idea

The order is relevant $ℙ(A|Z)≠ℙ(Z|A)$

There are two events, 𝐴 and 𝑍

The one written after | is what we assume to be true

The one written before | is what we are asking for

One we know, the other we do not

## Visually

Now outcomes are limited only to the 𝑍 region

We measure the area of the part of 𝐴 that lies inside 𝑍 with respect to the area of 𝑍 instead of Ω. This ratio is $$ℙ(A|Z)$$

The shape of 𝑍 is often unknown

## Degrees of belief

If, given our knowledge 𝑍, the event 𝐵 is more plausible than the event 𝐴, then $ℙ(A|Z)≤ℙ(B|Z)$

For example, the probability that we get either 4, 5 or 6 is greater than the probability that we get a 4, given that the die is symmetrical $ℙ(\{4\}|Z)≤ℙ(\{4,5,6\}|Z)$
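The die comparison can be checked with a short sketch. It assumes that “symmetrical” means all six faces are equally likely, so a probability is a ratio of counts. The helper `prob` is a hypothetical name introduced here, not part of the course material.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # assumed: six equally likely faces

def prob(event, given=omega):
    """P(event | given), assuming all outcomes in `given` are equally likely."""
    return Fraction(len(event & given), len(given))

p_four = prob({4})         # probability of getting a 4
p_high = prob({4, 5, 6})   # probability of getting 4, 5 or 6

print(p_four, p_high)
print(p_four <= p_high)
```

The `given` argument plays the role of 𝑍: it restricts the outcomes that are counted.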

## Degrees of belief

On the other hand, if we get new information, the probabilities may change

The same event 𝐴 may be more plausible under a new hypothesis 𝑌 than under the initial hypothesis 𝑍

Then $ℙ(A|Z)≤ℙ(A|Y)$

## Probability rules based on these two ideas

It has been proven that probabilities must follow these rules

1. A probability is a number between 0 and 1 inclusive $ℙ(A) ≥ 0\textrm{ and }ℙ(A)≤1$

2. The probability of a sure event is 1 $ℙ(\textrm{True}) = 1$

3. The probability of an impossible event is 0 $ℙ(\textrm{False}) = 0$

## Complex events

We are interested in non-trivial events, which are usually combinations of smaller events

For example, we may ask “what is the probability that, in a group of 𝑛 people, at least two people have the same birthday?”

Fortunately, any complex event can be decomposed into simpler events, combined with the connectors “and”, “or” and “not”

Exercise: decompose the birthday event into simpler ones

## Probability of not 𝐴

If the event 𝐴 becomes more and more plausible, then the opposite event not 𝐴 becomes less and less plausible

It can be shown that we always have $ℙ(\textrm{not }A) = 1-ℙ(A)$
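As a quick check of the negation rule, here is a sketch on a die, assuming six equally likely faces. The helper `prob` and the event names are illustrative only.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # assumed: symmetrical six-sided die

def prob(event):
    """P(event), assuming all outcomes in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))

four = {4}
not_four = omega - four  # the opposite event: every outcome except 4

print(prob(four), prob(not_four))
print(prob(not_four) == 1 - prob(four))
```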

## Joint Probability

The probability of 𝐴 and 𝐵 happening simultaneously must be connected to the probability of each one

It can be shown that there are only two ways to calculate it

• Start with the prob. of $$A$$ and then of $$B$$ given that $$A$$ is true $ℙ(A,B)=ℙ(A)⋅ℙ(B|A)$
• Start with the prob. of $$B$$ and then of $$A$$ given that $$B$$ is true $ℙ(A,B)=ℙ(B)⋅ℙ(A|B)$

## It must be a multiplication

It can be proven that the only way to combine $$ℙ(A)$$ and $$ℙ(B|A)$$ to get $$ℙ(A,B)$$ is to multiply them.

Both are true, since $$ℙ(A,B)=ℙ(B,A).$$ The order that we write them is irrelevant.
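The two ways of computing the joint probability can be compared in a small sketch. It assumes a symmetrical die with equally likely faces, and the events 𝐴 = “even” and 𝐵 = “greater than 3” are chosen here only as an example.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # assumed: six equally likely faces

def prob(event, given=omega):
    """P(event | given), assuming all outcomes in `given` are equally likely."""
    return Fraction(len(event & given), len(given))

A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3

joint = prob(A & B)                 # P(A, B) counted directly
via_a = prob(A) * prob(B, given=A)  # P(A) * P(B | A)
via_b = prob(B) * prob(A, given=B)  # P(B) * P(A | B)

print(joint, via_a, via_b)
```

All three values agree, as the multiplication rule requires.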

## Probability of 𝐴 or 𝐵

We know how to calculate $$ℙ(A\textrm{ and }B)$$ and $$ℙ(\textrm{not }A)$$

We also know De Morgan’s law, to swap ANDs with ORs
$\textrm{not }(A \textrm{ or }B) = (\textrm{not }A) \textrm{ and }(\textrm{not }B)$

Therefore we can write

\begin{aligned} ℙ(A \textrm{ or }B) & = 1 - ℙ(\textrm{not }(A \textrm{ or }B))\\ & = 1-ℙ( (\textrm{not }A) \textrm{ and }(\textrm{not }B)) \end{aligned}

## Using the multiplication rule

\begin{aligned} ℙ(A \textrm{ or }B) & = 1-ℙ( (\textrm{not }A) \textrm{ and }(\textrm{not }B)) \\ & = 1-ℙ(\textrm{not }A)⋅ℙ(\textrm{not }B|\textrm{not }A) \end{aligned}

Using the negation rule

\begin{aligned} ℙ(A \textrm{ or }B) & = 1-ℙ(\textrm{not }A)⋅(1- ℙ(B|\textrm{not }A)) \\ & = 1-ℙ(\textrm{not }A) + ℙ(\textrm{not }A)⋅ℙ(B|\textrm{not }A) \end{aligned}

## Using the multiplication rule again

\begin{aligned} ℙ(A \textrm{ or }B) & = 1 -ℙ(\textrm{not }A) + ℙ(\textrm{not }A,B) \\ ℙ(A \textrm{ or }B) & = 1 -(1-ℙ(A)) + ℙ(\textrm{not }A|B)ℙ(B) \\ ℙ(A \textrm{ or }B) & = ℙ(A) + (1-ℙ(A|B))ℙ(B) \\ ℙ(A \textrm{ or }B) & = ℙ(A) + ℙ(B)-ℙ(A|B)ℙ(B) \\ ℙ(A \textrm{ or }B) & = ℙ(A) + ℙ(B)-ℙ(A,B) \end{aligned}

You need to remember only the last line

The previous lines justify why the last one is always true

## Do not count twice

If A and B can happen at the same time, then $$ℙ(A) + ℙ(B)$$ counts the intersection twice

So we have to take out the intersection $$ℙ(A,B)$$

$ℙ(A \textrm{ or }B) = ℙ(A) + ℙ(B)-ℙ(A,B)$
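A sketch that checks this formula by counting outcomes directly, assuming a symmetrical die with equally likely faces. The events and the helper `prob` are illustrative choices, not part of the original slides.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # assumed: six equally likely faces

def prob(event):
    """P(event), assuming all outcomes in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}  # the roll is even
B = {4, 5, 6}  # the roll is greater than 3

direct = prob(A | B)                         # count the union once
naive = prob(A) + prob(B)                    # counts {4, 6} twice
corrected = prob(A) + prob(B) - prob(A & B)  # take out the intersection

print(direct, naive, corrected)
```

Here the naive sum is 1, which is too large, while the corrected value matches the direct count.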

## It gets complicated

If there are three compatible events, things get messy

\begin{aligned} & ℙ(A \textrm{ or }B \textrm{ or }C) \\ & = ℙ(A) + ℙ(B \textrm{ or }C)-ℙ(A,(B \textrm{ or }C)) \\ & = ℙ(A) + ℙ(B) + ℙ(C)-ℙ(B,C) - ℙ((A,B) \textrm{ or }(A,C)) \\ & = ℙ(A) + ℙ(B) + ℙ(C)-ℙ(B,C) - (ℙ(A,B) + ℙ(A,C) - ℙ(A,B,C)) \\ & = ℙ(A) + ℙ(B) + ℙ(C)-ℙ(B,C) - ℙ(A,B) - ℙ(A,C) + ℙ(A,B,C) \end{aligned}

It gets worse with more events
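The three-event formula can be verified on a small case. This sketch assumes a symmetrical die with equally likely faces. The three events are arbitrary examples chosen so that all intersections are non-empty.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # assumed: six equally likely faces

def prob(event):
    """P(event), assuming all outcomes in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}  # even
B = {4, 5, 6}  # greater than 3
C = {1, 2, 4}  # a power of two

direct = prob(A | B | C)
formula = (prob(A) + prob(B) + prob(C)
           - prob(B & C) - prob(A & B) - prob(A & C)
           + prob(A & B & C))

print(direct, formula)
```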

## There is a better way

Using De Morgan’s rule

\begin{aligned} & ℙ(A \textrm{ or }B \textrm{ or }C) \\ & = 1 - ℙ((\textrm{not }A) \textrm{ and }(\textrm{not }B) \textrm{ and }(\textrm{not }C))\\ & = 1 - ℙ(\textrm{not }A)⋅ℙ(\textrm{not }B | \textrm{not }A)⋅ℙ(\textrm{not }C | \textrm{not }A, \textrm{not }B)\\ & = 1 - (1-ℙ(A))⋅(1-ℙ(B | \textrm{not }A))⋅(1-ℙ(C | \textrm{not }A, \textrm{not }B)) \end{aligned}

This is often easier to calculate
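A sketch of the same calculation as a chain of conditional probabilities, again assuming a symmetrical die with equally likely faces. The events and the helper `prob` are illustrative assumptions.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # assumed: six equally likely faces

def prob(event, given=omega):
    """P(event | given), assuming all outcomes in `given` are equally likely."""
    return Fraction(len(event & given), len(given))

A = {2, 4, 6}
B = {4, 5, 6}
C = {1, 2, 4}
not_a, not_b, not_c = omega - A, omega - B, omega - C

# 1 - P(not A) * P(not B | not A) * P(not C | not A, not B)
chain = 1 - (prob(not_a)
             * prob(not_b, given=not_a)
             * prob(not_c, given=not_a & not_b))

print(chain, prob(A | B | C))
```

Each factor conditions on everything already assumed true, so no intersections have to be added back.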

## Example: Multiple Birthdays

Let’s say we have three people, with birthdays $$x_1, x_2$$ and $$x_3.$$

The probability that there are at least two people with the same birthday is $ℙ(x_2=x_1 \textrm{ or }x_3=x_2 \textrm{ or }x_3=x_1)$ which can be rewritten as $1-ℙ(x_2≠x_1 \textrm{ and }x_3≠x_2 \textrm{ and }x_3≠x_1)$

## Using the multiplication rule

We want to calculate $1-ℙ(x_2≠x_1 \textrm{ and }x_3≠x_2 \textrm{ and }x_3≠x_1)$

We can separate it like this (splitting only at the first “and”) $1-ℙ(x_2≠x_1)⋅ℙ(x_3≠x_2 \textrm{ and }x_3≠x_1|x_2≠x_1)$

Assuming 365 equally likely birthdays, we have $1-\frac{364}{365}⋅\frac{363}{365} ≈ 0.0082$
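The three-person result can be computed exactly and cross-checked by enumeration. This is a sketch assuming equally likely birthdays. The function names are hypothetical, and the brute-force check uses a reduced calendar of 7 days only to keep the enumeration small.

```python
from fractions import Fraction
from itertools import product

def shared_birthday_3(days=365):
    """P(at least two of three people share a birthday), via the multiplication rule."""
    return 1 - Fraction(days - 1, days) * Fraction(days - 2, days)

def brute_force_3(days):
    """Same probability by listing every (x1, x2, x3); only feasible for small `days`."""
    triples = list(product(range(days), repeat=3))
    hits = sum(1 for x1, x2, x3 in triples
               if x1 == x2 or x2 == x3 or x1 == x3)
    return Fraction(hits, len(triples))

print(shared_birthday_3())          # exact fraction for 365 days
print(float(shared_birthday_3()))   # decimal approximation
print(shared_birthday_3(7) == brute_force_3(7))
```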

## Exercise

• What is the probability that, in a group of N people, at least two of them share the same birthday?

• How many people do we need to have at least 50% probability of at least two of them sharing the same birthday?

# Special case

## If A and B are incompatible

If A and B cannot happen at the same time, then $$(A \textrm{ and }B)$$ is impossible, therefore $$ℙ(A,B)=0$$

In that case (and only in that case) $ℙ(A \textrm{ or }B) = ℙ(A) + ℙ(B)$

## Splitting a set into pieces

In particular we have $ℙ(A) = ℙ(A\textrm{ and }(B \textrm{ or }\textrm{not }B)) = ℙ(A,B) + ℙ(A, \textrm{not }B)$ because

• $$(A \textrm{ and }B)$$ is incompatible with $$(A \textrm{ and }\textrm{not }B)$$,
• $$(A \textrm{ and }(B \textrm{ or }\textrm{not }B))$$ is equal to $$A$$
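A sketch of this splitting on a die, assuming six equally likely faces. The events and the helper `prob` are illustrative assumptions.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))  # assumed: six equally likely faces

def prob(event):
    """P(event), assuming all outcomes in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}      # even
B = {4, 5, 6}      # greater than 3
not_b = omega - B

part_in_b = A & B       # outcomes of A that are also in B
part_out_b = A & not_b  # outcomes of A that are not in B

print(part_in_b.isdisjoint(part_out_b))       # the two pieces are incompatible
print(prob(part_in_b) + prob(part_out_b), prob(A))
```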

## Splitting $$Ω$$

If we partition Ω into 𝑛 subsets $$A_i$$, such that they cover all of Ω $\Omega=A_1 ∪ A_2 ∪ … ∪ A_n$ and each pair of different events is mutually incompatible $A_i ∩ A_j=\emptyset \textrm{ for }i≠j$ then we have $ℙ(\Omega)=ℙ(A_1) + ℙ(A_2) + … + ℙ(A_n)=1$
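A sketch that checks both partition conditions and the resulting sum, assuming a symmetrical die with equally likely faces. The particular pieces are an arbitrary example.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))  # assumed: six equally likely faces

def prob(event):
    """P(event), assuming all outcomes in omega are equally likely."""
    return Fraction(len(event & omega), len(omega))

pieces = [{1, 2}, {3}, {4, 5, 6}]  # an example partition of omega

covers = set().union(*pieces) == omega                               # union is omega
disjoint = all(a.isdisjoint(b) for a, b in combinations(pieces, 2))  # no overlaps
total = sum(prob(piece) for piece in pieces)

print(covers, disjoint, total)
```

If either condition fails, the sum is no longer guaranteed to equal 1.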

## All outcomes

One kind of event is the set that contains a single outcome

If $$a_i ∈ Ω$$ is an outcome, then $$A_i=\{a_i\}$$ is an event

“The experiment outcome is exactly $$a_i$$”

It is easy to see that these events are mutually incompatible and cover all Ω

Thus, $ℙ(\{a_1\}) + ℙ(\{a_2\}) + … + ℙ(\{a_n\})=1$