# Methodology of Scientific Research

## An event is a set of outcomes

The set of all possible outcomes is often called Ω

An event 𝐴 can be seen as the set of all outcomes that make the event true

For example, $\textrm{Fever}=\{\textrm{Temp}>37.5^\circ\textrm{C}\}$
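A minimal Python sketch of this idea, assuming the experiment is one roll of a six-sided die. The die and the names `omega`, `is_four`, `is_even` and `is_true` are illustrative and not part of the slides. The set of outcomes is a Python set, an event is a subset of it, and the event is true when the observed outcome belongs to the subset.

```python
# Hypothetical experiment: one roll of a six-sided die.
omega = {1, 2, 3, 4, 5, 6}          # all possible outcomes

# An event is the set of outcomes that make it true.
is_four = {4}
is_even = {o for o in omega if o % 2 == 0}

def is_true(event, outcome):
    """Return True when the observed outcome makes the event true."""
    return outcome in event

print(is_true(is_four, 4))   # the roll was 4, so "is_four" is true
print(is_true(is_even, 5))   # the roll was 5, so "is_even" is false
```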

## Evaluating rational beliefs

An event will become either true or false after an experiment

For example, a die shows either a 4 or something else

We want to give a value to our rational belief that the event will become true after the experiment

The numeric value is called Probability

## Probabilities as Areas

It is useful to think of the probability of an event as the area of its region in the drawing

The total area of Ω is 1

Usually we do not know the shape of 𝐴

## Probabilities depend on our knowledge

Our rational beliefs depend on our knowledge

If we represent our knowledge (or hypothesis) by 𝑍, then the probability of an event 𝐴 is written as $\mathbb{P}(A|Z)$

We read “the probability of event 𝐴, given that we know 𝑍”

For example, “the probability that we get a 4, given that the die is symmetrical”

## Important idea

The order is relevant $\mathbb{P}(A|Z)\neq\mathbb{P}(Z|A)$

There are two events, 𝐴 and 𝑍

The one written after | is what we assume to be true

The one written before | is what we are asking for

One we know, the other we do not

## Visually

Now outcomes are limited to the 𝑍 region

$$\mathbb{P}(A|Z)$$ is the area of the part of 𝐴 inside 𝑍, measured with respect to the area of 𝑍 instead of Ω

The shape of 𝑍 is often unknown
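The area picture can be sketched in Python by counting outcomes, under the assumption (not made in the slides) that every outcome of a six-sided die covers the same area. The helper `prob` and the events `A` and `Z` are illustrative names. The last two lines also show why the order around the bar matters.

```python
from fractions import Fraction

omega = frozenset({1, 2, 3, 4, 5, 6})   # assumed: a fair six-sided die

def prob(event, given=omega):
    """Area of `event` inside `given`, relative to the area of `given`.

    Assumes every outcome has the same area (equally likely outcomes).
    """
    return Fraction(len(event & given), len(given))

A = frozenset({4})          # "we get a 4"
Z = frozenset({2, 4, 6})    # "the result is even"

print(prob(A))       # measured against omega: 1/6
print(prob(A, Z))    # measured against Z only: 1/3
print(prob(Z, A))    # swapping the order changes the value: 1
```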

## Degrees of belief

If, given our knowledge 𝑍, the event 𝐵 is at least as plausible as the event 𝐴, then $\mathbb{P}(A|Z)\leq\mathbb{P}(B|Z)$

For example, given that the die is symmetrical, the probability that we get either 4, 5 or 6 is greater than the probability that we get a 4 $\mathbb{P}(\{4\}|Z)\leq\mathbb{P}(\{4,5,6\}|Z)$
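As a worked instance, assuming that “symmetrical” means each of the six faces is equally plausible, the two sides of the inequality take these values

$\mathbb{P}(\{4\}|Z)=\frac{1}{6}\leq\frac{3}{6}=\mathbb{P}(\{4,5,6\}|Z)$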

## Degrees of belief

On the other hand, if we get new information, the probabilities may change

The same event 𝐴 may be more plausible under a new hypothesis 𝑌 than under the initial hypothesis 𝑍

Then $\mathbb{P}(A|Z)\leq\mathbb{P}(A|Y)$

## Probability rules based on these two ideas

It has been proven that, to be consistent with these two ideas, probabilities must follow these rules

1. A probability is a number between 0 and 1 inclusive $\mathbb{P}(A)\geq 0\textrm{ and }\mathbb{P}(A)\leq 1$

2. The probability of a sure event is 1 $\mathbb{P}(\textrm{True}) = 1$

3. The probability of an impossible event is 0 $\mathbb{P}(\textrm{False}) = 0$

## Complex events

We are interested in non-trivial events, which are usually combinations of smaller events

For example, we may ask “what is the probability that, in a group of 𝑁 people, at least two people have the same birthday?”

Fortunately, any complex event can be decomposed into simpler events, combined with the connectors “and”, “or” and “not”

Exercise: decompose the birthday event into simpler ones

## Probability of not 𝐴

If the event 𝐴 becomes more and more plausible, then the opposite event not 𝐴 becomes less and less plausible

It can be shown that we always have $\mathbb{P}(\textrm{not }A) = 1-\mathbb{P}(A)$
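A quick numerical check of the negation rule, as a sketch that assumes a fair six-sided die and computes probabilities by counting outcomes. The helper `prob` and the event `A` are illustrative names.

```python
from fractions import Fraction

omega = set(range(1, 7))    # assumed: a fair six-sided die

def prob(event):
    """Probability by counting, assuming equally likely outcomes."""
    return Fraction(len(event), len(omega))

A = {4}
not_A = omega - A           # every outcome that makes A false

assert prob(not_A) == 1 - prob(A)
print(prob(A), prob(not_A))  # 1/6 5/6
```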

## Joint Probability

The probability of 𝐴 and 𝐵 happening simultaneously must be connected to the probability of each one

It can be shown that there are only two ways to calculate it

• Start with the probability of $$A$$ and then of $$B$$ given that $$A$$ is true $\mathbb{P}(A,B)=\mathbb{P}(A)\cdot\mathbb{P}(B|A)$
• Start with the probability of $$B$$ and then of $$A$$ given that $$B$$ is true $\mathbb{P}(A,B)=\mathbb{P}(B)\cdot\mathbb{P}(A|B)$

## It must be a multiplication

It can be proven that the only way to combine $$\mathbb{P}(A)$$ and $$\mathbb{P}(B|A)$$ to get $$\mathbb{P}(A,B)$$ is to multiply them.

Both formulas are true, since $$\mathbb{P}(A,B)=\mathbb{P}(B,A).$$ The order in which we write the events is irrelevant.
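The two ways of computing the joint probability can be compared on a small example. This is a sketch that assumes a fair six-sided die and counts outcomes; `prob`, `A` and `B` are illustrative names.

```python
from fractions import Fraction

omega = set(range(1, 7))    # assumed: a fair six-sided die

def prob(event, given=None):
    """P(event | given) by counting equally likely outcomes."""
    given = omega if given is None else given
    return Fraction(len(event & given), len(given))

A = {2, 4, 6}    # "the result is even"
B = {4, 5, 6}    # "the result is greater than 3"

joint = prob(A & B)                 # P(A, B) counted directly
via_A = prob(A) * prob(B, A)        # P(A) * P(B | A)
via_B = prob(B) * prob(A, B)        # P(B) * P(A | B)

assert joint == via_A == via_B
print(joint)    # 1/3
```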

## Probability of 𝐴 or 𝐵

We know how to calculate $$\mathbb{P}(A\textrm{ and }B)$$ and $$\mathbb{P}(\textrm{not }A)$$

We also know De Morgan’s law, which swaps ANDs with ORs
$\textrm{not }(A \textrm{ or }B) = (\textrm{not }A) \textrm{ and }(\textrm{not }B)$

Therefore we can write

$\begin{aligned} \mathbb{P}(A \textrm{ or }B) & = 1 - \mathbb{P}(\textrm{not }(A \textrm{ or }B))\\ & = 1-\mathbb{P}( (\textrm{not }A) \textrm{ and }(\textrm{not }B)) \end{aligned}$

## Using the multiplication rule

$\begin{aligned} \mathbb{P}(A \textrm{ or }B) & = 1-\mathbb{P}( (\textrm{not }A) \textrm{ and }(\textrm{not }B)) \\ & = 1-\mathbb{P}(\textrm{not }A)\cdot\mathbb{P}(\textrm{not }B|\textrm{not }A) \end{aligned}$

Using the negation rule

$\begin{aligned} \mathbb{P}(A \textrm{ or }B) & = 1-\mathbb{P}(\textrm{not }A)\cdot(1- \mathbb{P}(B|\textrm{not }A)) \\ & = 1-\mathbb{P}(\textrm{not }A) + \mathbb{P}(\textrm{not }A)\cdot\mathbb{P}(B|\textrm{not }A) \end{aligned}$

## Using the multiplication rule again

$\begin{aligned} \mathbb{P}(A \textrm{ or }B) & = 1 -\mathbb{P}(\textrm{not }A) + \mathbb{P}(\textrm{not }A,B) \\ \mathbb{P}(A \textrm{ or }B) & = 1 -(1-\mathbb{P}(A)) + \mathbb{P}(\textrm{not }A|B)\cdot\mathbb{P}(B) \\ \mathbb{P}(A \textrm{ or }B) & = \mathbb{P}(A) + (1-\mathbb{P}(A|B))\cdot\mathbb{P}(B) \\ \mathbb{P}(A \textrm{ or }B) & = \mathbb{P}(A) + \mathbb{P}(B)-\mathbb{P}(A|B)\cdot\mathbb{P}(B) \\ \mathbb{P}(A \textrm{ or }B) & = \mathbb{P}(A) + \mathbb{P}(B)-\mathbb{P}(A,B) \end{aligned}$

You need to remember only the last line

The previous lines justify why the last one is always true

## Do not count twice

If $$A$$ and $$B$$ can happen at the same time, then $$\mathbb{P}(A) + \mathbb{P}(B)$$ counts the intersection twice

So we have to take out the intersection $$\mathbb{P}(A,B)$$ $\mathbb{P}(A \textrm{ or }B) = \mathbb{P}(A) + \mathbb{P}(B)-\mathbb{P}(A,B)$
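The double counting is easy to see with numbers. The sketch below assumes a fair six-sided die and two overlapping events chosen only for illustration; `prob`, `naive` and `corrected` are hypothetical names.

```python
from fractions import Fraction

omega = set(range(1, 7))    # assumed: a fair six-sided die

def prob(event):
    """Probability by counting, assuming equally likely outcomes."""
    return Fraction(len(event), len(omega))

A = {2, 4, 6}    # "the result is even"
B = {4, 5, 6}    # "the result is greater than 3"

naive = prob(A) + prob(B)            # counts the outcomes 4 and 6 twice
corrected = naive - prob(A & B)      # take the intersection out once

assert corrected == prob(A | B)
print(naive, corrected)   # 1 2/3
```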

## It gets complicated

If there are three compatible events, things get messy

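$\begin{aligned} & \mathbb{P}(A \textrm{ or }B \textrm{ or }C) \\ & = \mathbb{P}(A) + \mathbb{P}(B \textrm{ or }C)-\mathbb{P}(A,(B \textrm{ or }C)) \\ & = \mathbb{P}(A) + \mathbb{P}(B) + \mathbb{P}(C)-\mathbb{P}(B,C) - \mathbb{P}((A,B) \textrm{ or }(A,C)) \\ & = \mathbb{P}(A) + \mathbb{P}(B) + \mathbb{P}(C)-\mathbb{P}(B,C) - (\mathbb{P}(A,B) + \mathbb{P}(A,C) - \mathbb{P}(A,B,C)) \\ & = \mathbb{P}(A) + \mathbb{P}(B) + \mathbb{P}(C)-\mathbb{P}(B,C) - \mathbb{P}(A,B) - \mathbb{P}(A,C) + \mathbb{P}(A,B,C) \end{aligned}$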

It gets worse with more events

## There is a better way

Using De Morgan’s rule

$\begin{aligned} & \mathbb{P}(A \textrm{ or }B \textrm{ or }C) \\ & = 1 - \mathbb{P}((\textrm{not }A) \textrm{ and }(\textrm{not }B) \textrm{ and }(\textrm{not }C))\\ & = 1 - \mathbb{P}(\textrm{not }A)\cdot\mathbb{P}(\textrm{not }B | \textrm{not }A)\cdot\mathbb{P}(\textrm{not }C | \textrm{not }A, \textrm{not }B)\\ & = 1 - (1-\mathbb{P}(A))\cdot(1-\mathbb{P}(B | \textrm{not }A))\cdot(1-\mathbb{P}(C | \textrm{not }A, \textrm{not }B)) \end{aligned}$

This is often easier to calculate
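Both routes to the probability of a three-way “or” can be checked against each other. This sketch assumes a fair six-sided die and three arbitrary example events; `prob`, `long_way` and `short_way` are illustrative names.

```python
from fractions import Fraction

omega = set(range(1, 7))    # assumed: a fair six-sided die

def prob(event, given=None):
    """P(event | given) by counting equally likely outcomes."""
    given = omega if given is None else given
    return Fraction(len(event & given), len(given))

A, B, C = {2, 4, 6}, {4, 5, 6}, {1, 2}   # arbitrary example events
nA, nB, nC = omega - A, omega - B, omega - C

# Route 1: add the three probabilities, then correct for every overlap.
long_way = (prob(A) + prob(B) + prob(C)
            - prob(B & C) - prob(A & B) - prob(A & C)
            + prob(A & B & C))

# Route 2: one minus a chain of conditional probabilities of the negations.
short_way = 1 - prob(nA) * prob(nB, nA) * prob(nC, nA & nB)

assert long_way == short_way == prob(A | B | C)
print(short_way)   # 5/6
```

With more events, route 2 only adds one more factor to the product, while route 1 needs a correction term for every combination of events.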

## Example: Multiple Birthdays

Let’s say we have three people, with birthdays $$x_1, x_2$$ and $$x_3.$$

The probability that there are at least two people with the same birthday is $\mathbb{P}(x_2=x_1 \textrm{ or }x_3=x_2 \textrm{ or }x_3=x_1)$ which can be rewritten as $1-\mathbb{P}(x_2\neq x_1 \textrm{ and }x_3\neq x_2 \textrm{ and }x_3\neq x_1)$

## Using the multiplication rule

We want to calculate $1-\mathbb{P}(x_2\neq x_1 \textrm{ and }x_3\neq x_2 \textrm{ and }x_3\neq x_1)$

We can separate it like this (only the first “and”) $1-\mathbb{P}(x_2\neq x_1)\cdot\mathbb{P}(x_3\neq x_2 \textrm{ and }x_3\neq x_1|x_2\neq x_1)$

Assuming 365 equally likely birthdays, we have $1-\frac{364}{365}\cdot\frac{363}{365}$
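The three-person result can be evaluated exactly and checked by brute force. This is a sketch assuming equally likely, independent birthdays; the function names and the 7-day calendar used for the check are made up for illustration, since listing all triples of 365 days would be slow.

```python
from fractions import Fraction
from itertools import product

def shared_birthday_3(days=365):
    """P(at least two of three people share a birthday).

    Assumes `days` equally likely birthdays, independent between people.
    """
    d = Fraction(days)
    return 1 - (d - 1) / d * (d - 2) / d

def brute_force_3(days):
    """Same probability, found by listing every triple of birthdays."""
    triples = list(product(range(days), repeat=3))
    hits = sum(1 for t in triples if len(set(t)) < 3)
    return Fraction(hits, len(triples))

# The enumeration is only practical for a small hypothetical calendar.
assert shared_birthday_3(7) == brute_force_3(7)

p = shared_birthday_3()
print(p, float(p))   # 1093/133225, roughly 0.0082
```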

## Exercise

• What is the probability that, in a group of N people, at least two of them share the same birthday?

• How many people do we need to have at least 50% probability of at least two of them sharing the same birthday?
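One way to check your answers is a simulation. The sketch below estimates the probability for a given group size by drawing random birthdays, assuming 365 equally likely and independent birthdays; the function name, the number of trials and the seed are arbitrary choices.

```python
import random

def estimate_shared_birthday(n, trials=10_000, days=365, seed=0):
    """Estimate P(at least two of n people share a birthday) by simulation.

    Assumes `days` equally likely, independent birthdays.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        birthdays = [rng.randrange(days) for _ in range(n)]
        if len(set(birthdays)) < n:      # some birthday is repeated
            hits += 1
    return hits / trials

# Compare these estimates with the values from your own formula.
for n in (3, 10, 30):
    print(n, estimate_shared_birthday(n))
```

The estimates fluctuate with the seed and the number of trials, so they should agree with an exact formula only approximately.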

# Special case

## If A and B are incompatible

If $$A$$ and $$B$$ cannot happen at the same time, then $$(A \textrm{ and }B)$$ is impossible, therefore $$\mathbb{P}(A,B)=0$$

In that case (and only in that case) $\mathbb{P}(A \textrm{ or }B) = \mathbb{P}(A) + \mathbb{P}(B)$

## Splitting a set into pieces

In particular we have $\mathbb{P}(A) = \mathbb{P}(A\textrm{ and }(B \textrm{ or }\textrm{not }B)) = \mathbb{P}(A,B) + \mathbb{P}(A, \textrm{not }B)$ because

• $$(A \textrm{ and }B)$$ is incompatible with $$(A \textrm{ and }\textrm{not }B)$$,
• $$(A \textrm{ and }(B \textrm{ or }\textrm{not }B))$$ is equal to $$A$$
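The split can be verified on a concrete case. This sketch assumes a fair six-sided die; the events and the names `piece_in_B` and `piece_outside_B` are illustrative.

```python
from fractions import Fraction

omega = set(range(1, 7))    # assumed: a fair six-sided die

def prob(event):
    """Probability by counting, assuming equally likely outcomes."""
    return Fraction(len(event), len(omega))

A = {2, 4, 6}
B = {4, 5, 6}
not_B = omega - B

piece_in_B = A & B            # outcomes of A that are also in B
piece_outside_B = A & not_B   # outcomes of A that are not in B

assert (piece_in_B & piece_outside_B) == set()   # the pieces are incompatible
assert (piece_in_B | piece_outside_B) == A       # together they rebuild A
assert prob(A) == prob(piece_in_B) + prob(piece_outside_B)
```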

## Splitting $$\Omega$$

If we partition Ω into 𝑛 subsets $$A_i$$, such that they cover all of Ω $\Omega=A_1 \cup A_2 \cup \ldots \cup A_n$

and every pair of different events is mutually incompatible $A_i \cap A_j=\varnothing\textrm{ for }i\neq j$

then we have $\mathbb{P}(\Omega)=\mathbb{P}(A_1) + \mathbb{P}(A_2) + \ldots + \mathbb{P}(A_n)=1$

## All outcomes

One kind of event is the set containing a single outcome

If $$a_i \in \Omega$$ is an outcome, then $$A_i=\{a_i\}$$ is an event

It means “the experiment outcome is exactly $$a_i$$”

It is easy to see that these events are mutually incompatible and cover all of Ω

Thus, $\mathbb{P}(\{a_1\}) + \mathbb{P}(\{a_2\}) + \ldots + \mathbb{P}(\{a_n\})=1$
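Because single-outcome events are incompatible, the probability of any event is the sum of the probabilities of its outcomes. The sketch below illustrates this with a hypothetical loaded die whose weights are made up; `p_outcome` and `prob` are illustrative names.

```python
from fractions import Fraction

# Hypothetical loaded die: the weights are made up for illustration.
p_outcome = {
    1: Fraction(1, 10), 2: Fraction(1, 10), 3: Fraction(1, 10),
    4: Fraction(1, 10), 5: Fraction(1, 10), 6: Fraction(1, 2),
}

# The single-outcome events partition omega, so their probabilities add to 1.
assert sum(p_outcome.values()) == 1

def prob(event):
    """Add the probabilities of the incompatible single-outcome events."""
    return sum(p_outcome[a] for a in event)

print(prob({4, 5, 6}))   # 7/10
```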