- we want to predict the future
- discuss what can happen in the future
- talk about what could have happened in the past
- reason about how we got to this present

Nature has rules. Universal and permanent rules

Whatever happens in the future is the result of applying the rules to the current state of the universe

\[\textrm{State}_{t+1} = F(\textrm{State}_t, \textrm{Parameters})\]
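A minimal sketch of this rule, assuming a ball moving under constant gravity with no air resistance. The state is position and velocity, the parameters are the gravity `g` and the time step `dt`, and `step` plays the role of \(F\). The names `State`, `step` and `simulate` are illustrative, not from any library.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class State:
    x: float   # horizontal position (m)
    y: float   # height (m)
    vx: float  # horizontal velocity (m/s)
    vy: float  # vertical velocity (m/s)

def step(s: State, g: float, dt: float) -> State:
    """F: exact update for constant acceleration over one time step dt."""
    return State(
        x=s.x + s.vx * dt,
        y=s.y + s.vy * dt - 0.5 * g * dt * dt,
        vx=s.vx,
        vy=s.vy - g * dt,
    )

def simulate(s: State, g: float, dt: float, n: int) -> State:
    """Apply F n times to get State_{t+n} from State_t."""
    for _ in range(n):
        s = step(s, g, dt)
    return s

start = State(x=0.0, y=0.0, vx=3.0, vy=4.0)
end = simulate(start, g=9.81, dt=0.01, n=50)  # 0.5 s later
print(end)
```

The same rule, applied repeatedly with the same parameters, always produces the same future state.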

We just need to follow the logical consequences

If we launch a ball, and we know the angle and speed, then we can predict where it will fall
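Under the usual textbook idealization (flat ground, no air resistance, launch and landing at the same height), the landing distance has a closed form, \(R = v^2 \sin(2\theta)/g\). A minimal sketch; the function name `landing_distance` is illustrative.

```python
import math

def landing_distance(speed: float, angle_deg: float, g: float = 9.81) -> float:
    """Horizontal range of a projectile launched and landing at the same height,
    ignoring air resistance: R = v^2 * sin(2*theta) / g."""
    theta = math.radians(angle_deg)
    return speed ** 2 * math.sin(2 * theta) / g

print(round(landing_distance(20.0, 45.0), 2))  # 40.77 (400 / 9.81 metres)
```

Knowing the angle and the speed is enough: the prediction follows from the rule.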

We can launch a rocket and land on the Moon

We can put satellites in orbit to observe the Earth, find our position using GPS, and watch TV from other countries

We can build a plane that can fly and carry us to other countries

If the world is deterministic, and we know

- all the rules
- all the parameters with infinite precision
- the current state of the world with infinite precision

then we can predict everything that will happen

and everything that has happened before
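If the rule can be run in reverse, the same logic recovers the past. A minimal sketch, assuming the idealized constant-gravity ball again: for this exact update, a step with time step `-dt` undoes a step with `dt`, so the present state determines where the ball came from. Here floating-point rounding is the only source of error; with real measurements, the imprecision in the present state would also propagate backwards.

```python
def step(state, g, dt):
    """Exact constant-gravity update; state = (x, y, vx, vy)."""
    x, y, vx, vy = state
    return (x + vx * dt, y + vy * dt - 0.5 * g * dt * dt, vx, vy - g * dt)

def run(state, g, dt, n):
    """Apply the rule n times with time step dt (negative dt goes back in time)."""
    for _ in range(n):
        state = step(state, g, dt)
    return state

past = (0.0, 0.0, 3.0, 4.0)
present = run(past, g=9.81, dt=0.01, n=50)        # predict the future
recovered = run(present, g=9.81, dt=-0.01, n=50)  # reconstruct the past
print(recovered)  # equal to `past` up to floating-point rounding
```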

We just need to use *logic*

If we have *perfect knowledge*, we can use *logic*

*Logic* deals with things that are either *TRUE* or *FALSE*

For example:

- All men are mortal
- Socrates is a man

Therefore

- Socrates is mortal
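Logical validity can be checked mechanically: an argument is valid if the conclusion is TRUE in every possible world where the premises are TRUE. A minimal sketch, assuming a toy universe of three individuals (only "Socrates" comes from the text; the other names are made up) and modeling "men" and "mortals" as sets.

```python
from itertools import product

people = ["Socrates", "Plato", "Fido"]  # toy universe, for illustration only

def all_worlds():
    """Yield every assignment of 'is a man' and 'is mortal' (TRUE/FALSE) to each individual."""
    n = len(people)
    for flags in product([False, True], repeat=2 * n):
        men = {p for p, f in zip(people, flags[:n]) if f}
        mortals = {p for p, f in zip(people, flags[n:]) if f}
        yield men, mortals

def syllogism_is_valid() -> bool:
    """Valid means: the conclusion is TRUE in every world where both premises are TRUE."""
    for men, mortals in all_worlds():
        # Premise 1: all men are mortal. Premise 2: Socrates is a man.
        premises_hold = men <= mortals and "Socrates" in men
        if premises_hold and "Socrates" not in mortals:
            return False  # a counterexample would break the argument
    return True

print(syllogism_is_valid())  # True
```

No probabilities are needed here: with perfect knowledge of the premises, the conclusion is simply TRUE.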

We do not know all the rules

Among the rules that we know, some

- have complex solutions, which are hard to calculate
- depend on parameters that we do not know
- give very different results when the parameters change a little bit
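A standard illustration of the last point, not taken from this document, is the logistic map \(x_{t+1} = r\,x_t(1-x_t)\). It is a fully deterministic rule, yet with \(r=4\) two starting states that differ by \(10^{-10}\) end up completely different after a few dozen steps. A minimal sketch; the starting value 0.2 is arbitrary.

```python
def logistic_orbit(x0: float, r: float = 4.0, n: int = 60) -> list[float]:
    """Iterate the deterministic rule x_{t+1} = r * x_t * (1 - x_t), n times."""
    xs = [x0]
    for _ in range(n):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_orbit(0.2)
b = logistic_orbit(0.2 + 1e-10)  # almost the same starting state

for t in (0, 10, 20, 30, 40, 50):
    print(t, abs(a[t] - b[t]))  # the gap grows roughly like 2**t
```

To predict such a system far ahead, we would need its state with a precision that no measurement can reach.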

Since we have *imperfect knowledge*, we must deal with *degrees of certainty*

We want to give a numeric value to the chances that our experiment is successful

We want to compare the chances of *success* versus *failure*

An **experiment** produces a single **outcome**

We do not know the outcome until we perform the experiment

If we knew the outcome before doing the experiment, we would not be doing it

An **event** is a yes-no question that will be answered by the experiment

Having a fever is an event. A thermometer showing 38.2 °C is an outcome.

- You have a fever, or not
- You finish your Master’s degree, or not
- Rain falls, or not
- The plant grows
- You win the lottery
- Your experiment gives the expected outcome
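The distinction between outcome and event can be written as code: the outcome is whatever the experiment returns, and an event is a function from outcomes to yes/no. A minimal sketch; the 38.0 °C threshold and the name `has_fever` are assumptions chosen for illustration, not a medical definition.

```python
FEVER_THRESHOLD_C = 38.0  # assumed cutoff, for illustration only

def has_fever(temperature_c: float) -> bool:
    """An event: a yes/no question answered by the outcome (a thermometer reading)."""
    return temperature_c >= FEVER_THRESHOLD_C

# The outcome is a number; the event is TRUE or FALSE
for reading in (36.6, 38.2):
    print(reading, has_fever(reading))
```

Many different outcomes (38.1 °C, 38.2 °C, 39.0 °C, ...) answer the same event with the same "yes".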

We need to count positive cases over total cases

There are two paradigms

- We do many experiments and count how many times we got a success, and how many times we got a failure
- We reason from first principles, and use all available knowledge, to find the proportion of success

The first approach is called “frequentist”, and the second is “Bayesian”
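The two paradigms can be compared on a simple case, assuming a fair six-sided die. Reasoning from the symmetry of the die gives the proportion of success directly; the frequentist route repeats a simulated experiment many times and counts. The seed and the number of trials are arbitrary choices.

```python
import random
from fractions import Fraction

# Reasoning from first principles: 6 symmetric faces, exactly one of them is a 6
p_reasoned = Fraction(1, 6)

# Frequentist: repeat the experiment many times and count successes
rng = random.Random(42)  # fixed seed so the run is reproducible
n_trials = 100_000
successes = sum(1 for _ in range(n_trials) if rng.randint(1, 6) == 6)
p_counted = successes / n_trials

print(float(p_reasoned), p_counted)  # close to each other, but not identical
```

The counted frequency approaches the reasoned value as the number of trials grows, but it is never exactly equal for a finite number of experiments.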

Most people are familiar with the naive idea

\[\textrm{Probability}=\frac{\textrm{Number of Successes}}{\textrm{Number of Cases}}\]

This is a useful first approach, but it is easy to get confused

For example, if you roll a die, what is the probability of getting a 6?

We have to be careful.

New information may change our confidence

For example, if we learn that the die outcome is an even number, what is the probability of getting a 6?

What if we learn that the outcome is an odd number?
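Counting answers all three questions, assuming a fair die with six equally likely outcomes. New information removes the outcomes it rules out, so we count favorable cases only among the outcomes that are still possible. The helper `prob` is hypothetical.

```python
from fractions import Fraction

outcomes = range(1, 7)  # a fair die: six equally likely outcomes

def prob(event, given=lambda o: True):
    """Favorable cases over total cases, counting only outcomes consistent with `given`."""
    possible = [o for o in outcomes if given(o)]
    favorable = [o for o in possible if event(o)]
    return Fraction(len(favorable), len(possible))

def is_six(o):
    return o == 6

print(prob(is_six))                              # 1/6
print(prob(is_six, given=lambda o: o % 2 == 0))  # 1/3
print(prob(is_six, given=lambda o: o % 2 == 1))  # 0
```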

Probabilities

- reflect what we know
- represent our rational confidence on future events

They are *subjective*, because different subjects may have different knowledge

But they are *not arbitrary*. We must use all the available information, and follow all the rules

We will use capital letters to represent *events*. For example

\(A\): The die outcome is 6

\(B\): The die outcome is even

The probability of \(A\), given that we know \(B\), is \[ℙ(A|B)\]

This is called **conditional probability**

We always evaluate probabilities based on what we know

If the background knowledge is well known, and does not change, we sometimes write \[ℙ(A)\]

This is to simplify notation. But do not forget that there is an implicit context.

What is the probability that \(A\) and \(B\) happen at the same time, \(ℙ(A,B)\)?

We can think that we get \(A\) and then we get \(B\): \[ℙ(A,B)=ℙ(A)⋅ℙ(B|A)\]

We can think that we get \(B\) and then we get \(A\): \[ℙ(A,B)=ℙ(B)⋅ℙ(A|B)\]

Both are correct
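Both factorizations can be checked by counting with the events \(A\) (the outcome is 6) and \(B\) (the outcome is even), assuming a fair die. Events are represented as sets of outcomes, and conditioning on an event restricts the set of possible outcomes to it. The helper `P` is hypothetical.

```python
from fractions import Fraction

omega = set(range(1, 7))  # fair die, equally likely outcomes
A = {6}                   # the outcome is 6
B = {2, 4, 6}             # the outcome is even

def P(event, given=None):
    """Probability by counting; `given` restricts the set of possible outcomes."""
    space = omega if given is None else given
    return Fraction(len(event & space), len(space))

joint = P(A & B)
print(joint, P(A) * P(B, given=A), P(B) * P(A, given=B))  # 1/6 1/6 1/6
```

Here \(ℙ(A)=1/6\) and \(ℙ(B|A)=1\), while \(ℙ(B)=1/2\) and \(ℙ(A|B)=1/3\): different factors, same product.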

This is a very important concept. If \[ℙ(A|B)=ℙ(A)\] then knowing \(B\) tells us nothing about \(A\)

We say that \(A\) and \(B\) are independent

**Only in this case** we have \[ℙ(A,B)=ℙ(A)⋅ℙ(B)\]
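A minimal check of the product rule on a fair die. "The outcome is 6" and "the outcome is even" are not independent, while "even" and a hypothetical event "low" (the outcome is 1 or 2) happen to be. The helper names are illustrative.

```python
from fractions import Fraction

omega = set(range(1, 7))  # fair die, equally likely outcomes

def P(event):
    """Favorable cases over total cases."""
    return Fraction(len(event & omega), len(omega))

six, even, low = {6}, {2, 4, 6}, {1, 2}  # `low` is a made-up event for illustration

def independent(a, b):
    """Check the product rule P(A,B) = P(A)·P(B)."""
    return P(a & b) == P(a) * P(b)

print(independent(six, even))  # False: knowing "even" raises P(6) from 1/6 to 1/3
print(independent(even, low))  # True: P(even | low) = 1/2 = P(even)
```

Multiplying \(ℙ(6)\cdot ℙ(\textrm{even})\) would give 1/12 instead of the correct 1/6: using the product rule without independence gives wrong answers.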

Most statistical tests **require** independent events

but that is hard to guarantee in real life

**Be careful!** People die if you do it wrong

The probability of “\(A\) does not happen” is \[ℙ(\textrm{not }A)=1-ℙ(A)\]

With this rule and De Morgan’s laws, we can build the whole theory

- not (\(A\) and \(B\)) = (not \(A\)) or (not \(B\))
- not (\(A\) or \(B\)) = (not \(A\)) and (not \(B\))
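The complement rule and De Morgan’s laws can be checked exhaustively on a fair die, where "and" is set intersection, "or" is set union, and "not" is the complement within the six outcomes. The sketch below tries every possible event, all \(2^6 = 64\) subsets of the outcomes; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 7))  # outcomes of a fair die

def P(event):
    """Equally likely outcomes: favorable cases over total cases."""
    return Fraction(len(event), len(omega))

def NOT(event):
    """'event does not happen' = every other outcome in the sample space."""
    return omega - event

# Every possible event: every subset of the six outcomes
events = [frozenset(c) for k in range(len(omega) + 1)
          for c in combinations(sorted(omega), k)]

for A in events:
    assert P(NOT(A)) == 1 - P(A)              # complement rule
    for B in events:
        assert NOT(A & B) == NOT(A) | NOT(B)  # not (A and B) = (not A) or (not B)
        assert NOT(A | B) == NOT(A) & NOT(B)  # not (A or B) = (not A) and (not B)

print(len(events), "events checked")  # 64 events checked
```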

What is the probability of \(A\) or \(B\)? \[ℙ(A\textrm{ or }B)=?\]