Class 5: Random Variables

Methodology of Scientific Research

Andrés Aravena, PhD

14 April 2021

Probability Distribution

Last class we showed that the sum of all outcomes’ probabilities is 1 \[ℙ(\{a_1\}) + ℙ(\{a_2\}) + … + ℙ(\{a_n\})=1\] If we know these values, we can calculate everything.

The set of values for all \(i\) \[p(a_i) = ℙ(\{a_i\})= ℙ(\textrm{outcome is exactly }a_i)\] is called the probability distribution.

A probability distribution has two parts: the set of possible outcomes and their probabilities

This definition makes sense only if we agree on what the possible outcomes are.

In other words, we must agree on what \(Ω\) is

Then the probability distribution is a function \[p: Ω → [0,1]\]

Notice that there may be more than one way to define \(Ω\)


The easiest case to study is shuffling a deck of cards

We shuffle the cards several times, until we can no longer know which card will come first

We are interested in the event “the next card will be green”

Probabilities as proportions

Let’s assume that we know how many cards of each color are in the deck

There are \(n_c\) cards of color \(c\in\){“red”,“green”,“blue”, “yellow”}

There are \(N=∑ n_c\) cards in total

If we do not have any solid reason to expect any particular order of cards, then each individual card has the same probability \(1/N\)

The probability of “first color \(c\)” is \[ℙ(\textrm{color is }c)=\frac{n_c}{N}\]
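This proportion rule is easy to sketch in code; the card counts below are hypothetical:

```python
# Hypothetical deck: number of cards of each color
counts = {"red": 10, "green": 5, "blue": 3, "yellow": 2}
N = sum(counts.values())                      # N = 20 cards in total

# P(color is c) = n_c / N
prob = {c: n / N for c, n in counts.items()}

print(prob["green"])                          # 5/20 = 0.25
print(sum(prob.values()))                     # the probabilities sum to 1
```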

Probability of the next card

We will continue drawing cards, so the proportions will change

Let’s say we got color \(c_1\) in the first draw

Now we have \(N-1\) cards in total, and there are \(n_{c_1}-1\) cards of color \(c_1\)

The probability of “second color \(c\)” is

\[ℙ(\textrm{second color is }c|\textrm{first color is }c_1)=\begin{cases} \frac{n_c}{N-1} &\textrm{if }c≠c_1\\ \frac{n_c-1}{N-1} &\textrm{if }c=c_1 \end{cases}\]

It gets complicated
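The case analysis above can be written as a small function; the deck counts are again hypothetical:

```python
def second_color_prob(c, c1, counts):
    """P(second color is c | first color was c1), drawing without replacement."""
    N = sum(counts.values())
    if c == c1:
        return (counts[c] - 1) / (N - 1)   # one card of color c is already gone
    else:
        return counts[c] / (N - 1)         # only the total changed

counts = {"red": 10, "green": 5, "blue": 3, "yellow": 2}
print(second_color_prob("green", "green", counts))   # (5-1)/19
print(second_color_prob("red", "green", counts))     # 10/19
```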

Making life easier

The formula applies when our measurement changes the experiment

There are two cases where the proportions do not change, or change negligibly

  1. If every \(n_c\) is large, then \(n_c-1≈ n_c\) and \(N-1 ≈ N\)
    • This is the case when we interview a few people from a large population
  2. If we put the card back into the deck and shuffle it again
    • This is called sampling with replacement

In practice, we often sample from a very large population, and we model it as sampling with replacement

Independent, Identically Distributed

If we replace the card into the deck after we see it, we will have

\[ℙ(\textrm{second color is }c|\textrm{first color is }c_1)=ℙ(\textrm{color is }c)=\frac{n_c}{N}\]

Notice that this means that the second result is independent of the first result, and so on

Moreover, the distribution is identical in each case

This is a very important case, and we give it a name

Independent, Identically Distributed (i.i.d.)

Simple case: a coin

  • Has 2 sides: \(\Omega=\{\text{'Head'}, \text{'Tail'}\}\)
  • Distribution given by \(ℙ(\text{'Head'})\) and \(ℙ(\text{'Tail'})\) such that \[ℙ(\text{'Head'}) + ℙ(\text{'Tail'})= 1\]

All can be reduced to the value \[p=ℙ(\text{'Head'})\]

We say that the probability distribution of the coin depends on the parameter \(p\)

(In math this is called a Bernoulli distribution)
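A Bernoulli coin is easy to simulate; this is a minimal sketch using Python's standard `random` module:

```python
import random

def bernoulli(p):
    """One coin toss: 1 ('Head') with probability p, 0 ('Tail') otherwise."""
    return 1 if random.random() < p else 0

random.seed(42)                      # fixed seed, for reproducibility
tosses = [bernoulli(0.7) for _ in range(10_000)]
print(sum(tosses) / len(tosses))     # the observed frequency is close to p = 0.7
```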

Several coins

What is the probability that we get \(k\) heads if we throw \(N\) coins?

This happens to be one of the most useful cases for us

Let’s assume that all coins are i.i.d. with \(ℙ(\text{'Head'})=p\)

To simplify, we will call \(ℙ(\text{'Tail'})=q\) so \(p+q=1\)

To understand this case, we should start with small values of \(N\)

Two coins

  • There is only one way to get 0 heads: TT
    • this happens with probability \(q^2\)
  • There are two ways of getting 1 head: HT and TH
    • each happens with probability \(pq\)
  • There is only one way to get 2 heads: HH
    • this happens with probability \(p^2\)

we get \[1⋅ q^2,\quad 2⋅ p q,\quad 1⋅ p^2\]
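We can verify these three probabilities by brute force, enumerating every sequence of two coins:

```python
from itertools import product

p = 0.3
q = 1 - p

# Add up the probability of every two-coin sequence, grouped by number of heads
probs = {0: 0.0, 1: 0.0, 2: 0.0}
for seq in product("HT", repeat=2):
    k = seq.count("H")
    probs[k] += p**k * q**(2 - k)    # each sequence with k heads has this probability

print(probs[0], q**2)                # 1 * q^2
print(probs[1], 2 * p * q)           # 2 * pq
print(probs[2], p**2)                # 1 * p^2
```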

Three coins

  • There is only 1 way to get 0 heads: TTT
    • this happens with probability \(q^3\)
  • There are 3 ways of getting 1 head: HTT, THT, and TTH
    • each happens with probability \(pq^2\)
  • There are 3 ways of getting 2 heads: THH, HTH, and HHT
    • each happens with probability \(p^2q\)
  • There is only one way to get 3 heads: HHH
    • this happens with probability \(p^3\)

we get \[1⋅ q^3,\quad 3⋅ pq^2,\quad 3⋅ p^2q,\quad 1⋅ p^3\]

This is like \((a+b)^3\)

The rules for counting combinations are the same as in the binomial theorem

We get \[ℙ(k\textrm{ Heads in }N\textrm{ coins})= \binom{N}{k} p^k q^{(N-k)}\]

The numbers \(\binom{N}{k}\) are found in Pascal’s triangle
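In Python, `math.comb` computes \(\binom{N}{k}\) directly, so we can print the first rows of Pascal's triangle:

```python
from math import comb

# Each row N of Pascal's triangle lists the counts binom(N, k)
for N in range(5):
    print([comb(N, k) for k in range(N + 1)])
# the row for N = 3 is [1, 3, 3, 1]: the three-coin counts above
```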

Binomial formula

One way to remember it is to use the formula \[(p+q)^N =\sum_{k=0}^N \binom{N}{k} p^k q^{(N-k)}\]

This is why we call it Binomial distribution
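As a sanity check, the binomial probabilities for all \(k\) should add up to \((p+q)^N=1\); a short sketch:

```python
from math import comb

def binom_pmf(k, N, p):
    """P(k heads in N coins) = binom(N, k) * p^k * q^(N-k)."""
    return comb(N, k) * p**k * (1 - p)**(N - k)

N, p = 10, 0.3
total = sum(binom_pmf(k, N, p) for k in range(N + 1))
print(total)    # (p + q)^N = 1
```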

When Outcomes are Numbers

Random variables

The most important applications of probabilities are when the outcomes are numbers

More generally, we care about numbers that depend on the outcome of the experiment

  • dice: \(⚀↦1, ⚁↦2,…,⚅↦6\)
  • coins: Heads \(↦1\), Tails \(↦0\)
  • temperature
  • number of cells
  • anything we measure

We can do math with numbers

If the outcomes are numbers, we can use them in formulas

For example, if coins are “Heads \(↦1\) and Tails \(↦0\)”, then \[ℙ(k\textrm{ Heads in }N\textrm{ coins})\] is the same as \[ℙ\left(\sum_{i=1}^N X_i=k | X_i \textrm{ are iid coins}\right)\]
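We can check this correspondence by simulation: the empirical distribution of the sum of \(N\) iid coins should match the binomial probabilities.

```python
import random
from math import comb

random.seed(1)
N, p, trials = 5, 0.5, 20_000

# Empirical distribution of the sum of N iid 0/1 coins
counts = [0] * (N + 1)
for _ in range(trials):
    k = sum(1 if random.random() < p else 0 for _ in range(N))
    counts[k] += 1
empirical = [c / trials for c in counts]

# Theoretical binomial probabilities
theoretical = [comb(N, k) * p**k * (1 - p)**(N - k) for k in range(N + 1)]
print(empirical)
print(theoretical)
```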


In everyday life, if \(𝐱 = (x_1,…,x_N)\) we have \[\text{mean}(𝐱)=\bar{\mathbf x} = \frac{1}{N}\sum_i x_i\]

Using proportions

Now, if we count how many times each value appears \[n(x) = \textrm{number of times that }(x_i=x)\] then we can write \[\text{mean}(𝐱)=\bar{\mathbf x} =\sum_x x \frac{n(x)}{N}\]

In other words, to calculate the average we need to know the proportions
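Both ways of computing the average give the same number, as this sketch with a small hypothetical sample shows:

```python
from collections import Counter

x = [1, 2, 2, 3, 3, 3]                     # a small hypothetical sample
N = len(x)

mean_direct = sum(x) / N                   # the usual average
counts = Counter(x)                        # n(x): how many times each value appears
mean_props = sum(v * n / N for v, n in counts.items())

print(mean_direct, mean_props)             # both equal 14/6
```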

Expected value - Mean value

For any random variable \(X\) we define the expected value (also called mean value) of \(X\) as its average over the population \[𝔼X=\sum x\, ℙ(X=x)\] Notice that \(X\) is a random variable but \(𝔼X\) is not.

Generalizing, we can get the expected value of any function of \(X\) \[𝔼\,f(X)=\sum f(x)\, ℙ(X=x)\]
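Both formulas are plain sums over the distribution; here they are for a fair six-sided die:

```python
# Expected values from a distribution: a fair six-sided die
dist = {x: 1/6 for x in range(1, 7)}                 # P(X = x)

EX  = sum(x * p for x, p in dist.items())            # E X = 3.5
EX2 = sum(x**2 * p for x, p in dist.items())         # E X^2 = 91/6, with f(x) = x^2

print(EX, EX2)
```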

Expected value is linear

If \(X\) and \(Y\) are two random variables, and \(\alpha\) is a real number, then

\[𝔼(X + Y)=𝔼X + 𝔼Y\] \[𝔼(α X)=α\, 𝔼X\]

So, if \(α\) and \(β\) are real numbers, then

\[𝔼(α X +\beta Y)=α\, 𝔼X +β\, 𝔼Y\]

Exercise: prove it yourself
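Before proving it, we can check linearity numerically; a sketch with a die and a coin, using Python's `random` module:

```python
import random

random.seed(3)
n = 50_000
X = [random.randint(1, 6) for _ in range(n)]   # a die: E X = 3.5
Y = [random.randint(0, 1) for _ in range(n)]   # a fair coin: E Y = 0.5
a, b = 2.0, -3.0

def mean(v):
    return sum(v) / len(v)

m = mean([a * x + b * y for x, y in zip(X, Y)])
print(m)   # close to a * 3.5 + b * 0.5 = 5.5
```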

Variance of the population

The variance of the population is defined with the same idea as the sample variance \[𝕍 X=𝔼(X-𝔼X)^2\] Notice that the variance has squared units

In most cases it is more comfortable to work with the standard deviation \(\sigma=\sqrt{𝕍X}.\)

In that case the population variance can be written as \(\sigma^2\)

Simple formula for population variance

We can rewrite the variance of the population with a simpler formula: \[𝕍X=𝔼(X-𝔼X)^2=𝔼(X^2)-(𝔼X)^2\] because \[𝔼(X-𝔼X)^2=𝔼(X^2-2X𝔼X+(𝔼X)^2)\\=𝔼(X^2)-2𝔼(X𝔼X)+𝔼(𝔼X)^2\] but \(𝔼X\) is a non-random number, so \(𝔼(X𝔼X)=(𝔼X)^2\) and \(𝔼(𝔼X)^2=(𝔼X)^2\)
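The two formulas can be checked on a concrete distribution, again a fair die:

```python
# Check the two variance formulas on a fair six-sided die
dist = {x: 1/6 for x in range(1, 7)}
EX = sum(x * p for x, p in dist.items())

var_def   = sum((x - EX)**2 * p for x, p in dist.items())        # E(X - EX)^2
var_short = sum(x**2 * p for x, p in dist.items()) - EX**2       # E(X^2) - (EX)^2

print(var_def, var_short)   # both equal 35/12
```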

Variance is almost linear

If \(X\) and \(Y\) are two independent random variables, and \(\alpha\) is a real number, then

  • \(𝕍(X + Y)=𝕍 X + 𝕍 Y\)
  • \(𝕍(α X)=α^2 𝕍 X\)

To prove the first equation we use that \(𝔼(XY)=𝔼X\,𝔼Y,\) which is true when \(X\) is independent of \(Y\)
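We can also check the additivity numerically, with two independent uniform variables (each has variance \(1/12\)):

```python
import random

random.seed(0)
n = 100_000
X = [random.random() for _ in range(n)]   # uniform on [0,1]: V X = 1/12
Y = [random.random() for _ in range(n)]   # independent of X

def var(v):
    m = sum(v) / len(v)
    return sum((x - m)**2 for x in v) / len(v)

v = var([x + y for x, y in zip(X, Y)])
print(v)   # close to V X + V Y = 1/6
```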