If you do not have access to the age distribution, but know only the standard deviation:

- How many people do you need to interview to estimate the average age of the Turkish population with a margin of error of 5 years?
- … of 1 year?
- … of 1 month?
- What is the probability that the real value is inside the intervals you have found?

We are looking for the population mean \(\mathbb{E}X\)

We know the population variance \(\mathbb{V}X\)

We interview \(n\) people and calculate the sample mean \(\bar{X}\)

The population average is probably in the interval \[\left[\bar{X}-c\sqrt{\mathbb{V}(X)/n}, \bar{X}+c\sqrt{\mathbb{V}(X)/n}\right]\]

Using Chebyshev's inequality, we know that the probability is at least \(1-1/c^2\)

We want the interval width to be less than 5 (or 1, or 1/12) years

Let's call the target width \(y\). We need \[2c\sqrt{\mathbb{V}(X)/n}\le y\] therefore \[n\ge 4c^2\mathbb{V}(X)/y^2\]

The variance \(\mathbb{V}(X)\) is 473.23 yr\(^2\)

c | Prob | 5 yr | 1 yr | 1 month |
---|---|---|---|---|
2 | 75% | 303 | 7,572 | 1,090,322 |
3 | 89% | 682 | 17,037 | 2,453,225 |
5 | 96% | 1,893 | 47,323 | 6,814,512 |
10 | 99% | 7,572 | 189,292 | 27,258,048 |
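The sample sizes follow from \(n\ge 4c^2\mathbb{V}(X)/y^2\). A minimal Python sketch of that calculation, assuming the variance of 473.23 yr\(^2\) quoted above; the function name `chebyshev_sample_size` is only an illustrative choice:

```python
import math

def chebyshev_sample_size(variance, width, c):
    """Smallest n such that the interval width 2*c*sqrt(variance/n) is at most `width`."""
    # Round first so floating-point noise does not push an exact integer up by one.
    return math.ceil(round(4 * c**2 * variance / width**2, 6))

variance = 473.23  # yr^2, value quoted in the text
widths = {"5 yr": 5.0, "1 yr": 1.0, "1 month": 1.0 / 12.0}

for c in (2, 3, 5, 10):
    confidence = 1 - 1 / c**2  # Chebyshev lower bound on the probability
    sizes = {label: chebyshev_sample_size(variance, w, c) for label, w in widths.items()}
    print(c, f"{confidence:.0%}", sizes)
```

Because \(n\) grows with \(1/y^2\), asking for an interval 5 times narrower requires 25 times more interviews.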

Homework Question 1

A company wants to offer insurance to protect against the economic damage of COVID-19.

- If a person takes the insurance, they pay \(x\).
- If they get COVID in the next year, they are paid a fixed amount \(y\)
  - this happens with probability \(p\)

- After one year, the company will have a net result \(R\) corresponding to income minus expenses.
- Since expenses depend on how many people get sick, \(R\) is a random variable.

- What is the expected value of the net result \(R\)?
- What are the variance and standard deviation of the net result \(R\)?
- What is the interval that contains the real net result \(R\) with 99% probability?

We have \(n\) people paying, and \(S\) people getting sick. The result is \[R=nx-Sy\] For our analysis, \(n, x\) and \(y\) are fixed, but \(S\) is a random variable. Thus \(R\) is a random variable.

We want to know \(\mathbb{E}(R)\)

How can we calculate it?

By linearity of the expected value, we have \[\mathbb{E}(R)=\mathbb{E}(nx-Sy)=nx-\mathbb{E}(S)y\] So we need to calculate \(\mathbb{E}(S)\)

What do we know about \(S\)?

There are \(n\) people, each one can get sick with probability \(p\)

Each person is a "coin" with probability \(p\)

Thus \(S\) is a *sum of coins*

Assuming that each person gets sick independently, then \[S \sim Binom(n,p)\] Therefore, we immediately know that \[\mathbb{E}(S)=np\qquad \mathbb{V}(S)=np(1-p)\]

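- What is the expected value of the net result \(R\)? \[\mathbb{E}(R)=nx-\mathbb{E}(S)y=nx-npy\]
- What are the variance and standard deviation of the net result \(R\)? \[\mathbb{V}(R)=\mathbb{V}(nx)+\mathbb{V}(-Sy)=0+ \mathbb{V}(S)y^2 =np(1-p)y^2\] so the standard deviation is \(\sqrt{\mathbb{V}(R)}=y\sqrt{np(1-p)}\)
- What is the interval that contains the real net result \(R\) with 99% probability?

By Chebyshev's inequality, after one year the result \(R\) will be, with probability at least \(1-1/c^2\), somewhere in the interval \[\left[\mathbb{E}(R)-c\sqrt{\mathbb{V}(R)}, \mathbb{E}(R)+c\sqrt{\mathbb{V}(R)}\right]\] That is \[\left[nx-npy-cy\sqrt{np(1-p)}, nx-npy+cy\sqrt{np(1-p)}\right]\] For 99% probability we take \(c=10\)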
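As an illustration, the three answers can be computed together. This is a minimal sketch; the portfolio numbers (10,000 clients, premium 15, payout 100, \(p=0.1\)) and the function name `net_result_summary` are hypothetical and do not come from the homework statement:

```python
import math

def net_result_summary(n, x, y, p, c):
    """Expected value, standard deviation and Chebyshev interval of R = n*x - S*y."""
    mean = n * x - n * p * y             # E(R) = nx - npy
    sd = y * math.sqrt(n * p * (1 - p))  # sqrt(V(R)) = y * sqrt(np(1-p))
    return mean, sd, (mean - c * sd, mean + c * sd)

# Hypothetical portfolio; c = 10 gives probability at least 1 - 1/100 = 99%.
mean, sd, (low, high) = net_result_summary(n=10_000, x=15.0, y=100.0, p=0.1, c=10)
print(mean, sd, low, high)
```

With these made-up numbers the whole interval stays above zero, so the company would end the year with a profit with probability at least 99%.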

How do we choose \(x\) and \(y\)?

We want \(R\ge 0,\) so the lower limit of the interval must be non-negative \[nx-npy-cy\sqrt{np(1-p)}\ge 0\] thus \[\frac{x}{y}\ge p+c\sqrt{\frac{p(1-p)}{n}}\]

Assuming \(p=0.1,\) then \(x/y\) must be at least

c | Prob | n = 10 | n = 100 | n = 1000 | n = 10000 | n = 100000 |
---|---|---|---|---|---|---|
2 | 75% | 0.29 | 0.16 | 0.12 | 0.11 | 0.10 |
3 | 89% | 0.38 | 0.19 | 0.13 | 0.11 | 0.10 |
5 | 96% | 0.57 | 0.25 | 0.15 | 0.12 | 0.10 |
10 | 99% | 1.05 | 0.40 | 0.19 | 0.13 | 0.11 |
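The bound \(x/y\ge p+c\sqrt{p(1-p)/n}\) is easy to evaluate for any number of clients. A minimal sketch, assuming \(p=0.1\) as in the text; `min_premium_ratio` is an illustrative name:

```python
import math

def min_premium_ratio(n, p, c):
    """Chebyshev lower bound on x/y: p + c*sqrt(p*(1-p)/n)."""
    return p + c * math.sqrt(p * (1 - p) / n)

p = 0.1
for c in (2, 3, 5, 10):
    row = [round(min_premium_ratio(n, p, c), 2) for n in (10, 100, 1000, 10_000, 100_000)]
    print(c, row)
```

As \(n\) grows the ratio approaches \(p\): with many clients the premium only needs to be slightly above the expected payout per person.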

We used Chebyshev's inequality, which needs no hypothesis about the distribution beyond a finite variance

But we have more information. We know that \(S\) is a Binomial random variable

Therefore we can make better confidence intervals

We know that \[\mathbb{P}(S=k|n\text{ in total})=\binom{n}{k} p^k(1-p)^{n-k}\] We can calculate \(\binom{n}{k}\) using Pascal's triangle, even in Excel

Pascal's Triangle

\[\mathbb{P}(S\le k)=\sum_{j=0}^k \mathbb{P}(S=j)\]
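The two formulas can be sketched in Python by building a row of Pascal's triangle and summing the point probabilities. The function names are illustrative, and the approach is only meant for small \(n\), since the products can overflow floating point for very large \(n\):

```python
def pascal_row(n):
    """Row n of Pascal's triangle: [C(n,0), C(n,1), ..., C(n,n)]."""
    row = [1]
    for _ in range(n):
        # Each inner entry is the sum of the two entries above it.
        row = [1] + [a + b for a, b in zip(row, row[1:])] + [1]
    return row

def binom_pmf(k, n, p):
    """P(S = k) for S ~ Binom(n, p)."""
    return pascal_row(n)[k] * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(S <= k): the sum of P(S = j) for j = 0..k."""
    row = pascal_row(n)
    return sum(row[j] * p**j * (1 - p) ** (n - j) for j in range(k + 1))

print(pascal_row(4))
print(binom_pmf(1, 10, 0.1), binom_cdf(1, 10, 0.1))
```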

Good tools include functions to calculate the usual distributions

In Excel we have `BINOM.DIST(k, n, p, cumulative)`

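In R we have `pbinom()` for the cumulative probability \(\mathbb{P}(S\le k)\) and `dbinom()` for the point probability \(\mathbb{P}(S=k)\)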
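To see how much the Binomial information helps, here is a hedged Python sketch for the insurance example. The helpers `dbinom` and `pbinom` are local stand-ins named after the R functions, and `smallest_k_with_coverage` is a hypothetical name. It assumes \(n=1000\) and \(p=0.1\). Since \(R\ge 0\) exactly when \(S\le nx/y\), any ratio \(x/y\ge k/n\) with \(\mathbb{P}(S\le k)\ge 0.99\) is enough.

```python
import math

def dbinom(k, n, p):
    """P(S = k) for S ~ Binom(n, p)."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

def pbinom(k, n, p):
    """P(S <= k), the cumulative probability."""
    return sum(dbinom(j, n, p) for j in range(k + 1))

def smallest_k_with_coverage(prob, n, p):
    """Smallest k such that P(S <= k) >= prob."""
    total = 0.0
    for k in range(n + 1):
        total += dbinom(k, n, p)
        if total >= prob:
            return k
    return n

n, p = 1000, 0.1
k = smallest_k_with_coverage(0.99, n, p)
exact_ratio = k / n                                    # x/y that covers every S <= k
chebyshev_ratio = p + 10 * math.sqrt(p * (1 - p) / n)  # Chebyshev bound with c = 10
print(k, exact_ratio, round(chebyshev_ratio, 2))
```

The exact ratio comes out smaller than the Chebyshev one, because Chebyshev has to cover every distribution with the same variance.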

Now we have a coin \(X\) with two possible outcomes: +1 and -1

To make life easy, we assume \(p=0.5\)

What are the expected value and variance of \(X\)?

We throw the coin \(n\) times, and we calculate \(Y\), the sum of all \(X_i\) \[Y=\sum_{i=1}^n X_i\]

What are the expected value and variance of \(Y\)?

- \(Y\) is basically a Binomial random variable: \(Y=2B-n\) with \(B\sim Binom(n, 0.5)\)
- \(\mathbb{E}Y = 0\), because \(\mathbb{E}X = 0\)
- \(\mathbb{V}Y = n\), because \(\mathbb{V}X = 1\) and the throws are independent
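The values in the list above follow directly from the definitions with \(p=0.5\). As a short worked derivation:

\[\mathbb{E}X=(+1)\cdot 0.5+(-1)\cdot 0.5=0\qquad \mathbb{V}X=\mathbb{E}X^2-(\mathbb{E}X)^2=1-0=1\]

\[\mathbb{E}Y=\sum_{i=1}^n \mathbb{E}X_i=0\qquad \mathbb{V}Y=\sum_{i=1}^n \mathbb{V}X_i=n\]

The sum of variances in the second line relies on the throws being independent.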

Now consider \(Z_n=Y/\sqrt{n}\)

It is easy to see that \(\mathbb{E}Z_n = 0\) and \(\mathbb{V}Z_n = 1\) independent of \(n\)
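A quick simulation can make this plausible. The sketch below uses only Python's standard library; the seed, the number of repetitions and the name `sample_z` are arbitrary choices:

```python
import random
import statistics

def sample_z(n, rng):
    """One draw of Z_n = Y / sqrt(n), where Y is the sum of n fair +1/-1 coins."""
    y = sum(rng.choice((-1, 1)) for _ in range(n))
    return y / n**0.5

rng = random.Random(2024)  # fixed seed so the run is repeatable
for n in (10, 100, 400):
    draws = [sample_z(n, rng) for _ in range(2000)]
    # Sample mean should be near 0 and sample variance near 1 for every n.
    print(n, round(statistics.mean(draws), 2), round(statistics.pvariance(draws), 2))
```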

In general, the possible values of \(Z_n\) are not integers, and not even rational numbers

What happens with \(Z_n\) when \(n\) is *really big*?

When \(n\to\infty,\) the distribution of \(Z_n=\sum_i X_i/\sqrt{n}\) will converge to a **Normal** distribution \[\lim_{n\to\infty} Z_n \sim Normal(0,1)\]

If \(X_i\) is a set of **independent, identically distributed** random variables, with expected value \[\mathbb{E}X_i=\mu\quad\text{for all }i\] and variance \[\mathbb{V}X_i=\sigma^2\quad\text{for all }i\] then, as \(n\) grows, \[\lim_{n\to\infty} \frac{\sum_i (X_i-\mu)}{\sigma\sqrt{n}} \sim Normal(0,1)\]

Equivalently, without dividing by \(\sigma\), \[\lim_{n\to\infty} \frac{\sum_i (X_i-\mu)}{\sqrt{n}} \sim Normal(0, \sigma^2)\]
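For the ±1 coin, the convergence can be checked without simulation, because the sum is a shifted Binomial. The sketch below compares the exact value of \(\mathbb{P}(Z_n\le 1)\) with the standard Normal value. It relies on `statistics.NormalDist` from Python's standard library, and `prob_z_at_most` is a hypothetical helper name:

```python
import math
from statistics import NormalDist

def prob_z_at_most(z, n):
    """Exact P(Z_n <= z) for Z_n = Y/sqrt(n), Y a sum of n fair +1/-1 coins."""
    # Y = 2*B - n with B ~ Binom(n, 1/2), so Z_n <= z  <=>  B <= (n + z*sqrt(n)) / 2
    k_max = math.floor((n + z * math.sqrt(n)) / 2)
    k_max = max(-1, min(n, k_max))
    return sum(math.comb(n, k) for k in range(k_max + 1)) / 2**n

for n in (10, 100, 1000):
    print(n, round(prob_z_at_most(1.0, n), 4), round(NormalDist().cdf(1.0), 4))
```

The exact values move in small jumps because \(Z_n\) is discrete, so the match with the Normal value is approximate and improves as \(n\) grows.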

- Thermal noise is the sum of many small vibrations in all directions
  - they sum and usually cancel each other

- Phenotype depends on several genetic conditions
  - Height, weight and similar attributes depend on the combination of several attributes

- Not all combined effects are *sums*
  - some effects are multiplicative

- Some effects may not have finite variance
  - sometimes variance is infinite

- Not all effects are independent
  - this is the most critical issue