Please download the answer file and edit it in RStudio. Write your student number in the correct place at the beginning of the answer file.

Question 0 should be answered immediately, right now. Stop reading now, and answer the question. Only continue reading after you have delivered the photo of your handwritten document and signature. If you do not deliver this right now, your exam will not be graded, and you will get 0.

If you have any issues or questions, write to me at my official email andres.aravena@istanbul.edu.tr (only for questions, not for answers). When you finish, send the file answers.Rmd to andres.aravena+cmb@istanbul.edu.tr (only for answers, not for questions).

You can also upload the answers.Rmd file to AKSİS. To avoid confusion, change the file name to your student number. The filename should be something like 040567890.Rmd.

0. Ethical Commitment

Copy the following text onto a blank sheet of paper, in your own handwriting. Sign it, take a picture of it, and send it immediately to andres.aravena+cmb@istanbul.edu.tr. Write your student number in the subject line of the email. Do not upload this to AKSİS.

CMB2 2021 Makeup Exam

“Şerefim üzerine söz veririm ki, bu sınav sırasında etik kuralları çiğnemedim” (in English: “I promise on my honor that I did not break the ethical rules during this exam”)

Full Name:
Student Number:
Signature:
Date:

I understand that all answers are strictly personal, and unethical behavior will be penalized. I will work alone and deliver my personal answers. If I fail to do so, I understand that my grade will be 0.

1. Binomial distribution of probabilities

In our course we learned that tossing several coins (all with the same probability \(p\) of heads) and counting how many heads we get produces a random variable. It is easy to simulate this using sample(), but it is much more efficient to use the function rbinom(), which is already part of R.

The function rbinom(n, size, prob) takes three inputs. The idea is to throw size independent coins, each with probability prob of landing heads, and count how many heads we get. The value n chooses how many random numbers we want. It is equivalent to “the number of simulations to carry out”. Please be sure to understand the difference between n and size.
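The difference between n and size can be seen in a small sketch like the following. The numbers (20 coins, 5 simulations), the seed, and the variable names are arbitrary choices made only for this illustration; they are not part of any question.

```r
set.seed(1)  # arbitrary seed, only so the illustration is repeatable

# One simulation (n = 1) of 20 coins (size = 20): a single count of heads
one_count <- rbinom(n = 1, size = 20, prob = 0.5)

# Five simulations (n = 5) of 20 coins each: five counts, each between 0 and 20
five_counts <- rbinom(n = 5, size = 20, prob = 0.5)

# The same kind of experiment written with sample():
# toss 20 coins (1 = head, 0 = tail), count the heads, and repeat 5 times
coin_counts <- replicate(5, sum(sample(c(1, 0), size = 20, replace = TRUE,
                                       prob = c(0.5, 0.5))))

length(one_count)    # 1
length(five_counts)  # 5
```

In both versions the length of the result is the number of simulations, while the number of coins only limits how large each count can be.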

This is the most important case for most experiments in Molecular Biology and Genetics. Each time you grow several plants, some of them will die following this probability distribution. If you create a new vaccine, the number of people who will not get sick will follow this probability distribution. Almost all experiments you will do in your life will be governed by this rule. Understanding this rule is essential to understand your experimental results.

1.1 Average value of binomial, depending on size

We want to see how the size parameter affects the average value of a binomial distribution. We start by creating a short vector with different values of size.

sizes <- floor(10^seq(from=1, to=4, by=0.5))
sizes
[1]    10    31   100   316  1000  3162 10000

Please write a function to calculate the mean value of a random binomial sample, depending on the probability prob and the size parameter. The number of simulations should be at least ten thousand. The function must be called calc_average and take two inputs: the probability of a head prob, and a vector called sizes with several values. The result should be a new vector of the same length as sizes, where each element is the average of ten thousand simulations (or more) of a binomial random variable with the corresponding size.

calc_average <- function(prob, sizes) {
  # write here
}

If all goes right, you will see something like this

avg <- calc_average(prob = 0.5, sizes)
plot(avg ~ sizes)

1.2 Linear model for average depending on size

Please fit a linear model for the relationship between avg and sizes and show the resulting coefficients.

# write here

The result should be something similar to this

(Intercept)       sizes 
  0.0547476   0.5000173 
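As a reminder of how a linear model is fitted and its coefficients extracted in R, here is a minimal sketch on toy data. The vectors x and y are invented for this illustration (y is exactly 2 + 3x), so they are an assumption and have nothing to do with avg or sizes.

```r
# Toy data, invented for this illustration: y is exactly 2 + 3 * x
x <- c(1, 2, 3, 4, 5)
y <- 2 + 3 * x

model <- lm(y ~ x)  # fit y = intercept + slope * x
coef(model)         # named vector: "(Intercept)" first, then the slope "x"
```

Because the toy data lie exactly on a line, the intercept comes out as 2 and the slope as 3, up to rounding error. With simulated data the coefficients will only be close to the underlying values.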

1.3 Average value of binomial, depending on prob.

Now we will test the effect of changing the probability of “heads”. For this we will first build a vector called probs with all the values we will study. The code to make this vector is here. Do not copy it, just use it.

probs <- seq(from=0.05, to=0.95, by=0.05)
probs
 [1] 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95

Please write the code to calculate the average for each probability. For this you must take each value of probs and use it with the function calc_average, as in question 1.1, to get a vector called avg. Then you must fit a linear model (as in question 1.2) and find its coefficients. We only care about the second coefficient.

At the end you should store the resulting values in a vector called all_avg. There should be one value in all_avg for each value in probs.
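The general pattern "do something with each element of a vector and keep one number per element" can be sketched as below. The vector rates, the intermediate vector pair, and the computation itself are hypothetical and chosen only to keep the example small; a for loop is one possible way to write it, and sapply() is another.

```r
rates <- c(0.5, 1, 2)              # hypothetical input values
results <- rep(NA, length(rates))  # one empty slot per input value

for (i in seq_along(rates)) {
  pair <- c(rates[i], rates[i]^2)  # some intermediate vector with two elements
  results[i] <- pair[2]            # keep only the second element
}

results  # 0.25 1.00 4.00
```

The output vector has the same length as the input vector, which is the property required of all_avg.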

# write here

Now you should be able to see this figure.

plot(all_avg ~ probs)

1.4 Linear model for all_avg depending on probs

We got a new vector and we made a plot. Now you should fit a linear model for the relationship between all_avg and probs, and print its coefficients.

# write here

The result should be something similar to this

 (Intercept)        probs 
1.177067e-05 1.000032e+00 

2. Variance

2.1 Variance of binomial, depending on size

Now we want to see how the size parameter affects the variance of a binomial distribution.

Please write a function to calculate the variance of a random binomial sample, depending on the probability p and the size s. The number of simulations should be at least ten thousand. The function must be called calc_variance and take two inputs: the probability of a head p, and a vector called s with several values. The result should be a new vector of the same length as s, where each element is the variance of (at least) ten thousand simulations of a binomial random variable with the corresponding size.

calc_variance <- function(p, s) {
  # write here
}

If all goes right, you will see something like this

variance <- calc_variance(p=0.5, s=sizes)
plot(log(variance) ~ log(sizes))

2.2 Linear model for Variance, depending on size

We got a new vector and we made a plot. Now you should fit a model for the relationship between variance and size. In this case we will fit a log-log linear model. The relationship we want to represent is
\[V = A \cdot s^{B}\]
which we can transform into a linear model by taking logarithms:
\[\log(V) = \log(A) + B\log(s)\]
Notice that the coefficients \(A\) and \(B\) will depend on the probability p, as we will see in the next question.
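To see why the logarithms recover \(A\) and \(B\), here is a minimal sketch on synthetic data that follow an exact power law. The vectors s_toy and v_toy and the values \(A = 3\), \(B = 2\) are assumptions invented for this illustration; they are unrelated to the binomial variance.

```r
# Synthetic data, invented for this illustration: v = 3 * s^2 exactly
s_toy <- c(1, 2, 4, 8, 16)
v_toy <- 3 * s_toy^2

# Linear model on the logarithms: log(v) = log(A) + B * log(s)
fit <- lm(log(v_toy) ~ log(s_toy))

A_hat <- exp(coef(fit)[[1]])  # the intercept estimates log(A), so undo the log
B_hat <- coef(fit)[[2]]       # the slope estimates the exponent B

c(A = A_hat, B = B_hat)       # approximately A = 3, B = 2
```

The intercept of a log-log model is on the logarithmic scale, which is why it has to be passed through exp() before it can be compared with \(A\).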

Please write the code to build a log-log model, and store its coefficients in a vector called coef_var_vs_size.

# write here

Now you can show the coefficients with the following code

c(exp(coef_var_vs_size[1]),coef_var_vs_size[2])
(Intercept)  log(sizes) 
  0.2504594   1.0000333 

If you want, you can test what happens with different values of p. For example, these are the results when p=0.3. Your result may be a little different, but not too much. (These tests are completely optional, just for you to see the big picture). By the way, if you read this part, please send me an email with your student number and the message “I’ve read the makeup questions carefully”.

(Intercept)  log(sizes) 
  0.2106769   0.9994806 

These are the results for p=0.2

(Intercept)  log(sizes) 
  0.1609045   0.9993564 

These are the results for p=0.1

(Intercept)  log(sizes) 
  0.0897144   1.0011946 

These are the results for p=0.6

(Intercept)  log(sizes) 
  0.2406164   0.9993262 

These are the results for p=0.7

(Intercept)  log(sizes) 
  0.2091931   1.0005067 

These are the results for p=0.9

(Intercept)  log(sizes) 
 0.09003722  0.99991050 

2.3 Variance of binomial, depending on prob

In the previous question we saw that the linear model coefficients change when the probability changes. We want to calculate the coefficients of the linear model for each value in the vector probs.

Please write the code that takes each value of probs, uses the function calc_variance to get a vector variance as in question 2.1, fits a log-log linear model and gets its coefficients. We only care about the first coefficient, which estimates \(\log(A)\); as in question 2.2, take its exponential to recover \(A\). Store the resulting values of \(A\) in a vector called var_vs_prob.

# write here

Now you can plot var_vs_prob versus probs as in the following figure. You can see that the values correspond very closely to the formula \(p(1-p)\).

plot(var_vs_prob ~ probs)
lines(probs*(1-probs) ~ probs, col="red")

3. In plain English

3.1 Average value of binomial

Given all the results of the previous questions, what is the formula for the average value of a binomial random variable, depending on size and prob?

# write here

3.2 Variance of binomial

Given all the results of the previous questions, what is the formula for the variance of a binomial random variable, depending on size and prob?

# write here