Please download the answer file and edit it in RStudio. Write your student number in the correct place at the beginning of the answer file. Question 0 must be answered immediately. Stop reading now and answer it. Continue reading only after you have delivered the photo of your handwritten document and signature. If you do not deliver it right now, your exam will not be graded and you will get 0. If you have any issues or questions, write to me at my official email andres.aravena@istanbul.edu.tr (only for questions, not for answers). When you finish, send the file answers.Rmd to andres.aravena+cmb@istanbul.edu.tr (only for answers, not for questions).
You can also upload the answers.Rmd file to AKSİS. To avoid confusion, change the file name to your student number. The filename should be something like 040567890.Rmd.
Copy the following text onto a blank sheet of paper in your own handwriting. Sign it, take a picture of it, and send it immediately to andres.aravena+cmb@istanbul.edu.tr. Write your student number in the subject line of the email. Do not upload this to AKSİS.
CMB2 2021 Makeup Exam “Şerefim üzerine söz veririm ki, bu sınav sırasında etik kuralları çiğnemedim”
Full Name:
Student Number:
Signature:
Date:
I understand that all answers are strictly personal, and unethical behavior will be penalized. I will work alone and deliver my personal answers. If I fail to do so, I understand that my grade will be 0.
In our course we learned that tossing several coins (all with the same probability \(p\)) and counting how many heads we get produces a random variable. It is easy to simulate this using sample(), but it is much more efficient to use the function rbinom(), which is already part of R.
The function rbinom(n, size, prob) takes three inputs. The idea is to throw size independent coins, each one with probability prob of landing heads, and count how many heads we get. The value n chooses how many random numbers we want. It is equivalent to “the number of simulations to carry out”. Make sure you understand the difference between n and size.
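The difference between n and size can be seen in a small sketch. The seed and the specific numbers below are arbitrary choices for illustration, not part of the exam.

```r
set.seed(1)  # arbitrary seed, only to make this sketch reproducible

# One experiment by hand: toss 10 fair coins with sample() and count heads.
tosses <- sample(c("H", "T"), size = 10, replace = TRUE, prob = c(0.5, 0.5))
heads_by_hand <- sum(tosses == "H")

# The same kind of experiment with rbinom(): n = 1 result, size = 10 coins.
one_result <- rbinom(n = 1, size = 10, prob = 0.5)

# n controls how many results come back; size controls coins per result.
five_results <- rbinom(n = 5, size = 10, prob = 0.5)
length(five_results)   # 5, one number per simulated experiment
range(five_results)    # every value lies between 0 and size
```

Note that size means different things in the two functions: in sample() it is the number of items drawn, while in rbinom() it is the number of coins tossed in each simulated experiment.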
This is the most important case for most experiments in Molecular Biology and Genetics. Each time you grow several plants, some of them will die following this probability distribution. If you create a new vaccine, the number of people who will not get sick will follow this probability distribution. Almost all experiments you will do in your life will be governed by this rule. Understanding this rule is essential to understand your experimental results.
We want to see how the size parameter affects the average value of a binomial distribution. We start by creating a short vector with different values of size.
sizes <- floor(10^seq(from=1, to=4, by=0.5))
sizes
[1] 10 31 100 316 1000 3162 10000
Please write a function to calculate the mean value of a random binomial sample, depending on the probability prob and the size parameter. The number of simulations should be at least ten thousand. The function must be called calc_average and take two inputs: the probability of a head prob, and a vector called sizes with several values. The result should be a new vector of the same length as sizes, where each element is the average of ten thousand (or more) simulations of a binomial random variable with the corresponding size.
calc_average <- function(prob, sizes) {
# write here
}
If all goes right, you will see something like this
avg <- calc_average(prob = 0.5, sizes)
plot(avg ~ sizes)
Please fit a linear model for the relationship between avg and sizes and show the resulting coefficients. The result should be similar to this:
(Intercept) sizes
0.0547476 0.5000173
# write here
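As a reminder of the general mechanics, here is a minimal sketch of fitting a linear model and reading its coefficients. The data (x, y) are hypothetical and unrelated to the exam variables.

```r
set.seed(2)  # arbitrary seed for reproducibility

# Hypothetical data, unrelated to the exam: y is roughly 3 + 2 * x plus noise.
x <- 1:50
y <- 3 + 2 * x + rnorm(length(x), sd = 0.1)

# Fit the linear model y = intercept + slope * x.
model <- lm(y ~ x)

# coef() returns a named vector: first the intercept, then the slope.
coef(model)
slope <- coef(model)[2]
```

The second element of coef(model) is the slope, which is the coefficient named after the explanatory variable.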
Now we will test the effect of changing the probability of “heads”. For this we will first build a vector called probs with all the values we will study. The code to make this vector is here. Do not copy it, just use it.
probs <- seq(from=0.05, to=0.95, by=0.05)
probs
[1] 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95
Please write the code to calculate the average for each probability. For this you must take each value of probs and use it with the function calc_average as in question 1.1 to get a vector called avg. Then you must fit a linear model (as in question 1.2) and find its coefficients. We only care about the second coefficient.
At the end you should store the resulting values in a vector called all_avg. There should be one value in all_avg for each value in probs.
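The general pattern of "one result per input value" can be sketched as follows. The vector rates and the data built inside the loop are hypothetical and only illustrate the mechanics; a loop is one of several possible approaches.

```r
# Hypothetical input vector, unrelated to the exam data.
rates <- c(1, 2, 5, 10)

# Pre-allocate one slot in the result for each input value.
slopes <- numeric(length(rates))

for (i in seq_along(rates)) {
  # Build a small data set that depends on the current value...
  x <- 1:20
  y <- rates[i] * x + 1
  # ...fit a model, and keep only the second coefficient (the slope).
  slopes[i] <- coef(lm(y ~ x))[2]
}

slopes
```

Here slopes ends up with the same length as rates, one fitted slope per input value.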
# write here
Now you should be able to see this figure.
plot(all_avg ~ probs)
Figure: all_avg depending on probs.
We got a new vector and we made a plot. Now you should fit a linear model for the relationship between all_avg and probs, and print its coefficients.
# write here
The result should be something similar to this
(Intercept) probs
1.177067e-05 1.000032e+00
Now we want to see how the size parameter affects the variance of a binomial distribution.
Please write a function to calculate the variance of a random binomial sample, depending on the probability p and the size s. The number of simulations should be at least ten thousand. The function must be called calc_variance and take two inputs: the probability of a head p, and a vector called s with several values. The result should be a new vector of the same length as s, where each element is the variance of (at least) ten thousand simulations of a binomial random variable with the corresponding size.
calc_variance <- function(p, s) {
# write here
}
If all goes right, you will see something like this
variance <- calc_variance(p=0.5, s=sizes)
plot(log(variance) ~ log(sizes))
We got a new vector and we made a plot. Now you should fit a model for the relationship between variance and size. In this case we will fit a log-log linear model. The relationship we want to represent is \[V = A \cdot s^{B}\] which we can transform into a linear model by taking logarithms: \[\log(V) = \log(A) + B \log(s)\] Notice that the coefficients \(A\) and \(B\) will depend on the probability p, as we will see in the next question.
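To see why the logarithm trick works, here is a minimal sketch on a hypothetical power law with known constants. The values A = 2 and B = 1.5, the noise level, and the names A_hat and B_hat are assumptions made only for this illustration.

```r
set.seed(3)  # arbitrary seed for reproducibility

# Hypothetical power law, unrelated to the exam data: V = 2 * s^1.5,
# with small multiplicative noise.
s <- c(10, 30, 100, 300, 1000)
V <- 2 * s^1.5 * exp(rnorm(length(s), sd = 0.01))

# Taking logarithms on both sides turns the power law into a straight line.
loglog <- lm(log(V) ~ log(s))

# The intercept estimates log(A), so exp() recovers A; the slope estimates B.
A_hat <- exp(coef(loglog)[1])
B_hat <- coef(loglog)[2]
```

The fitted intercept is on the logarithmic scale, which is why it has to go through exp() before it can be compared with \(A\).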
Please write the code to build a log-log model, and store its coefficients in a vector called coef_var_vs_size.
# write here
Now you can show the coefficients with the following code
c(exp(coef_var_vs_size[1]),coef_var_vs_size[2])
(Intercept) log(sizes)
0.2504594 1.0000333
If you want, you can test what happens with different values of p. For example, these are the results when p=0.3. Your result may be a little different, but not too much. (These tests are completely optional, just for you to see the big picture). By the way, if you read this part, please send me an email with your student number and the message “I’ve read the makeup questions carefully”.
(Intercept) log(sizes)
0.2106769 0.9994806
These are the results for p=0.2
(Intercept) log(sizes)
0.1609045 0.9993564
These are the results for p=0.1
(Intercept) log(sizes)
0.0897144 1.0011946
These are the results for p=0.6
(Intercept) log(sizes)
0.2406164 0.9993262
These are the results for p=0.7
(Intercept) log(sizes)
0.2091931 1.0005067
These are the results for p=0.9
(Intercept) log(sizes)
0.09003722 0.99991050
In the previous question we saw that the linear model coefficients change when the probability changes. We want to calculate the coefficients of the linear model for each value in the vector probs.
Please write the code that takes each value of probs, uses the function calc_variance to get a vector variance as in question 2.1, fits a log-log linear model and gets its coefficients. We only care about the first coefficient. Store the results in a vector called var_vs_prob.
# write here
Now you can plot var_vs_prob versus probs as in the following figure. You can see that the values correspond very closely to the formula \(p(1-p)\).
plot(var_vs_prob ~ probs)
lines(probs*(1-probs) ~ probs, col="red")
Given all the results of the previous questions, what is the formula for the average value of a binomial random variable, depending on size and prob?
# write here
Given all the results of the previous questions, what is the formula for the variance of a binomial random variable, depending on size and prob?
# write here