Please download the answer file and edit it in RStudio. Write your student number in the correct place at the beginning of the answer file. Question 0 must be answered immediately. Stop reading now and answer that question. Only continue reading after you have delivered the photo of your handwritten document and signature. If you do not deliver it right now, your exam will not be graded and you will get 0. If you have any issues or questions, write to me at my official email andres.aravena@istanbul.edu.tr (only for questions, not for answers).

When you finish, send the file answers.Rmd to andres.aravena+cmb@istanbul.edu.tr (only for answers, not for questions). You can also upload the answers.Rmd file to AKSİS. To avoid confusion, rename the file with your student number. The filename should be something like 040567890.Rmd.
Copy the following text onto a blank sheet of paper, in your own handwriting. Sign it, take a picture of it and send it immediately to andres.aravena+cmb@istanbul.edu.tr. Write your student number in the subject line of the email. Do not upload this to AKSİS.
CMB2 2021 Makeup Exam “Şerefim üzerine söz veririm ki, bu sınav sırasında etik kuralları çiğnemedim” (“I promise on my honor that I did not break the ethical rules during this exam”)
Full Name:
Student Number:
Signature:
Date:
I understand that all answers are strictly personal, and unethical behavior will be penalized. I will work alone and deliver my personal answers. If I fail to do so, I understand that my grade will be 0.
In our course we learned that tossing several coins (all with the same probability \(p\)) and counting how many heads we get produces a random variable. It is easy to simulate this using sample(), but it is much more efficient to simulate it using the function rbinom(), which is already part of R.

The function rbinom(n, size, prob) takes three inputs. The idea is to throw size independent coins, each one with probability prob of heads, and count how many heads we got. The value n is used to choose how many random numbers we want. It is equivalent to “the number of simulations to carry out”. Please be sure to understand the difference between n and size.
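A minimal sketch of the difference between n and size, using an arbitrary seed (set.seed(1)) and small illustrative values (n = 5, size = 10, prob = 0.5) that are not part of the exam. The second part repeats the same experiment with sample() and replicate(), which is the slower approach mentioned above.

```r
set.seed(1)  # arbitrary seed, only to make the draws reproducible

# n = 5 simulations; in each one we toss size = 10 coins with prob = 0.5
# and count the heads
x <- rbinom(n = 5, size = 10, prob = 0.5)
print(x)  # five numbers, each one between 0 and 10

# The same experiment "by hand": toss 10 fair coins (1 = head, 0 = tail),
# add up the heads, and repeat the whole thing 5 times
y <- replicate(5, sum(sample(c(0, 1), size = 10, replace = TRUE)))
print(y)  # also five numbers between 0 and 10
```

In both cases the length of the result is n, while the largest possible value of each number is size.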
This is the most important case for most experiments in Molecular Biology and Genetics. Each time you grow several plants, some of them will die following this probability distribution. If you create a new vaccine, the number of people who will not get sick will follow this probability distribution. Almost all experiments you will do in your life will be governed by this rule. Understanding this rule is essential to understand your experimental results.
Average value depending on size
We want to see how the size parameter affects the average value of a binomial distribution. We start by creating a short vector with different values of size.
sizes <- floor(10^seq(from=1, to=4, by=0.5))
sizes
[1] 10 31 100 316 1000 3162 10000
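To see where these numbers come from: seq() produces the exponents 1, 1.5, 2, ..., 4, so 10^seq(...) is evenly spaced on a logarithmic scale, and floor() rounds each value down to a whole number, because size counts coins. A short breakdown (the names exponents and raw are only illustrative):

```r
exponents <- seq(from = 1, to = 4, by = 0.5)  # 1.0 1.5 2.0 2.5 3.0 3.5 4.0
raw <- 10^exponents    # 10, 31.62..., 100, 316.2..., 1000, 3162.2..., 10000
sizes <- floor(raw)    # round down to whole numbers of coins
print(sizes)
```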
Please write a function to calculate the mean value of a random binomial sample, depending on the probability prob and the size parameter. The number of simulations should be at least ten thousand. The function must be called calc_average and take two inputs: the probability of a head prob, and a vector called sizes with several values. The result should be a new vector of the same length as sizes, containing, for each value in sizes, the average of ten thousand (or more) simulations of a binomial random variable.
calc_average <- function(prob, sizes) {
# write here
}
If all goes right, you will see something like this
avg <- calc_average(prob = 0.5, sizes)
plot(avg ~ sizes)
Figure: avg depending on sizes.
Please fit a linear model for the relationship between avg and sizes and show the resulting coefficients. The result should be similar to this:
(Intercept) sizes
0.0547476 0.5000173
# write here
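If lm() and coef() are new to you, here is a sketch on hypothetical toy data lying exactly on the line y = 2 + 3x (not binomial data). coef() returns a named vector: the first element is the intercept and the second one is the slope.

```r
# Hypothetical toy data on the exact line y = 2 + 3 * x
x <- 1:10
y <- 2 + 3 * x

model <- lm(y ~ x)       # fit the linear model y = a + b * x
print(coef(model))       # named vector "(Intercept)", "x": close to 2 and 3
slope <- coef(model)[2]  # keep only the second coefficient
```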
Average value depending on prob

Now we will test the effect of changing the probability of “heads”. For this we will first build a vector called probs with all the values we will study. The code to make this vector is here. Do not copy it, just use it.
probs <- seq(from=0.05, to=0.95, by=0.05)
probs
[1] 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 0.80 0.85 0.90 0.95
Please write the code to calculate the average for each probability. For this you must take each value of probs and use it with the function calc_average, as in question 1.1, to get a vector called avg. Then you must fit a linear model (as in question 1.2) and find its coefficients. We only care about the second coefficient. At the end you should assign the resulting values to a vector called all_avg. There should be one value in all_avg for each value in probs.
# write here
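The pattern asked for here (repeat a computation for every element of a vector and keep one number each time) can be written with sapply(). A sketch on hypothetical data, unrelated to binomial sampling: for each slope m we build the exact line y = 1 + m x, fit a linear model and keep only the second coefficient.

```r
x <- 1:20
slopes <- c(0.5, 1, 2)  # hypothetical values, playing the role of probs

recovered <- sapply(slopes, function(m) {
  y <- 1 + m * x              # exact line with slope m
  unname(coef(lm(y ~ x))[2])  # keep only the second coefficient
})
print(recovered)  # one value per element of slopes, close to 0.5, 1, 2
```

sapply() returns a vector with one entry per input value, which is the shape required for all_avg.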
Now you should be able to see this figure.
plot(all_avg ~ probs)
Figure: all_avg depending on probs.
We got a new vector and we made a plot. Now you should fit a linear model for the relationship between all_avg and probs, and print its coefficients.
# write here
The result should be something similar to this
(Intercept) probs
1.177067e-05 1.000032e+00
Variance depending on size
Now we want to see how the size parameter affects the variance of a binomial distribution.
Please write a function to calculate the variance of a random binomial sample, depending on the probability p and the size s. The number of simulations should be at least ten thousand. The function must be called calc_variance and take two inputs: the probability of a head p, and a vector called s with several values. The result should be a new vector of the same length as s, containing, for each value in s, the variance of (at least) ten thousand simulations of a binomial random variable.
calc_variance <- function(p, s) {
# write here
}
If all goes right, you will see something like this
variance <- calc_variance(p=0.5, s=sizes)
plot(log(variance) ~ log(sizes))
Figure: log(variance) depending on log(sizes).
We got a new vector and we made a plot. Now you should fit a model for the relationship between variance and size. In this case we will fit a log-log linear model. The relationship we want to represent is \[V = A \cdot s^{B}\] which we can transform into a linear model by taking logarithms: \[\log(V) = \log A + B\log(s)\] so the intercept of this linear model is \(\log A\) and its slope is \(B\). Notice that the coefficients \(A\) and \(B\) will depend on the probability p, as we will see in the next question.
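A sketch checking the log-log transformation on hypothetical exact data with \(A = 3\) and \(B = 2\) (values chosen only for illustration): after fitting log(V) against log(s), the slope is \(B\) and the intercept is \(\log A\), so exp() is needed to get \(A\) back.

```r
# Hypothetical data that follow V = A * s^B exactly, with A = 3 and B = 2
s <- c(10, 31, 100, 316, 1000, 3162, 10000)
V <- 3 * s^2

fit <- lm(log(V) ~ log(s))  # linear model on the logarithms
cf <- coef(fit)

A <- exp(cf[1])  # the intercept is log(A), so undo the logarithm
B <- cf[2]       # the slope is the exponent B
print(c(A, B))   # close to 3 and 2
```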
Please write the code to build a log-log model, and store its coefficients in a vector called coef_var_vs_size.
# write here
Now you can show the coefficients \(A\) and \(B\) with the following code. The intercept of the model is \(\log A\), so exp() is applied to it:
c(exp(coef_var_vs_size[1]),coef_var_vs_size[2])
(Intercept) log(sizes)
0.2504594 1.0000333
If you want, you can test what happens with different values of p. For example, these are the results when p=0.3. Your results may be a little different, but not by much. (These tests are completely optional; they are just for you to see the big picture.) By the way, if you read this part, please send me an email with your student number and the message “I’ve read the makeup questions carefully”.
(Intercept) log(sizes)
0.2106769 0.9994806
These are the results for p=0.2
(Intercept) log(sizes)
0.1609045 0.9993564
These are the results for p=0.1
(Intercept) log(sizes)
0.0897144 1.0011946
These are the results for p=0.6
(Intercept) log(sizes)
0.2406164 0.9993262
These are the results for p=0.7
(Intercept) log(sizes)
0.2091931 1.0005067
These are the results for p=0.9
(Intercept) log(sizes)
0.09003722 0.99991050
Variance depending on prob
In the previous question we saw that the linear model coefficients change when the probability changes. We want to calculate the coefficients of the linear model for each value in the vector probs.
Please write the code that takes each value of probs, uses the function calc_variance to get a vector variance as in question 2.1, fits a log-log linear model and gets its coefficients. We only care about the first coefficient, transformed back with exp() as was done for coef_var_vs_size, so that it represents \(A\) and can be compared with \(p(1-p)\). Store the results in a vector called var_vs_prob.
# write here
Now you can plot var_vs_prob versus probs as in the following figure. You can see that the values correspond very closely to the formula \(p(1-p)\).
plot(var_vs_prob ~ probs)
lines(probs*(1-probs) ~ probs, col="red")
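As a quick numerical check, here is \(p(1-p)\) for the same values of p used in the optional tests above, compared with the values of \(A\) reported there. They agree to about two decimal places.

```r
p <- c(0.1, 0.2, 0.3, 0.5, 0.6, 0.7, 0.9)  # values used in the optional tests
expected_A <- p * (1 - p)
print(expected_A)  # 0.09 0.16 0.21 0.25 0.24 0.21 0.09

# Values of A reported above for the same p (first coefficient after exp())
observed_A <- c(0.0897144, 0.1609045, 0.2106769, 0.2504594,
                0.2406164, 0.2091931, 0.09003722)
max(abs(observed_A - expected_A))  # largest difference, below 0.001
```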
Given all the results of the previous questions, what is the formula for the average value of a binomial random variable, depending on size and prob?
# write here
Given all the results of the previous questions, what is the formula for the variance of a binomial random variable, depending on size and prob?
# write here