Work on this list **every day without exception**, for at
least 25 minutes without interruption. Use an alarm clock to know when
to stop. Do not stop until the alarm rings. Always stop when the alarm
rings and do something else for at least 5 minutes.

If you can, do a second session every day. If you do it twice, you have one hour every day, roughly equivalent to one day. That is exactly one more day than most people have studied so far, so doing this will be a huge advantage.

Things to avoid:

- Don’t look for the answers on the Internet. They are probably not there anyway, and you would miss the chance of *thinking* on your own.
- Don’t work alone. Find one or more friends and explain your ideas to them. Speaking aloud helps you think. If you live alone at the top of a mountain and do not have any friends (I’m very sorry for you, really), you can use the course forum, or make your own WhatsApp group or Facebook page. Or send handwritten letters, like people have been doing for the last 3000 years.
- Don’t just read someone else’s answer. Make sure you understand the solution, try to solve it in a different way, and write your own. Always write it by hand.

# 1. Computational thinking

## 1.1 Exploring vectors

You will program your own version of some standard functions using
only `for()`, `if()` and indices. All the following functions receive a vector.

Please write your own version of the following functions:

- `vector_min(x)`, equivalent to `min(x)`. Returns the smallest element in `x`.
- `vector_max(x)`, equivalent to `max(x)`. Returns the largest element in `x`.
- `vector_which_min(x)`, equivalent to `which.min(x)`. Returns the index of the smallest element in `x`.
- `vector_which_max(x)`, equivalent to `which.max(x)`. Returns the index of the largest element in `x`.
- `vector_mean(x)`, equivalent to `mean(x)`. Returns the average of all elements in `x`.
- `vector_cumsum(x)`, equivalent to `cumsum(x)`. Returns a vector of the same length as `x` with the cumulative sum of `x`.
- `vector_diff(x)`, equivalent to `diff(x)`. Returns a vector one element shorter than `x` with the differences between consecutive elements of `x`.
- `vector_apply(x, f)`, equivalent to `sapply(x, f)`. Inputs are a vector `x` and a function `f`. Returns a new vector `y` of the same length as `x` where `y[i]` is `f(x[i])` for all `i`.

You can test your function with the following code.

```
x <- sample(5:20, size=10, replace=TRUE)
min(x)
vector_min(x)
```

The two results must be the same. Obviously, you have to replace
`min` and `vector_min` with the corresponding functions.
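As an illustration of the pattern, here is one possible sketch of two of the functions, `vector_min()` and `vector_cumsum()`, written with only `for()`, `if()` and indices, followed by a check against the built-in versions on 100 random vectors. It assumes `x` is a non-empty numeric vector. Try to write the remaining functions on your own before looking at it.

```r
# One possible sketch: assumes x is a non-empty numeric vector.
vector_min <- function(x) {
  ans <- x[1]                 # best candidate so far: the first element
  for (i in 1:length(x)) {
    if (x[i] < ans) {
      ans <- x[i]             # found a smaller element
    }
  }
  return(ans)
}

vector_cumsum <- function(x) {
  ans <- numeric(length(x))   # output has the same length as x
  total <- 0
  for (i in 1:length(x)) {
    total <- total + x[i]
    ans[i] <- total           # sum of x[1], ..., x[i]
  }
  return(ans)
}

# Compare with the built-in functions on many random vectors
for (trial in 1:100) {
  x <- sample(5:20, size = 10, replace = TRUE)
  stopifnot(vector_min(x) == min(x))
  stopifnot(all(vector_cumsum(x) == cumsum(x)))
}
```

Comparing with `==` rather than `identical()` is deliberate: `cumsum()` of an integer vector returns integers, while this sketch returns doubles.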

## 1.2 Merging vectors

Please write a function called `vector_merge(x, y)` that
receives two **sorted** vectors `x` and `y` and returns a new vector with
the elements of `x` and `y` together, **sorted**. The output vector has
size `length(x)+length(y)`.

You *must assume* that each of the input vectors is already
sorted.

To do so, use three indices, `i`, `j`, and `k`, pointing into `x`,
`y`, and the output vector `ans`, respectively. On each step, compare
`x[i]` and `y[j]`. If `x[i] < y[j]`, then `ans[k] <- x[i]`;
otherwise `ans[k] <- y[j]`.

You have to increment `i` or `j`, and `k`, carefully, and handle the
case where all the elements of one of the two vectors have already been
used. To test your function, you can use this code:

```
a <- sample(letters)
x <- sort(a[1:13])
y <- sort(a[14:26])
vector_merge(x, y)
```

The output must be a sorted alphabet.
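One possible sketch of the three-index merge described above. Creating `ans` with `c(x, y)` is just a convenient way to get an output of the right type and length (it works for numbers and for letters); every element is then overwritten in order. Once one input is exhausted, the remaining elements are taken from the other one.

```r
vector_merge <- function(x, y) {
  n <- length(x)
  m <- length(y)
  ans <- c(x, y)   # right type and length; every element is overwritten below
  i <- 1           # next unused element of x
  j <- 1           # next unused element of y
  for (k in seq_len(n + m)) {
    # take from x if y is exhausted, or if x has elements left and x[i] < y[j]
    if (j > m || (i <= n && x[i] < y[j])) {
      ans[k] <- x[i]
      i <- i + 1
    } else {
      ans[k] <- y[j]
      j <- j + 1
    }
  }
  return(ans)
}

a <- sample(letters)
x <- sort(a[1:13])
y <- sort(a[14:26])
print(vector_merge(x, y))   # should print the sorted alphabet
```

The short-circuit operators `||` and `&&` matter here: they avoid reading `y[j]` or `x[i]` past the end of a vector.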

## 1.3 Sorting

Please write a function called `vector_mergesort(x)` that
takes a single vector `x` and returns a new vector with the
same elements of `x` but sorted from the smallest to the
largest.

To do so you have to use a **recursive** strategy as
follows:

- If the input vector `x` has length 1, then it is already sorted. In that case the output is a copy of `x`.
- If the length of the input is larger than 1, then split `x` in two parts. The new vector `x1` contains the first half of `x`, and `x2` has the second half.
  - Be careful when `length(x)` is odd.
  - Now sort `x1` and `x2` using **the same** function `vector_mergesort()`. Store the results in `ans1` and `ans2`.
  - Finally, *merge* `ans1` and `ans2` using the function `vector_merge()` from the previous exercise, and return the merged vector.
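A possible recursive sketch of the steps above. To keep the block self-contained it repeats a compact version of `vector_merge()`. The split uses integer division `%/%`, so for odd lengths `x2` gets one element more than `x1`. It also returns `x` unchanged when its length is 0, a case the steps above do not mention.

```r
# Compact merge of two sorted vectors (same idea as in the previous exercise)
vector_merge <- function(x, y) {
  n <- length(x)
  m <- length(y)
  ans <- c(x, y)                      # right type and length; overwritten below
  i <- 1
  j <- 1
  for (k in seq_len(n + m)) {
    if (j > m || (i <= n && x[i] < y[j])) {
      ans[k] <- x[i]
      i <- i + 1
    } else {
      ans[k] <- y[j]
      j <- j + 1
    }
  }
  return(ans)
}

vector_mergesort <- function(x) {
  n <- length(x)
  if (n <= 1) {
    return(x)                         # length 0 or 1: already sorted
  }
  half <- n %/% 2                     # for odd n, x2 gets the extra element
  x1 <- x[1:half]
  x2 <- x[(half + 1):n]
  ans1 <- vector_mergesort(x1)        # sort each half with the same function
  ans2 <- vector_mergesort(x2)
  return(vector_merge(ans1, ans2))    # merge the two sorted halves
}

x <- sample(1:100, size = 15, replace = TRUE)
print(vector_mergesort(x))
```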

# 2. Random processes

Please write a function called `my_sample(x, size, replace, prob)`,
equivalent to the function `sample(x, size, replace, prob)`, using only
`sample.int(n, size, replace, prob)`.
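A minimal sketch: draw positions with `sample.int()` and use them to pick elements of `x`. This mirrors how base R's `sample()` handles a general vector, but the sketch does not reproduce the special case where `x` is a single positive number (for example `sample(10)`, which samples from `1:10`).

```r
my_sample <- function(x, size = length(x), replace = FALSE, prob = NULL) {
  # choose positions among 1..length(x), then return the elements at those positions
  idx <- sample.int(length(x), size, replace, prob)
  return(x[idx])
}

set.seed(42)
a <- sample(letters, 5)
set.seed(42)
b <- my_sample(letters, 5)
print(identical(a, b))   # same seed, so the draws should match
```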

Simulate an experiment with `N` independent dice. The result of the experiment is the sum of all dice.

- Plot the histogram of the result for 100 replicas, for different values of `N`. You can write a function for this.
- Plot the average of the results of 100 replicas for different values of `N`, such as 10, 1010, 2010, …, 2E4.
- What is the relationship between the averages of the results and `N`? Build a linear model and explain the result.
- Use the `quantile(x, ...)` function to find a 95% confidence interval for the result of the experiment.
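A possible starting point, assuming fair six-sided dice simulated with `sample.int(6, ...)`. The helper names `dice_experiment()` and `dice_replicas()` are only illustrative. The interval at the end uses base R's `quantile()` to take the 2.5% and 97.5% quantiles of the simulated results, that is, a range containing about 95% of them.

```r
set.seed(1)

# One replica: the sum of N fair dice
dice_experiment <- function(N) {
  sum(sample.int(6, N, replace = TRUE))
}

# Several replicas for the same N
dice_replicas <- function(N, replicas = 100) {
  replicate(replicas, dice_experiment(N))
}

# Histogram for one value of N
result <- dice_replicas(50)
hist(result, main = "Sum of 50 dice, 100 replicas", xlab = "Result")

# Average of 100 replicas for N = 10, 1010, 2010, ..., 19010
N_values <- seq(10, 2e4, by = 1000)
averages <- sapply(N_values, function(N) mean(dice_replicas(N)))
plot(N_values, averages, xlab = "N", ylab = "Average result")

# Linear model: average as a function of N
model <- lm(averages ~ N_values)
abline(model)
print(coef(model))

# Central 95% range of the simulated results for N = 50
print(quantile(result, c(0.025, 0.975)))
```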

Simulate an experiment with `N` independent coins. Each coin has one side labeled `+1` and the other labeled `-1`. The result of the experiment is the sum of all coin labels.

- Plot the histogram of the result for 100 replicas, for different values of `N`.
- Plot the average of the result for different values of `N`, such as 10, 1010, 2010, …, 2E4. What is the relationship?
- Write a function called `squared_vector(N)` that takes `N` as input, simulates 400 replicas, and returns a vector with the *square* of each replica. For example, if the replicas are `c(1,-2,0,...,-1,3)`, the function must return `c(1,4,0,...,1,9)`.^{1}
- Plot the mean of the output of `squared_vector(N)` versus `N` for different values of `N`, such as 10, 1010, 2010, …, 2E4.
- What is the relationship between the mean of the squares of the results and `N`? Build a linear model and explain the result.
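A sketch along the same lines, assuming fair coins simulated with `sample(c(-1, 1), ...)`. The name `coin_experiment()` is a hypothetical helper, while `squared_vector()` follows the specification above.

```r
set.seed(1)

# One replica: the sum of N coins labeled -1 and +1
coin_experiment <- function(N) {
  sum(sample(c(-1, 1), N, replace = TRUE))
}

# 400 replicas for the same N, each one squared
squared_vector <- function(N) {
  replicas <- replicate(400, coin_experiment(N))
  return(replicas^2)
}

# Mean of the squares for N = 10, 1010, 2010, ..., 19010
N_values <- seq(10, 2e4, by = 1000)
mean_squares <- sapply(N_values, function(N) mean(squared_vector(N)))
plot(N_values, mean_squares, xlab = "N", ylab = "Mean of squared results")

# Linear model: mean of the squares as a function of N
model <- lm(mean_squares ~ N_values)
abline(model)
print(coef(model))
```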

How many times do you have to throw a die to get a 6? Give the average and a 95% confidence interval.^{2}

How many times do you have to throw a die to get two consecutive 6s? Give the average and a 95% confidence interval.

How many times do you have to throw a die to get two 6s, consecutive or not? Give the average and a 95% confidence interval.

How many times do you have to throw two dice to get a sum equal to 6? Give the average and a 95% confidence interval.
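The four questions above share one pattern: keep throwing until a stopping rule is met, count the throws, and repeat many times. A possible sketch, with hypothetical helper names; the "95% confidence interval" is read here as the range between the 2.5% and 97.5% quantiles of the simulated counts.

```r
set.seed(1)

# Keep calling throw_once() until stop_rule(history) is TRUE; return the number of throws
throws_until <- function(throw_once, stop_rule) {
  history <- c()
  repeat {
    history <- c(history, throw_once())
    if (stop_rule(history)) {
      return(length(history))
    }
  }
}

one_die  <- function() sample.int(6, 1)
two_dice <- function() sum(sample.int(6, 2, replace = TRUE))

last_is_six  <- function(h) h[length(h)] == 6
two_in_a_row <- function(h) length(h) >= 2 && all(h[length(h) - 0:1] == 6)
two_sixes    <- function(h) sum(h == 6) >= 2

# Average and central 95% range of the number of throws
summarize_counts <- function(counts) {
  c(mean = mean(counts), quantile(counts, c(0.025, 0.975)))
}

replicas <- 1000
print(summarize_counts(replicate(replicas, throws_until(one_die, last_is_six))))
print(summarize_counts(replicate(replicas, throws_until(one_die, two_in_a_row))))
print(summarize_counts(replicate(replicas, throws_until(one_die, two_sixes))))
# two dice: stop when the sum of the pair is 6
print(summarize_counts(replicate(replicas, throws_until(two_dice, last_is_six))))
```

Passing the throw and the stopping rule as functions keeps one loop for all four questions; only the rule changes.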

We have six lamps labeled 1 to 6. Initially they are all turned off. You throw a die and get a number `x`. Then you flip the switch of the lamp labeled `x`. How many times do you have to throw the die until all six lamps are turned on? Give a range that is valid at least 95% of the time.
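A possible simulation, assuming that flipping the switch toggles the lamp (an off lamp turns on, an on lamp turns off) and that the experiment stops the first time all six lamps are on at the same time. The name `lamp_experiment()` is a hypothetical helper.

```r
set.seed(1)

# Number of throws until all lamps are on at the same time
lamp_experiment <- function(n_lamps = 6) {
  lamps <- rep(FALSE, n_lamps)     # all lamps start turned off
  throws <- 0
  while (!all(lamps)) {
    x <- sample.int(n_lamps, 1)    # throw the die
    lamps[x] <- !lamps[x]          # flip the switch of lamp x
    throws <- throws + 1
  }
  return(throws)
}

counts <- replicate(1000, lamp_experiment())
hist(counts, main = "Throws until all six lamps are on", xlab = "Throws")
print(quantile(counts, c(0.025, 0.975)))   # central range covering about 95% of replicas
```

Under the toggling assumption the number of lamps that are on always has the same parity as the number of throws, so every simulated count is even.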

What is the effect of the read length on the number of contigs? Assume shotgun assembly of a genome of size 1E6, and make a plot for different read lengths and numbers of reads.
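A deliberately simplified sketch, under assumptions that are not in the exercise: the genome is a linear sequence of 1E6 positions, reads of fixed length start at uniformly random positions, there are no sequencing errors or repeats, and two reads join the same contig whenever they share at least one position. Under these assumptions the number of contigs is one plus the number of gaps between reads sorted by start. The function name and the chosen read lengths are hypothetical.

```r
set.seed(1)

# Count contigs for one simulated shotgun experiment (simplified model, see text)
count_contigs <- function(genome_size, read_length, n_reads) {
  # start positions such that every read fits inside the genome
  starts <- sort(sample.int(genome_size - read_length + 1, n_reads, replace = TRUE))
  ends <- starts + read_length - 1
  contigs <- 1
  current_end <- ends[1]
  for (i in seq_along(starts)[-1]) {
    if (starts[i] > current_end) {
      contigs <- contigs + 1          # no overlap: a new contig starts here
    }
    current_end <- max(current_end, ends[i])
  }
  return(contigs)
}

genome_size <- 1e6
read_lengths <- c(100, 250, 500, 1000)
n_reads_values <- seq(1000, 20000, by = 1000)

# One column per read length, one row per number of reads
contigs <- sapply(read_lengths, function(L)
  sapply(n_reads_values, function(n) count_contigs(genome_size, L, n)))

matplot(n_reads_values, contigs, type = "l", lty = 1,
        col = seq_along(read_lengths),
        xlab = "Number of reads", ylab = "Number of contigs")
legend("topright", legend = paste("read length", read_lengths),
       col = seq_along(read_lengths), lty = 1)
```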

# 3. Hypothesis testing: Blind test of cola normal v/s zero

We want to know if you can taste the difference between normal cola
and sugar-free cola. To test this, we prepare 8 cups that look identical. Four
of the cups are filled with normal cola; the other four contain cola
zero. The 8 cups are randomly shuffled using `sample.int(8)`.
We write the shuffled order on a piece of paper and hide it in an envelope that
you cannot see.

You taste all of them and write down which ones you believe contain normal cola and which ones contain cola zero. For example, you can write that cups 2, 3, 5 and 7 have cola zero. Then we open the envelope, compare your answers to the original order, and find that you guessed all of them correctly. There are two possible explanations:

- hypothesis zero: there is no difference between cola and zero, you just chose randomly and were lucky
- hypothesis one: you can tell the difference and you guess correctly

What is the probability of choosing correctly just by luck (i.e. under hypothesis zero)?
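Under hypothesis zero the four cups you label as cola zero are just a random choice of 4 cups out of 8. A small simulation, assuming you always name exactly four cups, can be compared with the exact count `1 / choose(8, 4)` (only one of all the ways of choosing 4 cups out of 8 is fully correct). The helper name `one_trial()` is hypothetical.

```r
set.seed(1)

# One blind test under hypothesis zero: you pick 4 cups at random
one_trial <- function() {
  shuffle <- sample.int(8)        # hidden shuffle written in the envelope
  zero_cups <- shuffle[1:4]       # cups that really contain cola zero
  guess <- sample.int(8, 4)       # random guess of 4 cups
  setequal(guess, zero_cups)      # TRUE if every cup is classified correctly
}

p_sim <- mean(replicate(20000, one_trial()))
p_exact <- 1 / choose(8, 4)       # one correct choice among all ways to pick 4 of 8 cups
print(c(simulated = p_sim, exact = p_exact))
```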

# 4. Theory: event frequency v/s event probability

- You want to do an experiment where the probability of an event is 0.70. How many replicas do you need to guarantee that the relative frequency of the event in the experiment is between 0.65 and 0.75 at least 95% of the time? What is the formula to answer that question?
- You simulated a process with 100 replicas. The relative frequency of the event is 0.7. What is the 95% confidence interval for the real probability? What is the formula to answer that question?
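Both questions can be approached with the normal approximation to the binomial distribution: the relative frequency in `n` replicas has mean `p` and standard deviation `sqrt(p * (1 - p) / n)`. A sketch of the resulting calculations follows. The interval in the second part is the simple normal-approximation (Wald) interval; `binom.test()` is shown only as an exact Clopper-Pearson interval for comparison.

```r
# Question 1: replicas needed so that |frequency - p| <= eps with about 95% probability
p <- 0.70
eps <- 0.05
z <- qnorm(0.975)                          # about 1.96
n_needed <- ceiling(z^2 * p * (1 - p) / eps^2)
print(n_needed)

# Quick check by simulation: fraction of experiments with frequency in [0.65, 0.75]
set.seed(1)
freqs <- rbinom(10000, n_needed, p) / n_needed
print(mean(freqs >= 0.65 & freqs <= 0.75))

# Question 2: 95% interval for the probability after 100 replicas with frequency 0.7
n <- 100
p_hat <- 0.7
ci <- p_hat + c(-1, 1) * z * sqrt(p_hat * (1 - p_hat) / n)
print(ci)

# Exact (Clopper-Pearson) interval for comparison: 70 events in 100 replicas
print(binom.test(70, 100)$conf.int)
```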