Blog of Andrés Aravena
CMB2:

# Homework 6

23 March 2020. Deadline: Friday, 27 March, 17:30.

Staying at home for long periods can affect your muscles. We need to exercise to keep our muscles in shape. The same happens with our brain. Here are some exercises to keep your brain in shape.

You can, and in fact you should, discuss the homework in the forum. Solving problems is collective work. But answers should be individual, using the official template for answers. This way you also practice for the midterm exam, and for real life.

# Recycled questions

In Homework 5 we had two “long-term” optional questions. We had extra time to think about them. Now it is time to answer them.

I will also recycle the advice I gave earlier.

• Be sure you understand the question. If you do not understand it, ask in the forum. Explain what you understand and what you do not understand.
• This is LEGO. Identify all the pieces and understand how they connect.
• Write the name of each variable. That is, the name of each input, output, and auxiliary variable.
• Write the structure of each variable. Is it a vector, a list, a data frame, etc.?
• Write the type of each variable. Is it numeric, logical, character, etc.?
• You can only use the inputs and the auxiliary variables that you create.
• If you did not create it, do not change it.
• The output should change if the input changes. Check that each input is used somewhere.
• Understand what you have and what you want to have. It is like a biochemistry process: how do you get Leucine from ATP and water?
• Sometimes it is useful to work backwards. Start with what you want to have, and decompose it into simpler terms. For example, to get the GC-skew, you need to know nG and nC. Then you just need to find nG and nC.
• Look at previous examples and recycle them, by using the old functions inside the new function.
• If you cannot recycle, then you can adapt the old code for the new case.
• It is always wise to return to our very first class: How to Solve It.
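To illustrate the “work backwards” advice, here is a minimal R sketch of the GC-skew example. It assumes the usual definition GC-skew = (nG - nC) / (nG + nC) and a DNA sequence stored as a character vector of uppercase letters; the name gc_skew() is only illustrative.

```r
# Hypothetical helper, assuming the usual definition
# GC-skew = (nG - nC) / (nG + nC).
# Working backwards: the formula says we need nG and nC,
# so the only remaining work is to count them.
gc_skew <- function(dna) {
  nG <- sum(dna == "G")   # number of "G" letters
  nC <- sum(dna == "C")   # number of "C" letters
  # Note: returns NaN when the sequence has no G and no C (0/0)
  return((nG - nC) / (nG + nC))
}

gc_skew(c("G", "G", "G", "C", "A"))   # (3 - 1) / (3 + 1) = 0.5
```

Once the formula is written, each missing piece (nG, nC) becomes a smaller problem that is easy to solve on its own.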

## 1. Algorithm design

In many important cases we have a vector x with growing (non-decreasing) values. That is, each value is bigger than or equal to the previous one, so

x[i+1] >= x[i]

for all values of the index i. It is easy to see that the position of the minimum value has to be 1. We also know that the position of the maximum value is the last position. What about the position of the half value?

The half value is the average of the minimum and the maximum. For example, if x is the vector c(1, 4, 4, 6, 10, 15), then the half value is (1+15)/2, that is 8.

Some people confuse the values of a vector with the positions in a vector. The index indicates the position. Do not confuse them. This is important.

The position of the half value of the vector x is the index of the first value that is bigger than or equal to the half value of x. In the example, the position of the half value is 5, since x[5] = 10 is the smallest value that is bigger than or equal to 8.

Please write a function called position_of_half(), with one input called x. The function must return a single number, which is the index of the smallest value in x that is bigger than or equal to the average of minimum and maximum of x.

You can test your function with the following code.

x <- 1:9
position_of_half(x)
position_of_half(x + 20)
position_of_half(x * x)
position_of_half(sqrt(x))

The answers should be 5, 5, 7, 4, respectively.
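One possible sketch of position_of_half(), assuming x is a non-empty numeric vector that satisfies x[i+1] >= x[i]. It walks through the positions in order and stops at the first value that reaches the half value.

```r
# A minimal sketch, assuming x is non-empty and sorted
# so that x[i+1] >= x[i] for every i.
position_of_half <- function(x) {
  half <- (min(x) + max(x)) / 2      # the "half value" of x
  for (i in seq_along(x)) {          # visit positions 1, 2, 3, ...
    if (x[i] >= half) {
      return(i)                      # first position reaching the half value
    }
  }
}

x <- 1:9
position_of_half(x * x)   # 7, since x[7] * x[7] = 49 is the first value >= 41
```

The loop always returns, because max(x) is never smaller than the half value. Since x is sorted, a binary search would need fewer comparisons on long vectors; the linear scan is shown because it is easier to check by hand. An equivalent one-liner is which(x >= half)[1].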

## 2. Merge two sorted vectors

Please write a function called vector_merge(x, y) that receives two sorted vectors x and y and returns a new vector containing all the elements of x and y, in sorted order. The output vector has length length(x) + length(y).

You must assume that each of the input vectors is already sorted.

In your code you have to use three indices, i, j, and k, to point into x, y, and the output vector answer, respectively. On each step you have to compare x[i] and y[j]. If x[i] < y[j], then set answer[k] <- x[i]; otherwise set answer[k] <- y[j].

After each step you have to increment k, and also i or j, carefully. Pay special attention to what happens when all the elements of x, or all the elements of y, have already been used. To test your function, you can use this code:

x <- c("a", "d", "e", "h", "i", "k", "m", "s", "t", "u", "v", "w", "z")
y <- c("b", "c", "f", "g", "j", "l", "n", "o", "p", "q", "r", "x", "y")
vector_merge(x, y)

The output must be a sorted alphabet.

"a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m"
"n" "o" "p" "q" "r" "s" "t" "u" "v" "w" "x" "y" "z"
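A sketch of vector_merge() following the three-index scheme described above. It assumes x and y are each sorted, have the same type (both numeric or both character), and that at least one of them is non-empty.

```r
# A sketch assuming x and y are each sorted, of the same type,
# and not both empty.
vector_merge <- function(x, y) {
  n <- length(x) + length(y)
  answer <- vector(mode = typeof(c(x, y)), length = n)  # output of the right type
  i <- 1   # next unused position in x
  j <- 1   # next unused position in y
  for (k in seq_len(n)) {
    # Take x[i] if y is used up, or if x is not used up and x[i] < y[j]
    if (j > length(y) || (i <= length(x) && x[i] < y[j])) {
      answer[k] <- x[i]
      i <- i + 1
    } else {
      answer[k] <- y[j]
      j <- j + 1
    }
  }
  return(answer)
}
```

When x[i] and y[j] are equal, this sketch takes y[j], as in the rule above. The short-circuit operators || and && make sure x[i] or y[j] is never read after that vector is used up.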

# New questions

These questions are easier than the previous ones. You may want to start with these.

## 3. Transcription

Write a function called transcribe() that takes a DNA sequence (a character vector) and returns the corresponding RNA sequence.

In other words, if you have

dna <- c("T","C","A","G","A","T","T","A","C")

then we transcribe it with transcribe(dna). The result should be

"U" "C" "A" "G" "A" "U" "U" "A" "C"
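A minimal sketch of transcribe(), assuming the DNA letters are uppercase and, as in the example, the input is the strand whose letters are kept, so only “T” changes to “U”.

```r
# A minimal sketch, assuming uppercase letters; every "T" becomes "U"
# and all other letters are kept.
transcribe <- function(dna) {
  rna <- dna                 # work on a copy; the input is not changed
  rna[dna == "T"] <- "U"     # replace every T by U
  return(rna)
}

dna <- c("T","C","A","G","A","T","T","A","C")
transcribe(dna)
```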

## 4. Codons

Write a function called codons() that takes an RNA sequence (a character vector) and returns a list of vectors. Each vector in the list must have exactly 3 letters, representing one codon.
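One way to sketch codons() is to compute the first position of each codon (1, 4, 7, ...) and take three consecutive letters from there. The text does not say what to do when the length is not a multiple of 3; this sketch assumes the leftover letters are simply ignored.

```r
# A minimal sketch. Assumption: if length(rna) is not a multiple of 3,
# the leftover letters at the end are ignored.
codons <- function(rna) {
  n_codons <- length(rna) %/% 3                # number of complete codons
  answer <- vector(mode = "list", length = n_codons)
  for (i in seq_len(n_codons)) {
    first <- 3 * (i - 1) + 1                   # positions 1, 4, 7, ...
    answer[[i]] <- rna[first:(first + 2)]      # three consecutive letters
  }
  return(answer)
}

codons(c("U","C","A","G","A","U","U","A","C"))
```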

## 5. Translation

Write a function called translate_codon() that takes a codon (a vector with 3 letters) and returns a single letter representing one amino acid. For your convenience, here is the correspondence between codons and amino acids.

What do the “*” in the following table represent?

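| Codon | Amino acid | Codon | Amino acid | Codon | Amino acid | Codon | Amino acid |
|-------|------------|-------|------------|-------|------------|-------|------------|
| aaa | K | caa | Q | gaa | E | taa | * |
| aac | N | cac | H | gac | D | tac | Y |
| aag | K | cag | Q | gag | E | tag | * |
| aat | N | cat | H | gat | D | tat | Y |
| aca | T | cca | P | gca | A | tca | S |
| acc | T | ccc | P | gcc | A | tcc | S |
| acg | T | ccg | P | gcg | A | tcg | S |
| act | T | cct | P | gct | A | tct | S |
| aga | R | cga | R | gga | G | tga | * |
| agc | S | cgc | R | ggc | G | tgc | C |
| agg | R | cgg | R | ggg | G | tgg | W |
| agt | S | cgt | R | ggt | G | tgt | C |
| ata | I | cta | L | gta | V | tta | L |
| atc | I | ctc | L | gtc | V | ttc | F |
| atg | M | ctg | L | gtg | V | ttg | L |
| att | I | ctt | L | gtt | V | ttt | F |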
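A sketch of translate_codon() that stores the table above as a named vector and looks up the codon by name. The table is written with lowercase DNA letters, so this sketch assumes the codon may come in either case and converts “u” to “t” before the lookup; the name genetic_code is only illustrative.

```r
# The table above as a named vector (hypothetical name genetic_code).
genetic_code <- c(
  aaa = "K", caa = "Q", gaa = "E", taa = "*",
  aac = "N", cac = "H", gac = "D", tac = "Y",
  aag = "K", cag = "Q", gag = "E", tag = "*",
  aat = "N", cat = "H", gat = "D", tat = "Y",
  aca = "T", cca = "P", gca = "A", tca = "S",
  acc = "T", ccc = "P", gcc = "A", tcc = "S",
  acg = "T", ccg = "P", gcg = "A", tcg = "S",
  act = "T", cct = "P", gct = "A", tct = "S",
  aga = "R", cga = "R", gga = "G", tga = "*",
  agc = "S", cgc = "R", ggc = "G", tgc = "C",
  agg = "R", cgg = "R", ggg = "G", tgg = "W",
  agt = "S", cgt = "R", ggt = "G", tgt = "C",
  ata = "I", cta = "L", gta = "V", tta = "L",
  atc = "I", ctc = "L", gtc = "V", ttc = "F",
  atg = "M", ctg = "L", gtg = "V", ttg = "L",
  att = "I", ctt = "L", gtt = "V", ttt = "F"
)

# A minimal sketch, assuming `codon` holds 3 letters, in upper or lower
# case, written with either U (RNA) or T (DNA).
translate_codon <- function(codon) {
  key <- tolower(paste(codon, collapse = ""))  # e.g. c("A","U","G") -> "aug"
  key <- chartr("u", "t", key)                 # the table uses t instead of u
  return(unname(genetic_code[key]))            # one amino-acid letter
}

translate_codon(c("A", "U", "G"))
```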

## 6. Reverse complement

Write a function called reverse_complement() that takes a DNA sequence (a character vector) and returns the DNA sequence of the opposite strand. Both sequences are represented from 5’ to 3’.

This function can be decomposed into two parts:

• reverse(), taking a vector and returning it backwards
• complement(), replacing each letter in the vector by its complement. The complement of “A” is “T”, the complement of “C” is “G”, and vice versa.

For example, if you have

dna <- c("T","C","A","G","A","T","T","A","C")

then applying reverse_complement(dna) we should get

"G" "T" "A" "A" "T" "C" "T" "G" "A"
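A sketch of the decomposition described above, assuming uppercase DNA letters A, C, G, T. Each part is a small function, and reverse_complement() just combines them. Base R already has rev(), which does the same as this reverse().

```r
# A sketch assuming uppercase DNA letters A, C, G, T.
reverse <- function(x) {
  n <- length(x)
  return(x[n + 1 - seq_len(n)])   # positions n, n-1, ..., 1
}

complement <- function(dna) {
  pairs <- c("A" = "T", "T" = "A", "C" = "G", "G" = "C")  # base pairing
  return(unname(pairs[dna]))      # look up each letter by its name
}

reverse_complement <- function(dna) {
  return(complement(reverse(dna)))
}

dna <- c("T","C","A","G","A","T","T","A","C")
reverse_complement(dna)
```

The order of the two steps does not matter here: complementing first and then reversing gives the same result.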

Stay safe, work at home, do the homework.