To explain question 2.4 I will show an example. First, please take a look at the description from the original question. Here we have numbered the steps, so we can talk about them.
- For each amino-acid AA:
- Find the codons that encode AA it and store them in a
- Find the maximum of
total_codon_countamong the codons in
codons_for_aa. Look only at these codons, ignore the rest. Store this maximum in
- For each codon in
- Find the value of
total_codon_countfor that particular codon.
- Divide it by
- Calculate the logarithm of the result of the last division, and
store it in
codon_adaptnessusing the codon as the index.
- Repeat for each codon in
- Find the value of
- Repeat for each amino acid.
- Find the codons that encode AA it and store them in a vector called
If you translate this description from English to R, then you will have the answer.
From this description you can see that there are two
for() loops. They are nested, one inside the other.
Remember that in this case you must use different names of variables on
each loop. For example
j —or any other
word. You cannot use
i, because the
computer will be confused. It is a good idea to choose names that remind
you of the things you are looking at (amino-acid, codon).
The key of this question is to use
genetic_code as a
tool, not as an problem. Some people tries to use
—a command that we never used in classes— to transform the list into a
vector. But in that case you get all the codons, not only the
ones for a specific amino acid. Too complicated, and wrong.
names(genetic_code) has all the amino-acid’s
names. Each element of the list
genetic_code is a vector,
containing only the codons of the corresponding amino-acid. If
aa is the name of an amino-acid, then
genetic_code[[aa]] will be a vector containing the codons
The first loop must take each amino-acid name, one by one. It will go
"Asn", up to
total_codon_count can be indexed using the
codon names. Thus, if you use
codons_for_aa as an index for
total_codon_count, then you will be working only among
the codons in
codons_for_aa, as requested.
Let’s follow the instructions step by step.
Step 1 says “Find the codons that encode one amino-acid”.
For example, let’s say that the main
for() loop has taken
the amino-acid name
"Val". Using this name as an index for
genetic_code, we can find that the codons for this
"gta", "gtc", "gtg", "gtt". We store these
values in the vector
codons_for_aa. Obviously we do not do
this by hand. Instead you tell the computer to do it for us.
Step 2 says “Find the maximum of
among the codons in
codons_for_aa.” Let’s say that
there are 560 valines in total (in all the genes). We use
codons_for_aa as index for
and get the count of each codon:
gta gtc gtg gtt 410 110 10 30
This small vector is what we call “only the codons of valine”. We ignore for now the rest of the codons, since in this part we are working only with valine.
The maximum value of this small vector is 410. We store that value in
max_count_for_aa. We finished step 2.
In step 3 we have a second
for() loop, going through the
codons_for_aa one by one. Let’s say that the
"gtc" (we work on the other codons later). In step
3.1 we use this codon as an index to look inside
total_codon_count and we find the value 110. That is the
In step 3.2 we divide that value by
get 110/410 or in other words 0.2682927. In step 3.3 we calculate
log(110/410) and we get -1.315677. We store this calculated
value into the corresponding element of vector
codon_adaptness. We have to be careful to use the correct
At the end, as usual, it is all about using the correct index in the correct order.
We do this for each codon of each amino-acid in
(Notice that this is not the relative frequency. We are calculating something else here, so the formula is something else.)