Blog of Andrés Aravena
CMB2:

Homework after class 19

05 May 2017. Deadline: Monday, 15 May, 14:00.

This time we are going to use our tools to understand how cells use different codons for the same amino-acid.

As you know, the same amino-acid can be encoded by several codons. You can see that in the genetic code or in the table SEQINR.UTIL$CODON.AA. Remember that you need to load the seqinr library to get this table.
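To get a feeling for that table, here is a small sketch of how you could look at it. It assumes that the columns are called CODON, AA and L (the one-letter code), as they are in the seqinr versions I know; check with head() if your version looks different. The names codon_table and lys_codons are only for illustration.

```r
library(seqinr)

# One row per codon; column L holds the one-letter amino-acid code
# (column names CODON, AA, L are assumed, see the text above)
codon_table <- SEQINR.UTIL$CODON.AA
head(codon_table)

# All the codons that encode lysine ("K")
lys_codons <- as.character(codon_table$CODON[codon_table$L == "K"])
print(lys_codons)
```

The same filter with another letter gives the synonymous codons of any other amino-acid.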

  1. Write a function called codons_in_gene that takes a gene sequence (i.e. a vector of characters) as input and returns a table of the frequency of each codon in the gene. If you do this:
library(seqinr)
genes <- read.fasta("https://anaraven.bitbucket.io/static/NC_000913.ffn")
codons_in_gene(genes[[1]])

you should get this:

aaa aac aca acc agc atc atg att cgc gcg ggc ggt tga 
  1   1   1   7   1   1   1   3   1   1   1   2   1 
  2. Write a function called codons_in_genome that takes a list of genes (such as the output of read.fasta) and returns a table of the total frequency of each codon over all the genes in the genome (a numeric vector with names). If you do this:
codons_in_genome(genes)

you should get this:

  aaa   aac   aag   aat   aca   acc   acg   act   aga   agc 
44236 28319 13384 22756  8975 30972 18970 11577  2489 21131 
  agg   agt   ata   atc   atg   att   caa   cac   cag   cat 
 1363 11322  5345 33331 36700 40171 20208 12814 38152 16937 
  cca   ccc   ccg   cct   cga   cgc   cgg   cgt   cta   ctc 
11058  7138 30969  9128  4523 29301  6983 27843  5072 14702 
  ctg   ctt   gaa   gac   gag   gat   gca   gcc   gcg   gct 
70390 14403 52330 25214 23456 42135 26535 33898 44900 19999 
  gga   ggc   ggg   ggt   gta   gtc   gtg   gtt   taa   tac 
10216 39366 14464 32655 14325 20227 34796 24031  2678 16079 
  tag   tat   tca   tcc   tcg   tct   tga   tgc   tgg   tgt 
  287 21055  9154 11321 11747 10986  1178  8482 20060  6706 
  tta   ttc   ttg   ttt 
18085 21827 17992 29304 
  3. We want to know the relative frequency of each codon among the codons that code for the same amino-acid. For that, we need a function called codon_freq that takes an amino-acid aa (a single letter) and the output of codons_in_genome (that we can call total_freq), and returns the relative frequency of the codons corresponding to the amino-acid aa, that is, the count of each of these codons divided by the sum of their counts. This output is a numeric vector with names, and its values add up to 1.
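
If you want to see how the three pieces fit together, here is one possible sketch on toy data. It is not the only way to do it, and probably not the most elegant. To keep it runnable without seqinr it uses a hypothetical, cut-down codon_table with only two amino-acids; with the real data you would use SEQINR.UTIL$CODON.AA instead. The helper split_codons and the toy genes are made up for this example. The sketch assumes that each gene starts in frame at position 1 and has at least three letters; an incomplete codon at the end is dropped.

```r
# Toy replacement for SEQINR.UTIL$CODON.AA (assumption: same column names)
codon_table <- data.frame(
  CODON = c("aaa", "aag", "gaa", "gag"),
  L     = c("K",   "K",   "E",   "E"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: cut a vector of single letters into codons
split_codons <- function(gene) {
  starts <- seq(1, length(gene) - 2, by = 3)
  sapply(starts, function(i) paste(gene[i:(i + 2)], collapse = ""))
}

# Frequency of each codon in one gene
codons_in_gene <- function(gene) {
  table(split_codons(gene))
}

# Total frequency of each codon over a list of genes,
# returned as a numeric vector with names
codons_in_genome <- function(genes) {
  all_codons <- unlist(lapply(genes, split_codons), use.names = FALSE)
  tab <- table(all_codons)
  setNames(as.numeric(tab), names(tab))
}

# Relative frequency among the codons of one amino-acid.
# A codon that never appears in total_freq would give NA here.
codon_freq <- function(aa, total_freq) {
  codons <- as.character(codon_table$CODON[codon_table$L == aa])
  counts <- total_freq[codons]
  counts / sum(counts)
}

# Two toy genes: aaa aag gaa, and aaa gag gag aaa
toy_genes <- list(
  strsplit("aaaaaggaa", "")[[1]],
  strsplit("aaagaggagaaa", "")[[1]]
)

total <- codons_in_genome(toy_genes)
print(total)
print(codon_freq("K", total))
```

In the toy data aaa appears 3 times and aag once, so the relative frequencies for "K" are 0.75 and 0.25. With the genome totals shown above, the same calculation for lysine is 44236 / (44236 + 13384) for aaa and 13384 / (44236 + 13384) for aag, that is, about 0.77 and 0.23.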
