Blog of Andrés Aravena

Computing in Molecular Biology 2

17 June 2016

This course is an introduction to Quantitative Thinking. We will use the tools we learned in the previous course and apply them to model real data and to simulate scientific experiments as a way to understand them.

We had 12 lessons, which can be seen here:

  1. Structured documents and Rmarkdown. Slides
  2. Structured data: Entity-Relationship models and GEO. Slides, Homework
  3. Finding Structures: Clustering. Slides, Homework
  4. Reading structured genomic data. Also the last part of clustering. Slides, Homework
  5. Statistical properties of genomic sequences. Slides, Homework
  6. Genetic code. Genes and protein sequences. Slides, Homework
  7. ORF, CDS, Transcription Factor Binding Sites. Slides, Homework
  8. Midterm Exam rehearsal. See the previous week's homework and Google Docs (read-only, but open to comments)
  9. Review of the Midterm Exam. Slides
  10. Pairwise Sequence Alignment of DNA sequences. Slides
  11. Pairwise Sequence Alignment of protein sequences. Introduction to sequence assembly. Based on students' homework.
  12. Simulation for experimental design. Example on sequence alignment. Slides

These slides are complemented by the slides prepared by the students. Unfortunately I cannot publish them because none of the students (i.e. the copyright holders) authorized it. I'm afraid this may have a negative impact on the students' performance, but they are adults and I respect their decisions.

For the final exam we only allow a summary of the course content on one (1) single original handwritten A4 page.

Please note that the slides may not be uploaded immediately after the class. It is strongly recommended that you take your own notes with pen and paper.


We also have weekly graded homework. Volunteers are welcome; otherwise presenters are chosen randomly.

The current grades can be seen on Google Spreadsheets.


These exercises are not graded but help you to understand the subject and learn effectively.

All the current and previous exercises have their own web page, which is useful to prepare for the exam.


Polya, G. and Conway, John H. How to Solve It: A New Aspect of Mathematical Method. Princeton Science Library

Wilson et al. “Best Practices for Scientific Computing.” PLoS Biology 12,1 (2014)

Stefan et al. “The Quantitative Methods Boot Camp: Teaching Quantitative Thinking and Computing Skills to Graduate Students in the Life Sciences”. PLoS Comput. Biol. 11, 1–12 (2015).

Elson D, Chargaff E (1952). On the deoxyribonucleic acid content of sea urchin gametes. Experientia 8 (4): 143–145

Chargaff E, Lipshitz R, Green C (1952). Composition of the deoxypentose nucleic acids of four genera of sea-urchin. J Biol Chem 195 (1): 155–160

Roten C-AH, Gamba P, Barblan J-L, Karamata D. Comparative Genometrics (CG): a database dedicated to biometric comparisons of whole genomes. Nucleic Acids Research. 2002;30(1):142-144

Zeeberg, Barry R, Joseph Riss, David W Kane, Kimberly J Bussey, Edward Uchio, W Marston Linehan, J Carl Barrett, and John N Weinstein. Mistaken Identifiers: Gene Name Errors Can Be Introduced Inadvertently When Using Excel in Bioinformatics. BMC Bioinformatics 5 (2004): 80. doi:10.1186/1471-2105-5-80.