Blog of Andrés Aravena

# Computing in Molecular Biology 2

## Introduction to Data Science

13 February 2018

This course is an introduction to Quantitative Thinking. We will use the tools we learned in the previous course and apply them to model real data and to simulate scientific experiments as a way to understand them.

The course forum is at https://groups.google.com/d/forum/iu-cmb. You can also participate by email.

## Homework

All quizzes and homework must be sent to andres.aravena+cmb@istanbul.edu.tr before the deadline to receive a grade. Please be careful: otherwise you will get a grade of zero.

• Class 2: Quiz 1. Turtle graphics. (Deadline: Friday 16 of February at 17:00).
• Homework 2. Functions in R. (Deadline: Wednesday 28 of February at 9:00).
• Class 6: Quiz 2. Rabbits, thin people and trees. (Deadline: Friday 2 of March at 17:00).
• Homework 3 (Deadline: Wednesday 18 of April at 9:00).
• Homework 4. How many people with epilepsy are in our course? (Deadline: Wednesday 25 of April at 9:00).
Epilepsy affects 1% of the world population. What does that mean for us in this course? Could everybody in the course have epilepsy?
• Exercises for Final Exam. Prepare yourself for the exam with practice and perseverance. (Deadline: Exam day, of course).
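
One way to start thinking about the Homework 4 question is a small binomial calculation. The sketch below is only an illustration, written in Python (the course itself works in R). It assumes a hypothetical class of 60 students, which is not a number taken from the course, and it assumes that students are independent and as likely to have epilepsy as a random person in the world population.

```python
from math import comb

def prob_exactly(k, n, p):
    """Binomial probability that exactly k of n independent people are affected."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n = 60     # hypothetical class size (an assumption, not from the course)
p = 0.01   # prevalence quoted in the homework statement

expected = n * p                      # average number of affected students
p_none = prob_exactly(0, n, p)        # nobody in the class is affected
p_at_least_one = 1 - p_none           # one or more students are affected
p_everybody = prob_exactly(n, n, p)   # every single student is affected

print(f"expected cases:      {expected:.2f}")
print(f"P(nobody):           {p_none:.3f}")
print(f"P(at least one):     {p_at_least_one:.3f}")
print(f"P(everybody):        {p_everybody:.3e}")
```

Under these assumptions "everybody" is possible but has a vanishingly small probability, while "nobody" and "at least one" are both quite likely. Changing `n` to the real class size shows how the answer depends on the number of students.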

## Classes

Here you will find the slides used in class. Note that they are usually not published immediately, so you had better take good notes. We recommend taking notes with pen and paper using the Cornell Method.

• Class 1: Introduction to Scientific Computing. (Feb 14, 2018). Motivation of the course [Slides].
• Class 2: Quiz 1. (Feb 16, 2018). Turtle graphics [Document].
• Class 3: Turtle Graphics. (Feb 21, 2018). From Scratch to R [Slides].
• Class 4: Decomposition, Patterns, Abstraction, Algorithms. (Feb 23, 2018). Functions: a key element of Computational Thinking [Slides].
• Class 5: Step by Step into Functions of Functions. (Feb 23, 2018). Using RStudio Debugger, and something about recursive functions [Slides].
• Class 6: Quiz 2. (Mar 2, 2018). Rabbits, thin people and trees [Document].
• Class 7: Learning from our mistakes. (Mar 7, 2018). Taking lessons from the quiz answers. Drawing genomic data [Slides].
• Class 8: Genomic data. (Mar 9, 2018). Working with DNA sequences [Slides].
• Class 9: Local and Global Statistics on DNA. (Mar 14, 2018). Finding the Origin of Replication [Slides].
• Class 10: Cumulative sums. (Mar 16, 2018). And an introduction to Systems Theory [Slides].
• Class 11: Dynamic Systems. (Mar 21, 2018). Making water is like growing cells [Slides].
• Class 12: Quiz 3. (Mar 23, 2018). Rehearsal for midterm [Document].
• Class 13: Can we Predict the Future? (Apr 11, 2018). Deterministic and Non-deterministic Systems. Chaos and randomness [Slides].
• Class 14: Probabilities. (Apr 13, 2018). Basic definitions and concepts [Slides].
• Class 15: Probability distributions. (Apr 18, 2018). Also, solution of the homework [Slides].
• Class 16: Quiz 4. (Apr 20, 2018). Practice of Decomposition, Pattern Matching, Abstraction and Algorithm Design [Document].
• Class 17: Events. (Apr 25, 2018). Confidence intervals, and answer to homework 4 [Slides].
• Class 18: Practical Simulations. (Apr 27, 2018). Is this variation caused by chance or something else? [Slides].
• Class 19: More Simulations. (May 2, 2018). Defining p-value and Run Length [Slides].
• Class 20: Experiment Design. (May 4, 2018). Simulating to plan a DNA sequencing project [Slides].
• Class 21: Genetic algorithms. (May 9, 2018). Solving hard questions using Nature’s ideas. Also, end of experimental design [Slides].
• Class 22: Practicing Genetic algorithms. (May 11, 2018). Solving hard problems step by step [Slides].
• Class 23: Summary of the course. (May 16, 2018). The big picture. Why we did all of this [Slides].
• Class 24: Quiz 5. (May 18, 2018). Practice for Exam [Document].

## Attendance

By regulation of the Rectorate, students need to attend at least 70% of the classes. The attendance book is updated every week and can be seen in Google Sheets.

## References

• Polya, G. and Conway, John H. How to Solve It: A New Aspect of Mathematical Method. Princeton Science Library.

• Wilson et al. “Best Practices for Scientific Computing”. PLoS Biology 12(1) (2014).

• Stefan et al. “The Quantitative Methods Boot Camp: Teaching Quantitative Thinking and Computing Skills to Graduate Students in the Life Sciences”. PLoS Comput. Biol. 11, 1–12 (2015).

• Elson D, Chargaff E (1952). On the deoxyribonucleic acid content of sea urchin gametes. Experientia 8 (4): 143–145.

• Chargaff E, Lipshitz R, Green C (1952). Composition of the deoxypentose nucleic acids of four genera of sea-urchin. J Biol Chem 195 (1): 155–160.

• Roten C-AH, Gamba P, Barblan J-L, Karamata D. Comparative Genometrics (CG): a database dedicated to biometric comparisons of whole genomes. Nucleic Acids Research. 2002;30(1):142-144.

• Zeeberg, Barry R, Joseph Riss, David W Kane, Kimberly J Bussey, Edward Uchio, W Marston Linehan, J Carl Barrett, and John N Weinstein. Mistaken Identifiers: Gene Name Errors Can Be Introduced Inadvertently When Using Excel in Bioinformatics. BMC Bioinformatics 5 (2004): 80. doi:10.1186/1471-2105-5-80.

• Babylonian astronomers computed position of Jupiter with geometric methods. Phys.org, January 29, 2016. https://phys.org/news/2016-01-babylonian-astronomers-position-jupiter-geometric.html

• Ossendrijver, Mathieu. Ancient Babylonian Astronomers Calculated Jupiter’s Position from the Area under a Time-Velocity Graph. Science (New York, N.Y.) 351, no. 6272 (January 29, 2016): 482–84. doi:10.1126/science.aad8085.