Blog of Andrés Aravena
Course Homepage:

# Computing in Molecular Biology 2

## Introduction to Computational Thinking

09 February 2020

This course is an introduction to Computational Thinking. We will use the tools we learned in the previous course and apply them to model and simulate scientific experiments as a way to understand them.

Molecular Biology revolves around computing and informatics these days. Obtaining data is easy and cheap; processing data is hard and expensive to do and to learn. A molecular biologist has to be able to understand what he or she produced. Otherwise he or she is a pipetting robot. Informatics and computing are what you have to learn, so work harder.

Kaan İhsan Eşkut

The Computing in Molecular Biology course was really hard for us to understand. In the first year I failed with FF, then the next year I passed with AA. I was able to pass once I understood the purpose of the computational methods, how important they are, and how we can use them in molecular biology.

Eda Şamiloğlu

The course content and problems are very educational for beginner students, but the main problem is that they have no perspective on computational sciences; they only think about passing the exam and moving on. Everything about the course depends on the students' behaviour, and I think the lecturer makes a great effort to teach, so if students do not want to learn, they are the ones who will lose.

Faruk Üstünel

This course was so interesting for me. I really didn’t like computers, and I didn’t know anything about the course or about programming. This course was very useful for me. I could understand your explanations and your body language, but it went too fast for me because everything was new and hard, so understanding was difficult.

Melis Ataözü

Of course, the most important factor when choosing these courses is a fact we all know: the importance of computers in molecular biology. As we also know, the experiments we perform and the data we obtain have no value unless they are properly analyzed and turned into meaningful conclusions. The choice is of course yours, but it will be to your benefit to decide while keeping in mind that these courses are very important for us, and while setting aside the scary stories that, I guess, are going around the school.

Yaşar Özge Delikkafa

This page will be updated during the semester. Please check it regularly. It was last updated on June 2 at 13:59.

The forum of the course is at https://groups.google.com/d/forum/iu-cmb. You can also participate by email. Everybody must register in the course forum. I recommend that you use a Gmail account, such as the ogr.iu.edu.tr email service provided by the university. In any case, use an email address that you check regularly. We will use this forum to send important material.

## Sequences for exercises

Most of the time you will use sequences found at NCBI. For exercises, we can use these sequences:

• Candidatus Carsonella ruddii PV DNA
• Escherichia coli str. K-12 substr. MG1655

## Homework

All quizzes and homework must be sent to andres.aravena+cmb@istanbul.edu.tr before the deadline to get a grade. Please be careful, otherwise you will get a grade of zero.

• Homework 1 (Deadline: Wednesday 19 of February at 8:30).
Take a complex drawing and decompose it into many smaller parts.
• Homework 2 (Deadline: Wednesday 26 of February at 8:30).
Peer-review of first exercise. Learn to recognize good and bad decompositions. Learn to give feedback. Train to be a referee.
• Homework 3 (Deadline: Tuesday 3 of March at 8:30).
This week we will practice decomposition, pattern matching, and algorithm design by creating an R script to draw simple figures. We also learn the discipline of always delivering homework on time, even if it is not ready.
• Homework 4 (Deadline: Tuesday 10 of March at 8:30).
We jump into functions, the key piece for the rest of the course. We start with one question in four versions. If you look carefully, they are all essentially the same question, so if you solve the first one, the rest should be easy.
• Homework 5 (Deadline: Thursday 19 of March at 12:30).
Practice of functions, for DNA analysis and for fun. Getting ready for the midterms.
• Homework 6 (Deadline: Friday 27 of March at 17:30).
Exercise your brain for the midterms. Using the computer to handle DNA data safely at home.
• Homework 7 (Deadline: Monday 6 of April at 17:30).
Keep your mind sharp. Prepare for the next stage of online learning. Learn how to find biological meaning in genomic data (codon usage, GC-skew) using basic programming tools.
• Homework 8 (Deadline: Tuesday 21 of April at 8:30).
Explain in English what the system will do.
• Homework 9 (Deadline: Tuesday 28 of April at 8:30).
Explain in English what the parts of each of these systems are, and what you think the systems’ behavior will be.
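
Homework 7 above mentions codon usage and GC-skew. As a rough illustration of what these two measures compute, here is a minimal sketch. The course itself works in R; this sketch is written in Python. The function names `gc_skew`, `windowed_gc_skew` and `codon_usage` and the toy sequence are assumptions made for this example, not part of the homework statement.

```python
from collections import Counter


def gc_skew(sequence):
    """Return (G - C) / (G + C) for a DNA string, or 0.0 if it has no G or C."""
    seq = sequence.upper()
    g = seq.count("G")
    c = seq.count("C")
    if g + c == 0:
        return 0.0
    return (g - c) / (g + c)


def windowed_gc_skew(sequence, window):
    """GC-skew of consecutive, non-overlapping windows of the given size."""
    return [
        gc_skew(sequence[start:start + window])
        for start in range(0, len(sequence) - window + 1, window)
    ]


def codon_usage(sequence):
    """Count each complete codon (three letters), reading from position 0."""
    seq = sequence.upper()
    usable = len(seq) - len(seq) % 3  # ignore a trailing incomplete codon
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


# Toy sequence invented for the example (hypothetical, not real data).
toy = "ATGGGCGGCTAA"
print(gc_skew(toy))
print(windowed_gc_skew(toy, 6))
print(codon_usage(toy))
```

On a real genome you would read the sequence from a FASTA file and use a much larger window. The reading frame is assumed to start at the first letter, which only makes sense for a sequence that is already a gene.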

## Classes

Here you will find the slides used in classes. Note that they are usually not published immediately, so you had better take good notes. We recommend taking notes with pen and paper using the Cornell Method.

• Introduction to Computational Thinking. (Feb 11, 2020).
Motivation of the course. Programming is as easy as drawing.[Slides].

• Programming is as easy as LEGO. (Feb 12, 2020).
First steps in Scratch.[Slides].

• Decomposition. (Feb 18, 2020).
Practice with Scratch.[Slides].

• Turtle Graphics. (Feb 19, 2020).
Computational thinking. Patterns. Loops. Abstraction.[Slides].

• Patterns, Loops. (Feb 25, 2020).
From Scratch to R. Loops to repeat, functions to reuse. How to make your own functions.[class05.R], [Slides].

• Patterns, Functions. (Feb 26, 2020).
Find patterns, build functions to recycle code. How to make your own functions.[class06.R], [Slides].

• The missing parts. (Mar 3, 2020).
In class time we show the most important slides. Here you have the “extra” slides, which we have not shown but which are useful anyway: R versus Rmd – Using the Script editor – Step by Step using RStudio Debugger – Opening the “black box” – Breakpoints, Environment, Traceback – Practical usage of stick people.[Slides].

• Separation of concerns. (Mar 3, 2020).
Generalize the solution. Use functions to simplify a complex problem.[Slides].

• Creating Functions, Making Decisions. (Mar 4, 2020).
Use functions to simplify a complex problem. Decide when to do something and when not to do it.[class08.R], [Slides].

• Using R to handle DNA sequences. (Mar 10, 2020).
Proteins and DNA are sequences that can easily be handled by the computer. Find them on the NCBI website, by accession number or taxonomy. We use the FASTA format and read it in R.[Slides].

• Functions to understand DNA. (Mar 11, 2020).
Using R to understand the genome, the genes and their interactions.[class10.R], [class-extra.R], [Slides].

• Testing Homework 5. (Mar 23, 2020).
Use these examples to test if your answers to Homework 5 are correct.[Document].

• Finding the Replication Origin. (Apr 7, 2020).

• Introduction to Systems Theory. (Apr 14, 2020).
Applying cumulative sums to understand how things change. Including pandemics.[Slides].

• Examples of Systems. (Apr 17, 2020).
Practice explaining drawing.[Slides].

• Systems in Biology and Beyond. (Apr 21, 2020).
We can describe systems as parts and interactions, and simulate their emergent behavior.[Slides].

• Analyzing systems. (Apr 28, 2020).
If we can describe a system, we can simulate it. Behavior can be more complex than you think.[Video Recording], [Chat], [Example code], [Slides].

• Long-term behavior and effect of initial conditions. (May 5, 2020).
Simulating systems and understanding their results.[Video Recording], [Chat], [Slides].

• Patterns in patterns in patterns…. (May 12, 2020).
In order to understand recursion, one must first understand recursion.[Video Recording], [Chat], [Example Code], [Slides].

• Practice of recursive functions. (May 19, 2020).
[Slides].
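
Several of the classes above deal with cumulative sums and with simulating a system once its parts and interactions are described. The following minimal sketch illustrates both ideas. It is written in Python, while the class material uses R. The function name `simulate_growth`, the growth rule, and all numbers are assumptions invented for the example; they are not taken from the slides.

```python
from itertools import accumulate


def simulate_growth(initial, rate, steps):
    """Simulate a quantity that grows by `rate` times its current value each step.

    Returns the list of values, starting with `initial`.
    """
    values = [initial]
    for _ in range(steps):
        # The interaction: the change depends on the current level.
        change = rate * values[-1]
        values.append(values[-1] + change)
    return values


# Cumulative sum: new cases per day add up to the total so far.
# These numbers are made up for illustration.
new_cases = [1, 2, 4, 8]
total_cases = list(accumulate(new_cases))
print(total_cases)  # [1, 3, 7, 15]

print(simulate_growth(100, 0.5, 3))  # [100, 150.0, 225.0, 337.5]
```

Changing `rate` or the initial value and running the simulation again is a simple way to explore how the long-term behavior depends on the initial conditions.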

## Attendance

By regulation from the Rectory, students need to attend at least 70% of the classes. The attendance book is updated every week and can be seen in Google Sheets.

## References

• Polya, G. and Conway, John H. How to Solve It: A New Aspect of Mathematical Method. Princeton Science Library.

• Stefan, M. I., Gutlerner, J. L., Born, R. T. & Springer, M. “The Quantitative Methods Boot Camp: Teaching Quantitative Thinking and Computing Skills to Graduate Students in the Life Sciences”. PLoS Comput. Biol. 11, 1–12 (2015). doi:10.1371/journal.pcbi.1004208.

• Wilson, G., D. A. Aruliah, C. T. Brown, N. P. Chue Hong, M. Davis, R. T. Guy, S. H. D. Haddock, et al. “Best Practices for Scientific Computing.” PLoS Biology 12, no. 1 (2014): e1001745. doi:10.1371/journal.pbio.1001745.

• Noble, William Stafford. “A Quick Guide to Organizing Computational Biology Projects.” PLoS Computational Biology 5, no. 7 (2009): 1–5. doi:10.1371/journal.pcbi.1000424.

• Elson D, Chargaff E (1952). On the deoxyribonucleic acid content of sea urchin gametes. Experientia 8 (4): 143–145.

• Chargaff E, Lipshitz R, Green C (1952). Composition of the deoxypentose nucleic acids of four genera of sea-urchin. J Biol Chem 195 (1): 155–160.

• Roten C-AH, Gamba P, Barblan J-L, Karamata D. Comparative Genometrics (CG): a database dedicated to biometric comparisons of whole genomes. Nucleic Acids Research. 2002;30(1):142-144.

• Zeeberg, Barry R, Joseph Riss, David W Kane, Kimberly J Bussey, Edward Uchio, W Marston Linehan, J Carl Barrett, and John N Weinstein. Mistaken Identifiers: Gene Name Errors Can Be Introduced Inadvertently When Using Excel in Bioinformatics. BMC Bioinformatics 5 (2004): 80. doi:10.1186/1471-2105-5-80.