This course is new and differs from the previous year's.
Things to do:
- Register in the Forum
- Answer the Survey
- Install the required software
- Read the class slides
- Watch the class videos
This course is an introduction to Data Science for students of Molecular Biology. We use the R language to learn the basic tools to handle structured data and extract valuable scientific information from it.
All quizzes and homework must be sent to email@example.com before the deadline to receive a grade. Please be careful: work that misses the deadline gets a grade of zero.
Homework 1 (Deadline: Monday 2 of November)
Homework 2 (Deadline: Monday 16 of November at 11:00).
Practice for the midterm exam. Vectors, indices, and general ideas about using R.
Homework 3 (Deadline: Monday 23 of November at 11:00).
Practice for the midterm exam. Vectors and indices, again.
Homework 4 (Deadline: Monday 30 of November at 11:00).
Practice for the midterm exam. Tidying up real-life data.
Homework 5 (Deadline: Monday 11 of January at 11:00).
Practice for the final exam. Plots and linear models.
Here you will find the slides from the classes and other supplementary material. Some things are said in class but not written in the slides, so you had better take good notes. We recommend taking notes with pen and paper using the Cornell Method.
- Class 1: Why “Computing in Molecular Biology”? (Oct 19, 2020). [Video],[Slides].
What is a computer? Why do we care?
- Class 2: What is a computer? (Oct 19,
What is a computer? How does it represent information? What are its parts and how do they interact?
- Class 3: Folders and Files. (Oct 19,
What is a computer? What are its parts and how do they interact?
- Class 4: Structured Documents. (Oct 26,
Introduction to RStudio and to Markdown.
- Class 5: Structured text files with Markdown. (Oct 26, 2020). [Video],[Slides].
Structured text files.
- Class 6: Using R and RStudio. (Oct 26,
Basic usage of RStudio. Introduction to R.
- Class 7: Learning R Language. (Nov 2,
Introduction to R Language.
- Class 8: Vectors. (Nov 2, 2020).
Handling several values in a single variable.
- Class 9: More about R Markdown. (Nov 2,
Headers, metadata, YAML, bibliographic references, footnotes, equations. How to share error messages and ask for help. Other systems of structured text: HTML, LaTeX.
- Class 10: Logic and Text vectors. (Nov 9,
Things that are not numbers.
- Class 11: Indices. (Nov 9, 2020).
Use a vector to look inside another vector and change it.
- Class 12: Practice with vectors in R. (Nov 9,
Using COVID data.
- Class 13: Tibbles and data frames. (Nov 16,
Reading data from the real world. New way of doing data frames. Combining filters.
- Class 14: Digital Signatures. (Nov 16,
Do I have your homework?
- Class 15: Practice with Tibbles. (Nov 16,
Practice makes perfect.
- Class 16: Telling stories. (Nov 23, 2020).
Telling stories about data. Many averages, quartiles, quantiles, percentiles, summary. Introduction to Descriptive Statistics.
- Class 18: Practice on tidying up data. (Nov 23,
A protocol to get useful data from bad data.
- Class 19: Practice, practice, practice. (Nov 30, 2020). [Video],[Slides].
Get ready for the midterm exam.
- Class 20: Factors. (Dec 21, 2020).
Not numbers, not text. Categorical variables handle a controlled vocabulary to avoid errors and simplify the analysis.
- Class 21: Exploring data with plots. (Dec 21,
There are several ways to do graphics in R. In this class we will show the basic graphic commands.
- Class 22: Linear models. (Dec 28, 2020).
Finding the best fitting straight line.
- Class 23: Logarithmic scales. (Dec 28,
Like Robin Hood, logarithms enlarge small values and reduce big values. Logarithmic scales are microscopes and telescopes at the same time. What does it mean for you?
- Class 24: Logarithmic models. (Jan 4,
Not all lines are straight lines. But they can be straight if you change your perspective.
- Class 25: Polynomial Models. (Jan 4,
Not all lines are straight lines, or exponentials, or power laws.
- Class 26: Drawing beautiful plots. (Jan 11,
The grammar of graphics.
- Class 27: More beautiful plots. (Jan 11,
The grammar of graphics. Multiple lines.
- Class 28: Linear models with factors. (Jan 18,
Useful for gene expression analysis.
- Solution of the midterm exam.
- Class 17 (November 23), by the RStudio authors: “Data Manipulation Tools: dplyr – Pt 3 Intro to the Grammar of Data Manipulation with R (RStudio)”
- Manipulating and exploring data with dplyr (Hefin Rhys)
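Classes 22 to 24 above describe fitting a straight line and then making a curve straight by taking logarithms. A minimal sketch of that idea in base R follows. The data are invented for illustration (an exact exponential curve), and the names `x`, `y`, `fit`, `growth_rate`, and `start_value` are assumptions, not taken from the class material.

```r
# Invented data: an exact exponential curve, y = 2 * exp(0.5 * x).
x <- 1:10
y <- 2 * exp(0.5 * x)

# On the original scale the points curve upward. After taking
# logarithms the relation is log(y) = log(2) + 0.5 * x, a straight line,
# so an ordinary linear model can be fitted to it.
fit <- lm(log(y) ~ x)

# The slope estimates the growth rate; the exponential of the
# intercept estimates the value of y at x = 0.
growth_rate <- unname(coef(fit)["x"])
start_value <- unname(exp(coef(fit)["(Intercept)"]))
print(c(growth_rate = growth_rate, start_value = start_value))
```

With real measurements the points scatter around the line, and `summary(fit)` reports how well the line describes them.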
By regulation from the Rectory, students need to attend at least 70% of the classes. The attendance book is updated every week and can be seen in Google Sheets.
To be counted as present, you must answer a quiz during class time and deliver all homework on time. Late submissions will not be accepted.
Some Online Resources
- How to read an R help page
- Getting Started with R
- Free Course: Introduction to R
- Introduction to Data Science
- Book R for Data Science
- Book Data Visualization: A practical introduction by Kieran Healy, Duke University
- Getting started with R and RStudio (14:38)
- Introduction to R and RStudio (1:31:20)
- Introduction to R and RStudio part 2 (1:27:24)
- Introduction to R Markdown (26:26)
- Learn The Basics Of Markdown in 10 Minutes With This Video Tutorial (10:55)
- Getting Started with RStudio at Amherst: First steps with R Markdown (03:50)
- R Markdown (13:59)
- R Markdown with RStudio (06:39)
- How to Create an R Markdown file (13:51)
- KSL003 Markdown for (Bio)Scientists (14:40)
- How to Create Professional Presentations using Markdown without Powerpoint or Keynote (06:15)
- Academic Writing in Markdown (10:45)
- Markdown Crash Course (19:31)
- R Markdown The bigger picture - Garrett Grolemund (18:52)
For this course we will use the latest versions of R and RStudio. These two tools work together. Install R first, then install RStudio.
These videos may help you
Once you have registered for the forum, you should send all your questions by email to firstname.lastname@example.org. You can also write your questions, and check for previous messages, on the web page https://groups.google.com/d/forum/iu-cmb.
Everybody must register in the course forum. To do so, you must fill in the Welcome survey and register your email address. I recommend that you use a Gmail account, such as the ogr.iu.edu.tr email service provided by the university. In any case, use an email address that you check regularly. We will use this forum to send important material.
This semester the course will be held online. That is an interesting challenge, since it makes some things harder but others simpler. To start, everybody should read this paper:
Searls DB. “Ten simple rules for online learning”. PLoS Computational Biology. 2012;8(9):e1002631. DOI: 10.1371/journal.pcbi.1002631. Epub 2012 Sep 13. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3441493/
Frey, Carl Benedikt, and Michael A Osborne. “The Future of Employment: How Susceptible Are Jobs to Computerisation?” Technological Forecasting and Social Change 114 (January 2017): 254–80. https://doi.org/10.1016/j.techfore.2016.08.019.
Nuzzo, Regina. “How Scientists Fool Themselves – and How They Can Stop.” Nature 526, no. 7572 (2015): 182–85. https://doi.org/10.1038/526182a.
Polya, G. and Conway, John H. “How to Solve It: A New Aspect of Mathematical Method.” Princeton Science Library.