This course is an introduction to Data Science for students of Molecular Biology. We use the R language to learn the basic tools to handle structured data and extract valuable scientific information from it.
This is the teaching plan, based on the previous year's course. Here you will find the slides from the classes and other supplementary material. Note that some things are said in class but not written down, so you should take good notes. We recommend taking notes with pen and paper using the Cornell Method.
Why “Computing in Molecular Biology”? (Sep 17, 2018).
What is a computer? Why do we care? [Slides].
Representing things with numbers. (Sep 20, 2018).
Memory, Files and Documents. [Slides].
Structured Documents. (Sep 24, 2018).
Introduction to RStudio and to Markdown. [Slides].
Using R and RStudio. (Sep 27, 2018).
Basic usage of RStudio. Introduction to R. Basic Data Types: Numeric, Character, Logical and Factor. [Slides].
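As a quick preview of the four basic data types named in this class, here is a minimal sketch in R. The variable names and values are invented for illustration and do not come from the course slides.

```r
# Hypothetical example values, not course data.
weight    <- c(1.5, 2.0, 3.25)                     # numeric vector
gene      <- c("BRCA1", "TP53", "MYC")             # character vector
expressed <- c(TRUE, FALSE, TRUE)                  # logical vector
tissue    <- factor(c("liver", "brain", "liver"))  # factor (categorical data)

# class() reports the type of each object.
class(weight)     # "numeric"
class(gene)       # "character"
class(expressed)  # "logical"
class(tissue)     # "factor"

# A factor stores its distinct categories as levels, sorted alphabetically.
levels(tissue)    # "brain" "liver"
```

Typing each line in the RStudio console and checking the result with `class()` is a simple way to get used to these types.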
Welcome to the Matrix. (Oct 11, 2018).
Structures in two dimensions. Matrices and Data Frames. [Slides].
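The difference between the two structures can be sketched as follows. This is only an illustration with made-up names and values (`m`, `samples`, `length_mm`, `treated`), not material from the slides: a matrix holds values of a single type, while a data frame can mix types column by column.

```r
# A matrix: all elements share one type; it is filled column by column.
m <- matrix(1:6, nrow = 2, ncol = 3)
dim(m)    # 2 3
m[2, 3]   # row 2, column 3: 6

# A data frame: each column can have its own type (hypothetical data).
samples <- data.frame(
  id        = c("s1", "s2", "s3"),
  length_mm = c(10.5, 12.0, 9.8),
  treated   = c(TRUE, FALSE, TRUE)
)
nrow(samples)               # 3
samples$length_mm           # one column, as a numeric vector
samples[samples$treated, ]  # only the rows where treated is TRUE
```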
Telling stories. (Oct 18, 2018).
Introduction to Descriptive Statistics. [Slides].
Quiz 2. (Oct 22, 2018).
Second rehearsal for Midterm Exam. [Document].
Data Visualization. (Nov 12, 2018).
Telling stories with pictures. “A picture is worth a thousand words”. Plots, barplots, histograms. Making “nice” drawings. Adding points and lines. [Slides].
More Data Visualization. (Nov 15, 2018).
Plotting two vectors, numeric or factor. Formulas. [Slides].
Quiz 3. (Nov 19, 2018).
Practice plotting data. [Document].
Subsets and formulas. (Nov 22, 2018).
Easier ways to plot. Also, introduction to Linear Models. [Slides].
Logarithmic scales. (Nov 29, 2018).
Not all lines are straight lines. Exponential growth in Science and Technology. What will be your future? [kleiber.txt], [Transistor_count.txt], [dna_price.txt], [Slides].
Polynomial Models. (Dec 6, 2018).
Not all lines are straight lines. [Slides].
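The idea behind the last two classes can be sketched in a few lines of R. This is a minimal illustration on synthetic numbers invented here; it does not use the course data files, and the variable names are assumptions. Exponential growth turns into a straight line once we take logarithms, and a curved trend can be fitted by adding a squared term to the formula.

```r
# Synthetic data, invented for illustration only.
year  <- 0:10
count <- 100 * 2^(year / 2)   # doubles every 2 years by construction

# After taking logarithms, exponential growth is a straight line,
# so an ordinary linear model can be fitted to log(count).
log_fit <- lm(log(count) ~ year)
doubling_time <- log(2) / coef(log_fit)[["year"]]
doubling_time                 # recovers the 2 years used above

# A quadratic trend fitted as a polynomial model.
# I() protects x^2 so that ^ is read as arithmetic inside the formula.
x <- -5:5
y <- 1 + 2 * x + 3 * x^2
poly_fit <- lm(y ~ x + I(x^2))
coef(poly_fit)                # intercept 1, then 2 for x and 3 for x^2
```

With real measurements the fits will not be exact, so the estimated coefficients should be read together with the residuals.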
Some Free Online Resources about R
Polya, G. and Conway, John H. How to Solve It: A New Aspect of Mathematical Method. Princeton Science Library.
Zeeberg, Barry R, Joseph Riss, David W Kane, Kimberly J Bussey, Edward Uchio, W Marston Linehan, J Carl Barrett, and John N Weinstein. Mistaken Identifiers: Gene Name Errors Can Be Introduced Inadvertently When Using Excel in Bioinformatics. BMC Bioinformatics 5 (2004): 80. doi:10.1186/1471-2105-5-80.