Blog of Andrés Aravena

Computing in Molecular Biology and Genetics I

27 April 2015

On April 27 my boss asked me to describe the classes I’m going to teach starting in the Fall 2015 semester. The first one, named “Computing in Molecular Biology and Genetics I”, replaces the former course “Computation III”. The old course taught databases and SQL on the Microsoft Access platform.

In my view, based on more than 12 years of collaboration with molecular biologists, the key ideas that a researcher needs to know are not emphasized in these particular tools. I think that the key ideas are:


Course Name
Computing in Molecular Biology and Genetics I
Course Adaptation (I don’t know what this means)
Compulsory Attendance
Theory: 70, Practice: 80, Lab: 0 (I don’t know what this means)
Course Teacher(s)
Andrés Aravena
The students will learn how to handle experimental data and how to communicate it to scientists of other data-oriented disciplines.
Course Learning Outcomes
Students will learn how to produce publication-quality reports with reproducible results: how to obtain raw data, extract the relevant information, and filter it using several selection criteria; how to store and retrieve it in efficient and useful ways; and how to transform, organize, categorize, and display it so that the results can be understood. Tools include Unix command line tools, SQL, and the R statistical package. Students should also be able to understand how computer networks work and what their limitations are.
Course Content (Short Description)
This course teaches the basic elements of data handling techniques in modern computing systems that allow the management and understanding of experimental results.
Teaching and Learning Methods
Exposition of theoretical aspects, followed by practical exercises by the students. Student work is encouraged.
Online resources about R. There are plenty of them.
Course Material (Auxiliary Equipment, modal etc.)
Slide Presentations, Data Show, Blackboard, computer network
Continuous Improvement in the Context of the Course (questionnaires, interviews, and so on): Planned Measurement and Evaluation Tools and Objectives
Quizzes, project work, and homework are used to improve the course. After the midterm and final exams, we will report which questions had the fewest correct answers among students with a high attendance rate. Based on these results, lecture notes are updated and additional homework activities are recommended.

Theory Topics

Week  Weekly Contents                                           Period
1     Definitions: computer, data, digital, structured          2
2     Introduction to R. Data types, structures                 2
3     Reading and writing files, applied to molecular biology   2
4     Querying and selecting data subsets                       2
5     Merging different data sources. Table crossing            2
6     Structured programming for data handling                  2
7     Data visualization for research                           2
8     Structured documents and reproducible research reporting  2
9     Introduction to Unix command line tools                   2
10    Introduction to SQL data management                       2
11    Introduction to computer networks and Internet            2
12    Advanced data handling in R                               2
13    Advanced data visualization in R                          2
14    Introduction to parallel computing                        2

Practice Topics

Week  Weekly Contents                                           Period
1     Examples of data files: JPG, PNG, TXT, CSV, XLS, DOC      2
2     Examples of R data structures                             2
3     Handling microarray and genomic data files                2
4     Examples of select and filter on data frames              2
5     Examples of inner/outer join on data frames               2
6     FOR loops, IF structures, Functions                       2
7     Examples of Plots, Histograms, Barplots                   2
8     Examples of Separation of Content and Style               2
9     Examples of Unix command line tools                       2
10    Examples of SQL queries                                   2
11    Description of modern network elements                    2
12    Examples of data melting, casting and filters             2
13    Examples of grammar of graphics                           2
14    Examples of multicore mappings                            2