December 17, 2019


Genomic databases

Every time a researcher publishes a paper about DNA sequencing, all the sequences are uploaded to online databases.

One of these databases is

It contains all DNA reads produced by next-generation sequencers

We want to understand how fast this database is growing


The plot must look like this

Database growth

  • Database growth is usually modeled with a semi-log linear model: the logarithm of the size is a linear function of time.
  • Build a semi-log model and find:
    • What is the growth factor per day?
    • What will the database size be two years after the last entry in your table?
  • Write your code and comments in the same R Markdown file.
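
A minimal sketch of the semi-log approach, assuming a table with a numeric `day` column and a `size` column. The real table is not shown here, so the data frame `db` below is simulated and all of its values are hypothetical. Two years is taken as 730 days, which is also an assumption.

```r
# Simulated data (hypothetical): replace `db` with your own table.
set.seed(1)
db <- data.frame(day = seq(0, 3000, by = 100))
db$size <- 1e9 * exp(0.002 * db$day) * exp(rnorm(nrow(db), sd = 0.05))

# Semi-log model: log(size) is a linear function of time
model <- lm(log(size) ~ day, data = db)

# The slope is the growth rate per day on the log scale;
# exponentiating it gives the multiplicative growth factor per day
growth_per_day <- exp(coef(model)[["day"]])

# Predict the size two years (assumed 730 days) after the last entry;
# the model predicts log(size), so we exponentiate the result
future <- data.frame(day = max(db$day) + 2 * 365)
predicted_size <- unname(exp(predict(model, newdata = future)))

growth_per_day
predicted_size
```

If your table stores dates instead of day numbers, one option is to convert them first, for example with `as.numeric(date - min(date))` on a `Date` column.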

Modeling free fall

Why we did this experiment

  • We need real data to produce science
  • Physics experiments are faster and cheaper than biology experiments
  • The logic is the same anyway
  • We care about the way of thinking

Example videos

YouTube has an option to play the video at 0.25× speed. Try it.

  • On your computer, open a new text file and write down the experimental data
  • Use three columns: msec, position, and replica
  • Write one row for each time the object crosses a mark
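
The file layout described above could look like the sketch below. The numbers are made up only to illustrate the three columns, and the space-separated format and the temporary file name are assumptions; use your own readings and file.

```r
# Made-up values, only to show the layout: one row per mark crossing
lines <- c(
  "msec position replica",
  "0 0 1",
  "150 10 1",
  "210 20 1",
  "0 0 2",
  "140 10 2",
  "200 20 2"
)

# Write the example to a temporary file; in practice you type this
# file by hand in a text editor
path <- tempfile(fileext = ".txt")
writeLines(lines, path)

# Read it back as a data frame, using the first line as column names
fall <- read.table(path, header = TRUE)
str(fall)

# Count how many rows were recorded for each replica
table(fall$replica)
```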

Summary of the video


Every serious experiment should be repeated at least 3 times

Without replication, the result can be wrong

Replicas indicate that the result is repeatable
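
One way to check repeatability is to compare the replicas mark by mark. The sketch below assumes a table with `msec`, `position`, and `replica` columns; the values are hypothetical and only serve as an illustration.

```r
# Hypothetical readings from three replicas of the same experiment
fall <- data.frame(
  msec     = c(0, 150, 210, 0, 140, 200, 0, 160, 215),
  position = rep(c(0, 10, 20), times = 3),
  replica  = rep(1:3, each = 3)
)

# Mean and standard deviation of the time at each mark, across replicas
mean_msec <- tapply(fall$msec, fall$position, mean)
sd_msec   <- tapply(fall$msec, fall$position, sd)

# A small standard deviation relative to the mean suggests
# that the replicas agree with each other
data.frame(position = names(mean_msec), mean_msec, sd_msec)
```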

Technical and biological replicas

In molecular biology we have two kinds of replicas

Technical replicas
the same organism, measured several times
Biological replicas
different organisms, each measured at least once

Technical replicas give you precise measurements of a single individual

This is what we do in medicine

Biological replicas give you data about one or more species

That is what we do in science