September 27th, 2016

I am Andres Aravena

  • Assistant Professor at the Department of Molecular Biology and Genomics
  • Mathematical Engineer, U. of Chile
  • PhD in Informatics, U. Rennes 1, France
  • PhD in Mathematical Modeling, U. of Chile
  • not a biologist
  • but an applied mathematician who can speak “biologist language”

I’ve worked on

  • Big and small computers
  • Telecommunication networks
  • Between 2003 and 2014 I was the chief research engineer
    • of the main bioinformatics group in my country
    • at the top research center
    • at the top university of my country

I come from Chile


Chile


Small country of ~17 million people

Spanish colony 500 years ago (so the language is Spanish)

Independent Republic 200 years ago

First Latin American country to recognize Turkish republic

Everyday life very similar to Turkey

Chileans like Turkish soap operas

[Image: Binbir Gece, a Turkish soap opera]

Latin America in Turkey

Foreigners enrich their host countries. Just look at the food:

  • Corn is from North and South America: maíz
  • Tomato is Mexican: tomates
  • Potato is from Chile and Peru: patatas


Diversity increases opportunities

Why computers?

for Molecular Biology and Genetics

Computers are rule changers

Modern computers were created to solve math equations

Then they were used to handle big databases

They became cheap and are now found everywhere

They became communication tools

They transformed society and science

How many computers do you use?

  • Cellphone
  • TV
  • Cable decoder
  • Microwave oven
  • Washing machine
  • Car engine
  • Metro
  • Elevator
  • Notebook

Computers transformed

  • the banking industry
  • the air travel industry
  • manufacturing
  • cars
  • movies
  • science

Four Paradigms of Science

1. Empirical

  • observation of isolated facts
  • description of related facts
  • e.g. Botany

2. Theoretical

  • Abstract models and theories
  • Usually expressed in mathematical formulas
  • Correct predictions validate the models
  • e.g. Mendel’s laws of inheritance


3. Simulation-based

  • Models that cannot be expressed in formulas
  • Formulas that cannot be solved exactly
  • e.g. Protein structure prediction

4. Data-based

  • Discovering patterns hidden in data
  • Huge volumes of data
  • Complex interactions
  • e.g. Bioinformatics

Computers

What does “computer” mean?

A computer is a counter

Originally, a computer was a person who did calculations

Sometimes with the help of mechanical devices

During the Second World War, electronic computers were invented

So, computers are devices that handle numbers

A Computer

“but I don’t use numbers …”

Don’t worry

Using numbers we can represent other things

In my country, kids replace the vowels A, E, I, O, U with the numbers 1, 2, 3, 4, 5

Then they write HELLO as H2LL4 (they are just kids)
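The kids' rule is a fixed table that swaps each vowel for a digit. A minimal Python sketch of that table (the name `kid_code` is ours, for illustration only):

```python
# Substitution table: each vowel is replaced by a digit
VOWEL_TO_DIGIT = {"A": "1", "E": "2", "I": "3", "O": "4", "U": "5"}

def kid_code(word: str) -> str:
    """Uppercase the word, then replace every vowel by its digit."""
    return "".join(VOWEL_TO_DIGIT.get(letter, letter) for letter in word.upper())

print(kid_code("HELLO"))  # H2LL4
```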

Using the same idea we can represent any text
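Computers use the same trick with a standard table: ASCII, and today Unicode, assigns a number to every character. In Python, `ord()` and `chr()` convert between a character and its Unicode number, which for English letters is the same as its ASCII code:

```python
text = "HELLO"

# ord() gives the number (Unicode code point) of each character
codes = [ord(letter) for letter in text]
print(codes)  # [72, 69, 76, 76, 79]

# chr() goes back from numbers to characters
decoded = "".join(chr(code) for code in codes)
print(decoded)  # HELLO
```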

Notice that we have represented sounds by signs for thousands of years

Numbers can represent other things

There are three things in the Universe

  • Matter
  • Energy
  • Information

Information can be put in digital (numeric) form

Numbers can represent a lot of things

  • Images
  • Audio
  • Movies

Not yet:

  • smell
  • taste
  • touch
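As a small illustration of numbers representing an image, here is a made-up 5x5 black-and-white picture stored as a grid of 0s and 1s. Real image formats use many more pixels and usually several numbers per pixel (for example red, green and blue intensities); the `render` helper is hypothetical:

```python
# A tiny 5x5 black-and-white "image": 1 = black pixel, 0 = white pixel.
# The pattern (a letter H) is just an illustrative example.
image = [
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
]

def render(pixels):
    """Turn the grid of numbers back into a picture made of characters."""
    return "\n".join("".join("#" if p else "." for p in row) for row in pixels)

print(render(image))
```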

What can a modern computer do?

Computers handle numbers

Numbers represent information

Computers can transform and transfer information

So, what is a computer?

  • Computer (English): counter, calculator
  • Ordinateur (French): sorter, gives order to data, handles data
  • Bilgisayar (Turkish): information/data counter

What do you do with a computer?

Do you have a computer at home?

What do you use it for?

What can a computer do?

  • calculate formulas
  • solve (some) equations
  • store and retrieve huge quantities of data
  • find patterns in data
  • find data matching a pattern
  • transform data in useful ways
  • compress data
  • move data at low cost without distortion
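To make “find data matching a pattern” concrete, here is a sketch that finds every position of a short motif in a DNA sequence using Python's standard `re` module. The sequence is invented; GAATTC is the recognition site of the restriction enzyme EcoRI:

```python
import re

# A made-up DNA fragment, for illustration only
dna = "ATGGAATTCCGTAGGAATTCTTA"

# 0-based positions where the pattern GAATTC starts
positions = [m.start() for m in re.finditer("GAATTC", dna)]
print(positions)  # [3, 14]
```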

Let’s play “computer”

Solving an equation

The first use of electronic computers was to solve complex equations

This approach made the Moon landing possible

Let’s find the value \(x\) that satisfies \[24x^3-70x^2+19x+15=0\]

Naming the formula

Let us give the formula a name. Let’s call it \(f(x)\). \[f(x) = 24x^3-70x^2+19x+15\]

We want to find an \(x\) that makes \(f(x)=0\). We can write \[f(x) = (24x^2-70x+19)x+15\] or even \[f(x) = ((24x-70)x+19)x+15\]
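As a worked check of the nested form, take \(x = \tfrac{5}{2}\) and compute from the innermost parenthesis outwards:
\[
\begin{aligned}
24 \cdot \tfrac{5}{2} &= 60, & 60 - 70 &= -10,\\
-10 \cdot \tfrac{5}{2} &= -25, & -25 + 19 &= -6,\\
-6 \cdot \tfrac{5}{2} &= -15, & -15 + 15 &= 0.
\end{aligned}
\]
So \(x = \tfrac{5}{2}\) is a solution. In fact the polynomial factors as \[f(x) = (2x-5)(4x-3)(3x+1),\] so the three solutions are \(x = \tfrac{5}{2}\), \(x = \tfrac{3}{4}\) and \(x = -\tfrac{1}{3}\).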

Computing f(x) given x

  • Take a piece of paper and write \(x\) on the first line
  • Write 24
  • Multiply the last two numbers
  • Add -70
  • Write \(x\) (from the first line)
  • Multiply the last two numbers
  • Add 19
  • Write \(x\) (from the first line)
  • Multiply the last two numbers
  • Add 15
  • Compare to 0
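The paper procedure translates almost line by line into a program. A minimal Python sketch, one statement per step; evaluating a polynomial in this nested way is known as Horner's method:

```python
def f(x):
    """Evaluate f(x) = 24x^3 - 70x^2 + 19x + 15 following the paper steps."""
    value = 24           # Write 24
    value = value * x    # Write x, multiply the last two numbers
    value = value + -70  # Add -70
    value = value * x    # Write x, multiply
    value = value + 19   # Add 19
    value = value * x    # Write x, multiply
    value = value + 15   # Add 15
    return value

for x in [0, 1, 2.5]:
    print(x, f(x), f(x) == 0)  # last column: "Compare to 0"
```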

What happened?

We solved a complex mathematical question using a simple set of rules

  • write
  • multiply
  • add
  • compare

This decomposition into simple steps is called a program
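The paper steps only evaluate \(f(x)\) for one given \(x\). To actually solve the equation, a program can repeat them many times. One simple strategy (bisection, our choice for illustration, not something the slides prescribe) keeps halving an interval where \(f\) changes sign, using only simple arithmetic and comparisons:

```python
def f(x):
    # The nested form ((24x - 70)x + 19)x + 15
    return ((24 * x - 70) * x + 19) * x + 15

def bisect(lo, hi, tol=1e-9):
    """Find x in [lo, hi] with f(x) = 0, assuming f changes sign in the interval."""
    if f(lo) == 0:
        return lo
    if f(lo) * f(hi) > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) == 0:            # exact solution found
            return mid
        if f(lo) * f(mid) < 0:     # sign change in the left half
            hi = mid
        else:                      # otherwise it is in the right half
            lo = mid
    return (lo + hi) / 2

print(bisect(2, 3))    # f(2) = -35 and f(3) = 90
print(bisect(-1, 0))   # f(-1) = -98 and f(0) = 15
```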

Parts of a computer

In this exercise we used

  • memory (paper)
  • arithmetic/logic units (you: adding, multiplying, deciding)
  • input/output (me)

Programs

Many different questions can be solved with the same rules

It is just a matter of changing the program

The first electromechanical computers were like us: a sequence of devices, each one feeding the next

Changing the program required physically changing parts

Stored program

The key step

John von Neumann realized that the list of steps can also be stored in memory (coded as numbers, of course)

We only need to include

  • a pointer to the current instruction
  • a system to decide which arithmetic/logic rule to apply

This is called the Central Processing Unit (CPU)
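To see the stored-program idea in action, here is a toy machine in Python. The instruction set (opcodes 0 to 4, each instruction two numbers long, a single accumulator) is invented for this example and does not correspond to any real CPU. The point is that the program for \(f(x)\) is just a list of numbers in the same memory as the data, `pc` is the pointer to the current instruction, and the `if/elif` chain decides which rule to apply:

```python
# Hypothetical toy instruction set (opcode numbers chosen for this example only)
HALT, LOAD_CONST, MUL_MEM, ADD_CONST, STORE = 0, 1, 2, 3, 4

def run(memory):
    """Run the program stored in memory; every instruction is two numbers."""
    pc = 0    # pointer to the current instruction
    acc = 0   # accumulator: the "last number" written on the paper
    while True:
        opcode, operand = memory[pc], memory[pc + 1]
        pc += 2
        if opcode == HALT:
            return memory
        elif opcode == LOAD_CONST:
            acc = operand
        elif opcode == MUL_MEM:
            acc = acc * memory[operand]
        elif opcode == ADD_CONST:
            acc = acc + operand
        elif opcode == STORE:
            memory[operand] = acc
        else:
            raise ValueError(f"unknown opcode {opcode} at address {pc - 2}")

X, RESULT = 20, 21  # addresses of the input x and of the output f(x)

program = [
    LOAD_CONST, 24,
    MUL_MEM, X,
    ADD_CONST, -70,
    MUL_MEM, X,
    ADD_CONST, 19,
    MUL_MEM, X,
    ADD_CONST, 15,
    STORE, RESULT,
    HALT, 0,
]

def f_on_machine(x):
    memory = program + [0] * (22 - len(program))  # program and data share memory
    memory[X] = x
    return run(memory)[RESULT]

print(f_on_machine(3))  # 90
```

Changing the numbers in `program` changes what the machine computes, without touching `run`.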

Hardware and Software

Physical tools have long been called hardware

That includes all the physical parts of the computer (what you can kick)

Programs determine the function of the computer, but they are not “physical”.

That is software (what you can only insult)

Biological analogy

All cell components are hardware

The sequence of the DNA is the software

In summary

What is a computer?

It is a general-purpose device that can

  • read, process and write numbers
    • (and things that can be represented by numbers)
    • to and from the memory
  • following a program also stored in memory
    • many simple steps

Changing the program changes the purpose of the machine

In the Next Chapter

we will see …

  • How information is coded in numbers
  • How these numbers are stored and organized
  • How we interact with computers
  • Start using a specific tool: RStudio

Homework

  • Prepare a presentation about NCBI
  • Install R and RStudio
  • Register in the Google Group

Memory and Files

Computers have bad memory

When the power is off, their main memory forgets everything

So all data we want to keep must be stored in secondary memory

Today, secondary memory includes

  • hard disk
  • USB stick
  • Cloud storage

Structure of secondary memory

The disks store a huge amount of data

To organize it we use files

To organize the files we use folders, also called directories
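A small sketch of files and folders using Python's standard `pathlib` module. It works inside a temporary folder so nothing on your disk is changed; the folder and file names are made up:

```python
import tempfile
from pathlib import Path

# Work inside a temporary folder that is deleted at the end
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)

    # A folder (directory) with a subfolder inside it
    data_dir = base / "project" / "data"
    data_dir.mkdir(parents=True)

    # Two files inside the subfolder
    (data_dir / "genes.txt").write_text("BRCA1\nTP53\n")
    (data_dir / "notes.txt").write_text("first experiment\n")

    # List the files in the folder, sorted by name
    names = sorted(p.name for p in data_dir.iterdir())
    print(names)  # ['genes.txt', 'notes.txt']

    # The data is on disk now: we can read it back
    genes = (data_dir / "genes.txt").read_text().split()
    print(genes)  # ['BRCA1', 'TP53']
```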

Files

Like the main memory, a file is just a list of bytes

The meaning of the file depends on the context

Most of the time, the name of the file suggests a context
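The same bytes can mean different things depending on how we read them. A short Python sketch (the five byte values are an arbitrary example):

```python
# Five bytes, written as numbers from 0 to 255
data = bytes([72, 69, 76, 76, 79])

# Read as text (ASCII), the bytes spell a word
print(data.decode("ascii"))        # HELLO

# Read as one big number (5 bytes, most significant first)
print(int.from_bytes(data, "big"))

# Read as plain numbers, e.g. five pixel brightness values
print(list(data))                  # [72, 69, 76, 76, 79]
```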

File attributes

Besides the data itself, files have metadata

That is, data about the data. For example

  • Files have a name
  • Files have a modification date, maybe other dates too
  • Files have a size
  • Files have permissions
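Most of this metadata can be read from a program. A Python sketch using the standard `os.stat` call on a temporary file created just for the example (the exact date and permission string depend on your system):

```python
import os
import stat
import tempfile
import time

# Create a small temporary file to inspect; its content is just an example
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as fh:
    fh.write(b"ACGT\n")  # 5 bytes: A, C, G, T and a newline
    path = fh.name

info = os.stat(path)
print("name:       ", os.path.basename(path))
print("size:       ", info.st_size, "bytes")
print("modified:   ", time.ctime(info.st_mtime))
print("permissions:", stat.filemode(info.st_mode))  # e.g. -rw-------

os.remove(path)  # clean up
```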

File names

The names of the files are “words”: a series of letters, numbers and some symbols

Technically, a filename is a string, that is, a list of characters

The maximum length of a filename is usually 255 characters (the exact limit depends on the file system)

Avoid /, \, :, *, ?, <, >, |, + and quotes

Use letters (A-Z, a-z), digits (0-9), and the symbols ., - and _
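The naming advice above can be turned into a simple check. This is a conservative, hypothetical helper, not an official rule of any operating system; the length limit is a parameter (255 is a common limit on modern file systems):

```python
import re

# Allowed characters: letters, digits, dot, hyphen and underscore
SAFE_NAME = re.compile(r"[A-Za-z0-9._-]+")

def is_safe_filename(name: str, max_length: int = 255) -> bool:
    """Check a filename against a conservative set of rules (hypothetical helper)."""
    return (
        len(name) <= max_length
        and SAFE_NAME.fullmatch(name) is not None
        and name not in (".", "..")  # these names refer to folders
    )

for name in ["sample_01.fasta", "results-2016.csv", "my data.txt", "a/b.txt", "x*y"]:
    print(name, is_safe_filename(name))
```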


In some systems, lowercase and UPPERCASE letters are not equivalent: data.txt and Data.txt can be two different files. Be systematic and consistent

If the filename includes a dot (.), the text after the last dot is called the extension

In Microsoft Windows, extensions are usually 3 letters

  • EXE, JPG, DOC, XLS, TXT, CSV
  • These are hints on how to interpret the file
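Programs usually find the extension by looking at the text after the last dot. Python's standard `os.path.splitext` does exactly that. Remember that the extension is only a hint: renaming a file does not change its contents.

```python
import os.path

for name in ["report.DOC", "photo.jpg", "table.csv", "archive.tar.gz", "README"]:
    stem, ext = os.path.splitext(name)
    # splitext keeps the dot and splits only at the LAST dot
    print(f"{name!r:18} -> name {stem!r}, extension {ext!r}")
```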

Kinds of file

At a low level, there is only one type of file: a list of bytes

For us, it is useful to distinguish two kinds:

  • Text files: each byte is a character, so we can read it
  • Binary files: bytes are grouped into binary numbers representing images, sounds, etc.

Among binary files we have EXE files, which are programs for Windows
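To see the difference, here is the number 1000 stored both ways in Python. The binary layout (a 4-byte unsigned integer, least significant byte first) is just one possible choice made for this example; each binary format defines its own layout:

```python
import struct

# Text: the number 1000 written as characters, one byte per digit
as_text = "1000".encode("ascii")
print(list(as_text))     # [49, 48, 48, 48]  (codes of '1', '0', '0', '0')

# Binary: the same number packed as a 4-byte little-endian unsigned integer
as_binary = struct.pack("<I", 1000)
print(list(as_binary))   # [232, 3, 0, 0]  (1000 = 3*256 + 232)

# To read each one back, we must know which kind of file it is
print(int(as_text.decode("ascii")))        # 1000
print(struct.unpack("<I", as_binary)[0])   # 1000
```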