September 24th, 2019

What is a computer?

  • Is a general purpose device
  • that can read, process and write numbers
    • (and things that can be represented by numbers)
  • to and from the memory
  • following a program stored also in the memory
    • many simple steps
  • Changing the program changes the purpose of the machine

Hardware and Software

Physical tools have long been called hardware

That includes all the physical parts of the computer (what you can kick)

Programs determine the function of the computer, but they are not “physical”.

That is software (what you can only insult)

Biological analogy

All cell components are hardware

The sequence of the DNA is the software

Parts of a computer

  • Processor (CPU)
  • Interface (I/O)
  • Memory (RAM)
  • Secondary storage (hard disk)
  • Network

CPU

The processor or central processing unit (“CPU”) is the brains of the computer

  • does arithmetic,
  • moves data around,
  • controls the operation of the other parts
  • can decide what to do next based on the previous results

The CPU can do only a few things, but it does them very fast
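The ideas above (a program stored in memory, many simple steps, deciding what to do next from previous results) can be sketched with a toy machine. The three instructions `ADD`, `JUMP_IF_NOT_ZERO` and `HALT`, their numeric codes, and the memory layout are all invented for this illustration; no real CPU works exactly like this.

```python
# Toy stored-program machine. The instruction set is hypothetical.
HALT = 0              # stop the machine
ADD = 1               # ADD a b : memory[a] = memory[a] + memory[b]
JUMP_IF_NOT_ZERO = 2  # JNZ a t : if memory[a] != 0, continue at address t


def run(memory):
    """Execute the program stored in `memory` until HALT, then return memory."""
    pc = 0  # program counter: address of the next instruction
    while True:
        op = memory[pc]
        if op == HALT:
            return memory
        a, b = memory[pc + 1], memory[pc + 2]
        if op == ADD:
            memory[a] = memory[a] + memory[b]
            pc += 3
        elif op == JUMP_IF_NOT_ZERO:
            pc = b if memory[a] != 0 else pc + 3
        else:
            raise ValueError(f"unknown instruction {op} at address {pc}")


# Program and data share the same memory.
# Addresses 0-11 hold the program, addresses 12-15 hold the data.
program = [
    ADD, 12, 14,               # result = result + value
    ADD, 13, 15,               # counter = counter + (-1)
    JUMP_IF_NOT_ZERO, 13, 0,   # repeat while counter is not zero
    HALT, 0, 0,
    0,    # address 12: result
    3,    # address 13: counter
    4,    # address 14: value
    -1,   # address 15: the constant -1
]

memory = run(list(program))
print(memory[12])  # 3 * 4 computed by repeated addition
```

Changing the numbers stored in the first twelve boxes changes what the machine does, without touching the machine itself.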

RAM: random access memory

The primary memory or random access memory

  • stores information that is in active use
    • the data that the CPU is currently working on,
    • the instructions that tell the CPU what to do
  • its contents can be changed by the CPU

RAM is volatile

  • Its contents disappear if the power is turned off
  • and all this currently active information is lost

That’s why it’s prudent to save your work often

Electric problems can be a real disaster

Your computer has a finite amount of RAM

You can think of the RAM as

  • a large collection of identical little boxes
  • numbered from 1 up to 1000000000
  • each box can hold a small amount of information.
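The "numbered boxes" picture can be tried out with a Python `bytearray`, as a rough sketch. Two details are assumptions of this example and not of the slide: Python numbers the boxes from 0 instead of 1, and each box here holds exactly one byte (a number from 0 to 255).

```python
# A tiny "RAM" with ten boxes, all holding 0 at the start.
ram = bytearray(10)

ram[3] = 65           # write the number 65 into box 3
ram[4] = ram[3] + 1   # read box 3, add one, write the result into box 4

print(list(ram))      # show the contents of every box
print(len(ram))       # capacity of this tiny memory, in bytes
```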

Capacity is measured in bytes

What is the capacity of your computer?

Computers have bad memory

When they lose power, they forget everything

All data must be stored in secondary memory

Today secondary memory is

  • hard disk
  • USB stick
  • Cloud storage

Disks and secondary storage

  • RAM is expensive, so computers do not have much of it
    • its contents disappear when the power is turned off
  • Secondary storage holds data even when the power is turned off
  • The most common kind is the magnetic disk
    • also called the hard disk or hard drive
  • The disk stores much more information than RAM
  • data on the disk stays there indefinitely
    • even if power fails

Old disks were not “hard disks”

Floppy disks (70’s)

Micro-Floppy disk (90’s)

Secondary storage is slow

Data, instructions, and everything else are stored on the disk for the long term

And brought into RAM only for a short time

Disk space is about 100 times cheaper than RAM

But accessing information is much slower.

Homework: Memory size

How much can you store in your computer? Please answer these two questions:

  1. What is the capacity of the memory of your computer?

  2. What is the capacity of the disk?
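For the second question, a possible starting point is Python's standard `shutil.disk_usage`. The helper `human_readable` is a name made up for this sketch, and it assumes binary units (1 KiB = 1024 bytes); your operating system may report sizes using powers of 1000 instead. The standard library has no portable call for the size of the RAM, so for the first question look in your system settings.

```python
import shutil


def human_readable(n_bytes):
    """Format a byte count using binary units (1 KiB = 1024 bytes)."""
    units = ["B", "KiB", "MiB", "GiB", "TiB"]
    value = float(n_bytes)
    for unit in units:
        # Stop at the first unit that gives a number below 1024,
        # or at the largest unit we know.
        if value < 1024 or unit == units[-1]:
            return f"{value:.1f} {unit}"
        value /= 1024


# Capacity of the disk that holds the current folder.
usage = shutil.disk_usage(".")
print("Disk capacity:", human_readable(usage.total))
print("Free space:   ", human_readable(usage.free))
```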

Primary memory is like a desk

Secondary storage is like a bookshelf

Structure of secondary memory

The disks store a huge amount of data

To organize it we use files

To organize the files we use folders
also called directories

Files

Like the main memory, a file is just a list of bytes

The meaning of the file depends on the context

You can decide to change its meaning

Most of the time, the name of the file suggests a context

For example, an MP3 file is probably audio
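A small experiment may make "the meaning depends on the context" concrete. The sketch below writes five bytes to a temporary file and then reads the same file twice: once as plain numbers and once as text. The file name `example.txt` and the byte values are arbitrary choices for this illustration.

```python
import os
import tempfile

data = bytes([72, 101, 108, 108, 111])  # five bytes, nothing more

# Store the bytes in a file inside a fresh temporary folder.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "wb") as f:
    f.write(data)

# Context 1: the file is a list of numbers.
with open(path, "rb") as f:
    as_numbers = list(f.read())

# Context 2: the same file is ASCII text.
with open(path, "r", encoding="ascii") as f:
    as_text = f.read()

print(as_numbers)
print(as_text)
```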

File attributes

Besides the data itself, files have metadata

That is, data about the data. For example

  • Files have a name
  • Files have a modification date, maybe other dates too
  • Files have a size
  • Files have permissions
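The attributes listed above can be inspected from a program. This is a minimal sketch using Python's standard `os.stat`; the temporary file and its contents are invented for the example, and the exact permission string depends on your operating system.

```python
import os
import stat
import tempfile
import time

# Create a small file so there is something to inspect.
path = os.path.join(tempfile.mkdtemp(), "example.txt")
with open(path, "wb") as f:
    f.write(b"Hello")

info = os.stat(path)  # ask the operating system for the metadata

print("name:       ", os.path.basename(path))
print("size:       ", info.st_size, "bytes")
print("modified:   ", time.ctime(info.st_mtime))
print("permissions:", stat.filemode(info.st_mode))
```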

File names

The names of the files are “words”: a series of letters, numbers and some symbols

Technically, a filename is a string, that is, a list of characters

Keep filenames shorter than 250 characters (most systems allow at most 255)

Avoid /, :, +, |, <, *, >, " and '

Use letters (A-Z, a-z), numbers (0-9), ., -,   and _

File names

In some systems lowercase and UPPERCASE letters are not equivalent. Be systematic and consistent

If the filename includes ., the text after the last . is called the extension

In Microsoft Windows, extensions are usually 3 letters

  • EXE, JPG, DOC, XLS, TXT, CSV
  • These are hints on how to interpret the file
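The naming advice can be turned into a quick check. In this sketch the function names `is_safe_filename` and `extension` are invented, the allowed characters are letters, digits, `.`, `-` and `_`, and the default length limit of 250 is the one given in the text. It is stricter than any real operating system, on purpose.

```python
import os
import string

# Characters recommended for file names.
ALLOWED = set(string.ascii_letters + string.digits + ".-_")


def is_safe_filename(name, max_length=250):
    """Return True if `name` uses only recommended characters and is not too long."""
    return 0 < len(name) <= max_length and all(c in ALLOWED for c in name)


def extension(name):
    """Return the extension, including the dot, or '' if there is none."""
    return os.path.splitext(name)[1]


for name in ["results_2019-09-24.csv", "my file.txt", "a:b.txt", "README"]:
    print(name, is_safe_filename(name), repr(extension(name)))
```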

Kinds of file

At a low level there is only one type of file

For us, it is useful to separate them into two kinds:

Text files
  • each byte is a character, so we can read them

Binary files
  • bytes are grouped into binary numbers that represent images, sounds, etc.

Among binary files we have EXE files, which are programs for Windows

Representing text

The most natural way to represent a text document is to encode each letter with a single byte

There is a basic standard for English, called ASCII

Each number from 0 to 127 is either a symbol or a special signal

  • New Line
  • End of Message
  • Tab
  • Space
  • Backspace

ASCII code

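Add the column header (tens) and the row label (units) to get the code. For example, column 60 and row 5 give code 65, the letter A. RS, US and DEL are special signals, and SP is the space.

      30   40   50   60   70   80   90  100  110  120
 0    RS    (    2    <    F    P    Z    d    n    x
 1    US    )    3    =    G    Q    [    e    o    y
 2    SP    *    4    >    H    R    \    f    p    z
 3     !    +    5    ?    I    S    ]    g    q    {
 4     "    ,    6    @    J    T    ^    h    r    |
 5     #    -    7    A    K    U    _    i    s    }
 6     $    .    8    B    L    V    `    j    t    ~
 7     %    /    9    C    M    W    a    k    u  DEL
 8     &    0    :    D    N    X    b    l    v
 9     '    1    ;    E    O    Y    c    m    w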

Non-English languages use numbers between 128 and 255 for symbols like “Ç”, “Ö”, “É”, “Ñ”
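You can check these codes yourself. The sketch below uses Python's built-in `ord` and `chr`. For the non-English letters it assumes the Latin-1 encoding, which is only one of several ways of assigning the numbers from 128 to 255.

```python
# From character to number and back.
print(ord("A"))   # code of the letter A
print(chr(97))    # character with code 97

# A whole text is a list of numbers.
codes = list("Hi!".encode("ascii"))
print(codes)

# Special signals are numbers too.
print(ord("\n"), ord("\t"), ord(" "))

# Assumption: Latin-1 is the encoding used for the codes above 127.
latin = "Ñ".encode("latin-1")[0]
print(latin)
```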

Text Files

  • are universal
  • are easy to read and write from a program
  • do not have any style like bold or italic
  • are like books without figures

Microsoft Word files (doc or docx) are NOT text files

You shall not use Word for this course

Example binary file

Example text file

Text files are for humans and computers

  • Binary files are hard to read
    • unless you have the correct program
  • Text files can be read by humans
    • Each byte is a letter
  • Text files can be read by computers
    • Data must be recyclable
    • The output of one program is the input of another program
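As a rough illustration of "the output of one program is the input of another", the sketch below lets one piece of code write a small table as text and another piece read it back. The column names and values are made up, and an in-memory `io.StringIO` object stands in for a real text file.

```python
import csv
import io

# "Program 1" writes its results as plain text (CSV).
output = io.StringIO()
writer = csv.writer(output)
writer.writerow(["sample", "value"])   # made-up column names
writer.writerow(["a", 1])              # made-up data
writer.writerow(["b", 2])
text = output.getvalue()
print(text)  # a human can read this directly

# "Program 2" reads the same text and keeps working with it.
rows = list(csv.DictReader(io.StringIO(text)))
total = sum(int(row["value"]) for row in rows)
print("total:", total)
```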

How to really use computers

Are computers helping us?

Many people feel like “computers are not helping”

Instead they feel like computers make things harder

The same happened when electric motors were invented

The first factories had a single steam engine

Energy was transported using belts

One motor, several machines

Later, electric motors replaced the steam engines

But the factories did not improve

The real change happened when each machine got its own motor

Today we have electric motors everywhere

Doing the same thing gives the same results

Just changing the technology does not change the world

The real change happens when we do things in a different way

What about this?

Computers are not Typewriters

If we only replace typewriters with word processors, nothing changes

Microsoft Word is a technology for the 19th century

We need a new way to use computers

Using R and RStudio

How to use RStudio

You have to install R and RStudio on your computer

Then start RStudio. You will see a screen like this

Today we will focus only on one part

Click on File → New File → R Markdown

A text editor

You will get a new window with an example text

It is a text file. One character takes one byte

Colors are only a guide for you. They are not part of the text

Today we will learn how to write text files for our course

Structure

Markdown

An alternative to ordinary Word Processors is to use text files with a few rules to mark the role of each element.

Text files can be read on any computer, and will be accessible forever.

Today the most widely used structured text format is Markdown

Markdown rules

Paragraphs
  • Consecutive lines of text form one paragraph
  • Paragraphs are separated by an empty line

Chapters, sections, subsections
  • # Header 1, ## Header 2, ### Header 3
  • No space before #, one space after

Markdown rules

Unordered lists
  • Use + or - in the first column, and one space after
  • Sublists are marked with 4 spaces on the left

Ordered lists
  • Like unordered lists but with numbers
  • Use 1. on the left and separate with one space

Markdown rules

Important paragraph or quotation
  • To show something important
  • Use > on the left side, and one space after

Images
  • You have to indicate the web address of the image
    ![optional text](http://example.com/logo.png)
  • or the path of a file relative to the document's directory
    ![optional text](images/logo.png)

Markdown rules

Computer code
  • Use ``` before and after the block
  • Exactly 3 backticks
  • This will be very important in the rest of the course
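To see the rules together, the sketch below builds a small Markdown document as ordinary text, line by line. Every line of the sample is invented for this illustration, including the one-line code block. The three backticks are produced with `"`" * 3` only to keep this example easy to display.

```python
FENCE = "`" * 3  # exactly three backticks

lines = [
    "# Header 1",
    "",
    "First line of a paragraph,",
    "second line of the same paragraph.",
    "",
    "## Header 2",
    "",
    "- first item",
    "    - sub-item, indented with 4 spaces",
    "- second item",
    "",
    "1. first step",
    "2. second step",
    "",
    "> Something important",
    "",
    "![optional text](images/logo.png)",
    "",
    FENCE,
    "x <- 1 + 1",
    FENCE,
]

# A Markdown document is just these lines joined into one text.
document = "\n".join(lines) + "\n"
print(document)
```

Copying the printed text into a new file in RStudio should give a document that uses each rule once.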

Markdown rules

Metadata
  • Data about the data: date, purpose, bibliography, etc.
  • This part uses a format called YAML
  • Use --- (three hyphens) before and after

Header
  • Metadata at the beginning of the file
  • Uses YAML format
  • Declares title, author, and date

Example of Header Metadata

---
title: "Midterm Exam"
subtitle: "Computing in Molecular Biology 1"
author: "Put your name here"
number: STUDENT_NUMBER
date: "September 24, 2019"
output: html_document
---

ALWAYS write your name and student number

Rules for YAML files

  • Each list element starts in the first column. No spaces
  • The inner list elements are indented with 2 spaces
  • You can have lists inside lists inside lists…
  • Name and values are separated by :
  • Always put a space after :
  • --- before and after the YAML code
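The rules above are simple enough that a program can read a basic header. The sketch below is deliberately minimal: the function `read_header` is invented for this example, it understands only `name: value` pairs that start in the first column, and it skips inner lists. It is not a real YAML parser; for real work a YAML library would be the better choice.

```python
def read_header(text):
    """Return the top-level `name: value` pairs found between the --- lines."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # the document does not start with a header
    header = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of the header
        name, separator, value = line.partition(": ")
        # Keep only elements that start in the first column.
        if separator and not line.startswith(" "):
            header[name] = value.strip().strip('"')
    return header


example = """---
title: "Midterm Exam"
author: "Put your name here"
output: html_document
---

First paragraph of the document.
"""

print(read_header(example))
```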

Google “YAML” for more info

Use YAML for bibliography

references:
- type: article-journal
  id: WatsonCrick1953
  title: 'Molecular structure of nucleic acids: a structure for
    deoxyribose nucleic acid'
  author:
  - family: Watson
    given: J. D.
  - family: Crick
    given: F. H. C.
  container-title: Nature
  volume: 171
  issue: 4356
  page: 737-738
  issued:
    date-parts:
    - - 1953
      - 4
      - 25

How to use it

Put all the references somewhere in the document, with --- before and after.

  • [@WatsonCrick1953] produces (Watson and Crick 1953)
  • [@WatsonCrick1953, pp. 33-35, 38-39] becomes (Watson and Crick 1953, 33–35, 38–39).
  • [@WatsonCrick1953; @Collado-Vides2009a] becomes (Watson and Crick 1953; Collado-Vides et al. 2009).
  • @WatsonCrick1953 [p. 33] says blah becomes Watson and Crick (1953, 33) says blah
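A quick way to list which references a text cites is to search for the `@` marks. The pattern below is a simplification chosen for this sketch (letters, digits, `_` and `-` after the `@`); it is not the exact rule the citation processor uses, and it would also match things such as e-mail addresses.

```python
import re

# Simplified pattern for a citation key: "@" followed by letters,
# digits, "_" or "-". This is an assumption, not the official rule.
CITATION_KEY = re.compile(r"@([A-Za-z0-9_][A-Za-z0-9_-]*)")


def cited_keys(text):
    """Return the citation keys found in `text`, in order of appearance."""
    return CITATION_KEY.findall(text)


sample = "See [@WatsonCrick1953, pp. 33-35] and [@WatsonCrick1953; @Collado-Vides2009a]."
print(cited_keys(sample))
```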

External bibliographies

If you have a long list of all papers, and you use it on several documents, then you should put the references in a separate file

Then you write

bibliography: references.yml

in the document metadata

Other formats for references

Format        File extension
BibLaTeX      .bib
BibTeX        .bibtex
Copac         .copac
CSL JSON      .json
CSL YAML      .yaml
EndNote       .enl
EndNote XML   .xml
ISI           .wos
MEDLINE       .medline
MODS          .mods
RIS           .ris

Tools for managing the bibliography

R Markdown can read all these formats, which is convenient

There are many tools to manage your paper collection

It is not enough to download PDFs and store them in a folder. They need to be organized and have a structure

Two good and free programs are Mendeley and Zotero

Bibliography at the end of document

Bibliographies will be placed at the end of the document. Normally, you will want to end your document like this:

last paragraph...

# References

The bibliography will be inserted after this header. More info at

http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html

Format inside a paragraph

Links
  • [clickable text](web address)

Inline code
  • Surround it with single backticks: we can speak about `x` and `data`

Footnotes
  • Find out how to write them

Exercise: Write in Markdown

How to solve it

by G. Polya

  • You have to understand the problem.
  • Find the connection between the data and the question. You should obtain a plan of the solution.
  • Carry out your plan.
  • Examine the solution obtained.

More information

read and learn it

References

Collado-Vides, J, H Salgado, E Morett, S Gama-Castro, V Jiménez-Jacinto, I Martínez-Flores, A Medina-Rivera, L Muñiz-Rascado, M Peralta-Gil, and A Santos-Zavaleta. 2009. “Bioinformatics Resources for the Study of Gene Regulation in Bacteria.” Journal of Bacteriology 191 (1): 23–31.

Watson, J. D., and F. H. C. Crick. 1953. “Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid.” Nature 171 (4356): 737–38. https://doi.org/10.1038/171737a0.