Welcome back

to “Computing for Molecular Biology 1”

What have we learned?

Tell me about

  • what is a computer?
  • what are the parts of a computer?
  • what is software?
  • how do computers handle information?

Memory and Files

Computers have bad memory

When they lose power, they forget everything

Any data we want to keep must be stored in secondary memory

Today secondary memory is

  • hard disk
  • USB stick
  • Cloud storage

Structure of secondary memory

The disks store a huge amount of data

To organize it we use files

To organize the files we use folders
also called directories

Files

Like the main memory, a file is just a list of bytes

The meaning of the file depends on the context

Most of the time, the name of the file suggests a context
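
A small Python sketch of "the meaning depends on the context". The four byte values are made up for illustration; the same bytes are read as numbers, as text, and as one big integer.

```python
# The same four bytes, read in three different contexts.
data = bytes([72, 105, 33, 10])

as_numbers = list(data)                    # four small integers
as_text = data.decode("ascii")             # four characters, the last is a newline
as_integer = int.from_bytes(data, "big")   # one 32-bit number

print(as_numbers)
print(repr(as_text))
print(as_integer)
```

Nothing in the bytes themselves says which reading is the right one. The program that opens the file decides.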

File attributes

Besides the data itself, files have metadata

That is, data about the data. For example

  • Files have a name
  • Files have a modification date, maybe other dates too
  • Files have a size
  • Files have permissions
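
A minimal Python sketch that asks the system for this metadata. The file name "notes.txt" is invented for the example, and the file is created in a temporary folder so the code runs by itself.

```python
import os
import stat
import tempfile
import time

# Create a small file in a temporary folder (hypothetical name "notes.txt").
folder = tempfile.mkdtemp()
path = os.path.join(folder, "notes.txt")
with open(path, "w") as f:
    f.write("ACGT")

info = os.stat(path)                       # ask the system for the metadata
name = os.path.basename(path)              # the file name
size = info.st_size                        # the size, in bytes
modified = time.ctime(info.st_mtime)       # the last modification date
permissions = stat.filemode(info.st_mode)  # who may read or write the file

print(name, size, modified, permissions)
```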

File names

The names of the files are “words”: a series of letters, numbers and some symbols

Technically, a filename is a string, that is, a list of characters

On most systems, the maximum length of a filename is about 255 characters

Avoid “/”, “:”, “+”, “|”, “<”, “*”, “>” and quotes

Use letters (A-Z, a-z), numbers (0-9), “.”, “-”, “_” and the space “ ”
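
The rule above can be sketched as a small Python check. The function name `is_safe_filename` and the two example names are hypothetical, made up for this illustration.

```python
import string

# Recommended characters: letters, digits, ".", "-", "_" and space.
ALLOWED = set(string.ascii_letters + string.digits + ".-_ ")

def is_safe_filename(name):
    """Return True if name is not empty and uses only recommended characters."""
    return len(name) > 0 and all(ch in ALLOWED for ch in name)

print(is_safe_filename("gene_counts-2024.csv"))   # only recommended characters
print(is_safe_filename("results: final*.txt"))    # contains ":" and "*"
```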

File names

In some systems lowercase and UPPERCASE letters are not equivalent.
Be systematic and consistent

If the filename includes “.”, the text after the last “.” is called the extension

In Microsoft Windows, extensions usually have 3 letters

  • EXE, JPG, DOC, XLS, TXT, CSV
  • These are hints on how to interpret the file
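
A short Python sketch that separates the extension from the rest of the name. The file names are invented for illustration.

```python
from pathlib import PurePath

filenames = ["genome.fasta", "counts.CSV", "archive.tar.gz", "README"]

# For each name, keep the text after the last "." (with the dot included).
extensions = {name: PurePath(name).suffix for name in filenames}

for name, ext in extensions.items():
    print(name, "->", repr(ext))
```

Only the part after the last dot counts as the extension, and a name without a dot has an empty extension.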

Kinds of file

At low level there is only one type of file

For us, it is useful to separate them into two kinds:

Text Files:
each byte is a character, we can read it
Binary Files:
bytes are grouped in binary numbers, representing images, sounds, etc.

Among binary files we have EXE files, which are programs for Windows
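
To see the difference, this Python sketch stores the same number twice: once as text and once as a binary number. The file names and the value are made up, and the 4-byte little-endian layout is just one possible choice.

```python
import os
import struct
import tempfile

folder = tempfile.mkdtemp()
text_path = os.path.join(folder, "number.txt")
binary_path = os.path.join(folder, "number.bin")

value = 1000000

# Text file: the number is stored as seven characters, one byte each.
with open(text_path, "w") as f:
    f.write(str(value))

# Binary file: the same number packed into 4 bytes.
with open(binary_path, "wb") as f:
    f.write(struct.pack("<i", value))

# Read both files back as raw bytes.
with open(text_path, "rb") as f:
    text_bytes = f.read()
with open(binary_path, "rb") as f:
    binary_bytes = f.read()

print(text_bytes)                              # readable digits
print(binary_bytes)                            # not readable as text
print(struct.unpack("<i", binary_bytes)[0])    # decoded back to a number
```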

Folders/Directories

Directories

When disks became big, people could put thousands of files on them

But then finding the files became an issue

Directories came as a solution.

At that time (the 1970s) people used “phone directories”: big books with everybody’s name and phone number

In the 80’s, with graphical screens, people drew folders instead of directories.

Directories

A directory is a set of files. A file belongs to a single directory

A directory can also contain sub-directories

In Windows we also have separate disks, labeled A:, B:, C: and so on,
but nobody uses A: or B: nowadays

Tree of directories

Directories are organized in a hierarchy

“Parent” directories contain “child” directories

There cannot be any “cycle”

The topmost folder is called root directory

In Windows, each disk has its own tree and its own root
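
A Python sketch that builds a tiny tree of folders and prints it with indentation. The folder and file names ("project", "data", "results", "reads.txt") are hypothetical, and everything lives in a temporary folder that plays the role of the root.

```python
import os
import tempfile

root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "project", "data"))
os.makedirs(os.path.join(root, "project", "results"))
open(os.path.join(root, "project", "data", "reads.txt"), "w").close()

tree = []
for folder, subfolders, files in os.walk(root):
    subfolders.sort()   # visit child folders in alphabetical order
    if folder == root:
        depth, label = 0, "(root)"
    else:
        # Depth = number of folders between the root and this one.
        depth = os.path.relpath(folder, root).count(os.sep) + 1
        label = os.path.basename(folder)
    tree.append("  " * depth + label + "/")
    for name in sorted(files):
        tree.append("  " * (depth + 1) + name)

print("\n".join(tree))
```

Each folder appears exactly once, under its single parent, so there is no cycle.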

Current Folder

Each program in the computer knows about at least one folder: the current folder

The files in the current folder are the ones the program can “see” immediately

To see other files, our program has to indicate in which folder to find the file

It is like using given names and family names.
When you are at home, family names are implicit

A program can change its current directory
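
A minimal Python sketch of a program changing its current folder. The file name is invented, and a temporary folder is used so nothing on your disk is touched.

```python
import os
import tempfile

original = os.getcwd()          # the current folder when the program starts

folder = tempfile.mkdtemp()
with open(os.path.join(folder, "sample_for_cwd_demo.txt"), "w") as f:
    f.write("hello")

# From the original folder, the short name is not enough to find the file.
seen_before = os.path.exists("sample_for_cwd_demo.txt")

os.chdir(folder)                # change the current folder
seen_after = os.path.exists("sample_for_cwd_demo.txt")

os.chdir(original)              # go back to where we started
print(seen_before, seen_after)
```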

Full Filenames

When accessing a file X outside the current directory, we have to specify the folder of the file

There are two ways of doing that:

  • Absolute: list all folders from the root directory to the file X
  • Relative: list the folders from the current directory to the file X

The (absolute or relative) list of folders is called the path of the file. It is a string with a / between folder names. In Windows, the backslash \ also separates folder names.

Absolute Paths

An absolute path starts with the character /.

In Windows it may also start with the disk label, as in C:/

Examples:

  • /home/user/data
  • /Documents and Settings/user/Desktop
  • C:/Program Files
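
A Python sketch that checks whether a path is absolute and splits it into its folders. The relative path "data/reads.txt" is made up for comparison.

```python
from pathlib import PurePosixPath, PureWindowsPath

unix_path = PurePosixPath("/home/user/data")
windows_path = PureWindowsPath("C:/Program Files")
relative_path = PurePosixPath("data/reads.txt")

# is_absolute() tells us if the path starts from a root.
print(unix_path.is_absolute(), unix_path.parts)
print(windows_path.is_absolute(), windows_path.drive, windows_path.parts)
print(relative_path.is_absolute(), relative_path.parts)
```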

Relative Paths

Easy. They do not start with /

Each directory knows its parent directory. It is called ..

If necessary, the current directory can be written as .
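
A Python sketch of how a relative path is turned into an absolute one. The current directory "/home/user/project" and the helper name `to_absolute` are hypothetical.

```python
import posixpath

current = "/home/user/project"      # hypothetical current directory

def to_absolute(relative):
    """Join a relative path to the current directory, then simplify . and .."""
    return posixpath.normpath(posixpath.join(current, relative))

print(to_absolute("data/reads.txt"))
print(to_absolute("../other/notes.txt"))
print(to_absolute("./data/../results"))
```

Each .. moves one level up in the tree, and each . stays in the same place.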

Application

the real deal

Analyzing Data

for fun and profit

Many disciplines, including Molecular Biology and Genetics, have become more and more data driven.

Starting now, we will use RStudio, a free program for data analysis with the R language

Many users of R are molecular biologists, but it is also used by economists, psychologists and marketing specialists

How to use RStudio

You have to install R and RStudio on your computer

Then you have to run RStudio. In a typical session:

  • We read data from one or more files
  • We transform this data according to a program we design
  • We write the results to new files
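
The same three steps, sketched in Python only as an illustration (the course itself uses R). The file names, the column names and the gene lengths are all made up.

```python
import csv
import os
import tempfile

folder = tempfile.mkdtemp()
input_path = os.path.join(folder, "lengths.csv")
output_path = os.path.join(folder, "lengths_kb.csv")

# Made-up input data, so the example runs by itself.
with open(input_path, "w", newline="") as f:
    csv.writer(f).writerows(
        [["gene", "length_bp"], ["geneA", "1500"], ["geneB", "3000"]]
    )

# 1. Read the data from a file.
with open(input_path, newline="") as f:
    rows = list(csv.DictReader(f))

# 2. Transform it: convert base pairs to kilobases.
results = [
    {"gene": r["gene"], "length_kb": int(r["length_bp"]) / 1000} for r in rows
]

# 3. Write the results to a new file.
with open(output_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["gene", "length_kb"])
    writer.writeheader()
    writer.writerows(results)

print(results)
```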

Command line

RStudio, like almost all serious programs, is controlled from the keyboard

The mouse can be used for some shortcuts, but the real deal is the keyboard

A goal of this course is to become comfortable with the keyboard

These tools are for people who read books and don’t watch TV