Class 6: Using R and RStudio

Computing for Molecular Biology 1

Andrés Aravena, PhD

26 October 2020

Analyzing Data

for fun and science

Many disciplines, including Molecular Biology and Genetics, have become more and more data-driven.

Starting today, we will use R, a free software environment for data analysis

Many users of R are molecular biologists, but it is also used by economists, psychologists and marketing specialists

R and RStudio

How to use RStudio

Install R and RStudio on your computer

Then start RStudio

Then you will see a screen like this…

Command line

RStudio, like almost all serious programs, is controlled by the keyboard

The mouse can be used for some shortcuts,
but the real deal is the keyboard

A goal of this course is to become comfortable with the keyboard

These tools are for people who read books and don’t watch TV

The keyboard: your real friend

This is the Turkish version

We use the keys `, ", {, }, [, ], and Tab.

The keys in red are “dead keys”.

[Figure: Turkish Q keyboard layout, with the dead keys highlighted in red]

Learn how to use the keyboard

  • We need to use `, ", {, }, [, ], and Tab
  • We use ` a lot. Find it!
  • The keys in red are “dead keys”
    • They do not write until you press another key
    • You can use them to write foreign words
      • for example “El Niño”, “naïve”, “voilà”
    • Press AltGr+, first, and then SPACE to get the symbol `

Important key names

  • #: Hash. Used for comments
  • $: Dollar. Used for column names
  • { and }: Braces, curly brackets
  • [ and ]: Brackets, used for indices
  • `: Back tick. Used for code
  • ' and ": single quote and double quote. Used for text
  • / and \: slash and backslash
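
As a preview, here is a minimal sketch of how some of these symbols appear in R code. The names `weights` and `people` and their values are made up for illustration, and the functions `c()` and `data.frame()` have not been introduced yet in this class.

```r
# Everything after a hash is a comment; R ignores it
weights <- c(70.5, 82.1, 65.3)   # hypothetical example values
second <- weights[2]             # brackets select by index: the second value
second

# A small table with two hypothetical columns; quotes mark text
people <- data.frame(name = c("Ana", "Ben"), age = c(31, 27))
ages <- people$age               # the dollar sign selects a column by name
ages

# Single and double quotes write the same text
same_text <- 'R' == "R"
same_text
```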

Talking with the computer

R version 4.0.2 (2020-06-22) -- "Taking Off Again"
Copyright (C) 2020 The R Foundation for Statistical Computing
[…]
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.

>

This > symbol is called the prompt

You do not write the > part. This is a message from the computer to you

You write after the prompt

prompt [präm(p)t]

verb

  • Assist or encourage (a hesitating speaker) to say something: “What do you want?” he prompted.
  • Computing (of a computer) request input from a user.

From “New Oxford American Dictionary”

The meaning of >

(An interactive session)

  • The computer shows the prompt >
  • You write some commands using the keyboard
  • You finish by pressing Enter or Return
  • The computer executes your commands
  • When the execution finishes you get a new prompt

and repeat

Tab is your friend

In RStudio you can press Tab and get superpowers!

  • The computer will propose alternatives depending on the context
  • You can select the right one using the arrows
  • If there is only one option then it is completed automatically
  • You write faster and make fewer mistakes

You can also repeat and edit previous commands using the arrows

You can delete the whole line using Escape

Using R as a calculator

Writing Integer Numbers

Write the number after >. Do not write >

> 42
[1] 42

The grey part is what we write, the blue part is the computer’s response

How to write “one half”

Writing Numbers, with decimals

Many countries use , as the decimal separator

0,5

In the USA they use . to separate the integer and decimal parts

0.5

In everyday writing you can use either of them, but in R numbers are always written with the dot .
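
A minimal sketch of writing one half in R. The name `half` is only for illustration. R expects the dot: typing 0,5 at the prompt is rejected as a syntax error.

```r
half <- 0.5                  # one half, written with a dot
half

same_value <- half == 1 / 2  # TRUE: the same value as 1 divided by 2
same_value

whole <- half + half         # two halves make 1
whole
```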

Writing Numbers, large and small

Compare 520000000000 against 52000000000
Are they the same? Which one is bigger?

It is better to use exponential notation

\(52 \times 10^{10}\) versus \(52 \times 10^{9}\)

On the computer we write "times 10 to the power of" as E

52E10 versus 52E9

E notation is scientific notation

  • 1 milli is 1E-3
  • 1 micro is 1E-6
  • 1 nano is 1E-9
  • 1 pico is 1E-12
  • 1 kilo is 1E3
  • 1 mega is 1E6
  • 1 giga is 1E9
  • 1 tera is 1E12
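
The comparison above can be checked directly in R. This is a small sketch; the names `big` and `small` are chosen here only for illustration.

```r
big   <- 52E10                       # 52 followed by 10 zeros
small <- 52E9                        # 52 followed by 9 zeros

big_matches   <- big == 520000000000   # TRUE: same number, easier to read
small_matches <- small == 52000000000  # TRUE
big_matches
small_matches

ratio <- big / small                 # 10: one extra power of ten
ratio

kilo <- 1E3                          # 1 kilo is one thousand
kilo == 1000                         # TRUE
```

With E notation you compare the exponents instead of counting zeros.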

Exponential notation is unambiguous

There are different names for the same number, and different numbers for the same name

  • One million is 1E6
  • One milliard is 1E9 (thousand millions or short billion)
  • One billion is 1E12 (long billion or short trillion)

The short names are mostly used in the USA; the long names are used in most other countries.
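
A short sketch of why the E notation avoids the confusion: whatever the number is called, the exponent says how large it is. The variable names are illustrative only.

```r
million  <- 1E6
milliard <- 1E9     # "billion" in the short scale
billion  <- 1E12    # "trillion" in the short scale

milliard / million  # 1000: a milliard is a thousand millions
billion / milliard  # 1000: a long billion is a thousand milliards
```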

Basic arithmetic operations

> 2 + 3    # sum
[1] 5
> 2 * 3    # multiplication
[1] 6

Basic arithmetic operations

> 2 - 3    # difference (minus)
[1] -1
> 2 / 3    # division
[1] 0.6666667
> 2 ^ 3    # exponentiation (power)
[1] 8

Priority of operations

Order matters

> 2+3*4
[1] 14
> 2^3*4
[1] 32
> 4*2^3
[1] 32
> (2+3)*4
[1] 20
> 2^(3*4)
[1] 4096
> (4*2)^3
[1] 512

PEMDAS

“Parentheses, Exponents, Multiplication and Division, Addition and Subtraction”

  1. Parentheses (simplify what is inside)
  2. Exponents
  3. Multiplication and Division (from left to right)
  4. Addition and Subtraction (from left to right)

(Please Excuse My Dear Aunt Sally)
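
To see the rules at work, here is a sketch that evaluates one expression in a single line and then again one PEMDAS step at a time. The expression and the names `full` and `step1` to `step5` are chosen here only for illustration.

```r
# The whole expression at once
full <- 2 + 3 * 4^2 / 8 - 1
full                    # 7

# The same calculation, one step at a time
step1 <- 4^2            # exponent first: 16
step2 <- 3 * step1      # multiplication, leftmost first: 48
step3 <- step2 / 8      # division: 6
step4 <- 2 + step3      # addition, leftmost first: 8
step5 <- step4 - 1      # subtraction: 7
step5
```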

Evaluation goes left to right

Compare 2-3-4 vs. 2-(3-4)

> 2-3-4
[1] -5
> 2-(3-4)
[1] 3

Evaluation goes left to right

Compare 2/3/4 vs. 2/(3/4)

\[\frac{\frac{2}{3}}{4}=\frac{2}{3}\cdot\frac{1}{4}=\frac{2}{12}\]

\[\frac{2}{\frac{3}{4}}=\frac{2}{1}\cdot\frac{4}{3}=\frac{8}{3}\]
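
The same two divisions can be tried in R. This is a small sketch with illustrative variable names. The last lines add a caveat that is not in the slides: in R the power operator `^` is the exception, since it groups from right to left.

```r
left_to_right <- 2 / 3 / 4      # read as (2/3)/4, that is 2/12
left_to_right                   # 0.1666667

with_parens <- 2 / (3 / 4)      # parentheses force 3/4 first, giving 8/3
with_parens                     # 2.666667

# Exception: powers group from right to left in R
power_chain <- 2^3^2            # read as 2^(3^2) = 2^9
power_chain                     # 512
power_left <- (2^3)^2           # 8^2
power_left                      # 64
```

When in doubt, write the parentheses: they cost nothing and say exactly what you mean.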

Say what you mean

Use the language correctly

This is important in computing, in science, and in life.

Functions

If we can calculate

> 10^2
[1] 100

How do we calculate \(\sqrt{100}\)?

> sqrt(100)
[1] 10
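
A short sketch showing that `sqrt()` undoes squaring, and that a function call can be used inside a larger expression like any other number. The variable names are illustrative.

```r
root <- sqrt(100)
root                    # 10

back <- sqrt(10^2)      # the square root undoes the square
back                    # 10

mixed <- 1 + sqrt(16) * 2   # functions are evaluated first: 1 + 4 * 2
mixed                   # 9
```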

Functions

If we can calculate

> 10^2
[1] 100

How do we calculate \(\log_{10}(100)\)?

> log10(100)
[1] 2
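
In the same spirit, a sketch of how `log10()` relates to the E notation used earlier: the base-10 logarithm of a power of ten is its exponent. The variable names are illustrative.

```r
exponent_kilo  <- log10(1E3)     # 3
exponent_milli <- log10(1E-3)    # -3
exponent_kilo
exponent_milli

round_trip <- 10^log10(100)      # raising 10 to the logarithm gives back 100
round_trip
```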

Functions

  • log(x): natural logarithm of x
  • exp(x): exponential of x
  • abs(x): absolute value of x
  • sign(x): sign of x, one of -1, 0 or 1
  • floor(x): largest integer not greater than x
  • ceiling(x): smallest integer not less than x
  • round(x): integer closest to x
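
A minimal sketch trying each of these functions on simple values. The input numbers and variable names are chosen here only for illustration.

```r
natural <- log(exp(3))     # log() undoes exp(): 3
natural

distance <- abs(-7.5)      # 7.5
direction <- sign(-7.5)    # -1
distance
direction
sign(0)                    # 0

below   <- floor(2.7)      # 2
above   <- ceiling(2.1)    # 3
nearest <- round(2.7)      # 3
below
above
nearest

# With negative numbers, "below" means more negative
floor(-2.7)                # -3
ceiling(-2.7)              # -2
```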