Welcome back

to “Computing for Molecular Biology 1”

Quiz

What have we learned?

  • Why do we learn about computers?
  • What have we learned about computers?
  • What is R? What is RStudio?
  • Who invented R?

What have we learned?

  • What will the questions on the midterm exam be?
  • How many hours each week do you study?
  • How do you study?

Basic objects in R

Vectors

  • What is a vector?

  • What basic types of data does a vector handle?

  • How do we create a vector?

  • How do we assign a vector?

  • How do we give names to the elements of a vector?

  • How do we index a vector?

Vectors

  • All elements of the vector must have the same type
  • Basic types are

    • Character

    • Numeric

    • Factor

    • Logical

Describe each type

Creating vectors

  • Simple concatenation
    • c(1,2,3)
    • c(TRUE, TRUE, FALSE, TRUE)
    • c("alpha", 'beta', "gamma")
  • A comparison creates a logical vector
    • weight > 25
  • Any character vector can be transformed into a factor
    • factor(c("head","tail"))
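
A minimal sketch of the constructions above, using class() to report the type of each result. The variable names and the values in weight are made up for illustration only.

```r
# Illustrative values only; 'weight' is an assumed example vector
weight <- c(20, 30, 25, 40)

nums  <- c(1, 2, 3)
flags <- c(TRUE, TRUE, FALSE, TRUE)
words <- c("alpha", 'beta', "gamma")   # single and double quotes are equivalent
heavy <- weight > 25                   # one TRUE/FALSE per element of weight
coin  <- factor(c("head", "tail"))

class(nums)    # "numeric"
class(flags)   # "logical"
class(words)   # "character"
class(heavy)   # "logical"
class(coin)    # "factor"
heavy          # FALSE TRUE FALSE TRUE
```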

Sequences & Repetitions

  • Sequences
    • 4:9
    • seq(4,10,2)
    • seq(from=4, by=2, length.out=4)
  • Repetitions
    • rep(1,3)
    • rep(c(TRUE,FALSE),3)
    • rep(c(TRUE,FALSE),c(3,3))
  • Missing data
    • c(NA,TRUE, FALSE)
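
The same commands with their results written as comments; a small sketch in which the variable names are assumptions. is.na() is one way to find the missing values.

```r
s1 <- 4:9                                    # 4 5 6 7 8 9
s2 <- seq(4, 10, 2)                          # 4 6 8 10
s3 <- seq(from = 4, by = 2, length.out = 4)  # 4 6 8 10

r1 <- rep(1, 3)                              # 1 1 1
r2 <- rep(c(TRUE, FALSE), 3)                 # TRUE FALSE TRUE FALSE TRUE FALSE
r3 <- rep(c(TRUE, FALSE), c(3, 3))           # TRUE TRUE TRUE FALSE FALSE FALSE

m <- c(NA, TRUE, FALSE)
is.na(m)                                     # TRUE FALSE FALSE
```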

Exercise

  • How do we write " inside a character string?

  • How do we write a TAB?

  • How do we assign a value to a variable?

  • What characters can be used to name a variable?

Answers

Special characters are coded with two symbols: \", \\, \n, \t.

We use <- for assignment.

Variable names start with a letter, followed by letters, numbers or dots.
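
A short sketch of these answers. The strings and variable names are invented for illustration. Note that each escape is written with two symbols but is stored as a single character.

```r
# Escapes inside character strings (example text is made up)
quote_text <- "She said \"hello\""
tab_text   <- "gene\tcount"

cat(quote_text, "\n")   # cat() shows the string as it will be printed
cat(tab_text, "\n")

nchar("\t")             # 1: the TAB is a single character

# Assignment with <- and a valid variable name
sample.1 <- 42
sample.1
```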

Names

Every element can have a name

 >  weight <- c(Peter=60, John=72,
                Frank=57, Huey=90,
                Dewey=95, Louie=72)
 >  names(weight)
[1] "Peter" "John"  "Frank" "Huey"  "Dewey" "Louie"

 >  height <- c(1.75, 1.80, 1.65, 1.90,
                1.74, 1.91)
 >  names(height) <- names(weight)

Notice that a command can span multiple lines

Accessing elements

  • To get the i-th element of a vector v we use v[i]
  • The index can be a vector of integers
 >  weight[3]
Frank
   57

 >  weight[c(1,3,5)]
Peter Frank Dewey
   60    57    95

 >  weight[2:4]

Negative Indices

  • Used to indicate elements to omit
  • Useful when nearly every element is used
> weight
Peter  John Frank  Huey Dewey Louie
   60    72    57    90    95    72
> weight[c(-1,-3,-5)]
 John  Huey Louie
   72    90    72

Logical Indices

  • A vector can be indexed by a logical vector
  • The index should have the same length as the vector
 >  weight>72
Peter  John Frank  Huey Dewey Louie
FALSE FALSE FALSE  TRUE  TRUE FALSE
 >  weight[weight>72]
 Huey Dewey
   90    95
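
A small sketch that combines the indexing styles seen so far, reusing the weight vector from the previous slides. The conditions chosen here are only examples.

```r
weight <- c(Peter=60, John=72, Frank=57,
            Huey=90, Dewey=95, Louie=72)

weight[-1]                          # everything except the first element
weight[weight > 60 & weight < 95]   # John 72, Huey 90, Louie 72
names(weight)[weight > 72]          # "Huey" "Dewey"
sum(weight > 72)                    # 2, because TRUE counts as 1
```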

Names as Indices

  • If a vector has names, we can use them as indices:
 >  weight[c("Peter","John","Frank")]
Peter  John Frank
   60    72    57
  • What is the difference between
    • a[1]
    • a["1"] ?

Matrices

  • Like vectors but in 2 dimensions
 >  matrix(weight, nrow=2, ncol=3)
     [,1] [,2] [,3]
[1,]   60   57   95
[2,]   72   90   72
 >  matrix(weight, nrow=2, ncol=3, byrow=TRUE)
     [,1] [,2] [,3]
[1,]   60   72   57
[2,]   90   95   72

Matrices

 >  M <- matrix(weight, nrow=2, ncol=3)
 >  dim(M)
[1] 2 3
  • See also nrow(M) and ncol(M)
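
A brief sketch of how the fill order changes where each value lands. The name M2 is an assumption introduced only for this comparison.

```r
weight <- c(60, 72, 57, 90, 95, 72)

M  <- matrix(weight, nrow = 2, ncol = 3)                # filled column by column
M2 <- matrix(weight, nrow = 2, ncol = 3, byrow = TRUE)  # filled row by row

nrow(M)    # 2
ncol(M)    # 3
length(M)  # 6, the total number of elements

M[1, 2]    # 57
M2[1, 2]   # 72
```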

Column and Row Names

 >  colnames(M) <- c("A","B","C")
 >  rownames(M) <- c("x","y")
 >  M
   A  B  C
x 60 57 95
y 72 90 72

Arrays

  • Like matrices but with more dimensions
 >  A <- array(0, dim=c(2,3,2))
 >  A
, , 1
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0

, , 2
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0

Indexing Matrices

  • Objects of type matrix or array use an index for each dimension
  • Example: M[1,1], M["x","A"]
  • If an index is omitted, the whole range is returned
 >  M[2,]
 A  B  C
72 90 72
 >  M[,3]
 x  y
95 72

Indexing Matrices

Notice that sometimes the result is a vector and other times it is a matrix

 >  M[,2:3]
   B  C
x 57 95
y 90 72
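
One way to check which kind of result we got is is.matrix(). The sketch below rebuilds M with the same values and names as before. The drop=FALSE argument is an addition not covered in these slides: it asks R to keep the result as a matrix even when only one column is selected.

```r
M <- matrix(c(60, 72, 57, 90, 95, 72), nrow = 2, ncol = 3,
            dimnames = list(c("x", "y"), c("A", "B", "C")))

is.matrix(M[, 3])                 # FALSE: a single column becomes a vector
is.matrix(M[, 2:3])               # TRUE: two columns stay a matrix
is.matrix(M[, 3, drop = FALSE])   # TRUE: one column, kept as a matrix
dim(M[, 3, drop = FALSE])         # 2 1
```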

Exercise

  • What would be the result of this command?
M <- outer(11:22, 11:17)
  • Try it

Result

      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
 [1,]  121  132  143  154  165  176  187
 [2,]  132  144  156  168  180  192  204
 [3,]  143  156  169  182  195  208  221
 [4,]  154  168  182  196  210  224  238
 [5,]  165  180  195  210  225  240  255
 [6,]  176  192  208  224  240  256  272
 [7,]  187  204  221  238  255  272  289
 [8,]  198  216  234  252  270  288  306
 [9,]  209  228  247  266  285  304  323
[10,]  220  240  260  280  300  320  340
[11,]  231  252  273  294  315  336  357
[12,]  242  264  286  308  330  352  374

Exercises

  • Find the value of the third row, fourth column
  • Get a vector with the fifth column
  • What is the difference between the first and the second row?
  • What is the value at the center of the matrix?
    • change it to 0
  • Change the names of the columns to day names
  • Change the names of rows to month names
    • Hint: month.abb

Data Frames

  • Two-dimensional, similar to matrices
  • Each column can be of a different type
 >  ppl <- data.frame(
    weight=c(60, 72, 57, 90, 95, 72),
    height=c(1.75, 1.80, 1.65, 1.90,
             1.74, 1.91),
    names=c("Peter", "John", "Frank",
            "Huey", "Dewey", "Louie"),
    gender=factor(rep("M",6),
             levels=c("F","M")))

Data Frame

> ppl
  weight height names gender
1     60   1.75 Peter      M
2     72   1.80  John      M
3     57   1.65 Frank      M
4     90   1.90  Huey      M
5     95   1.74 Dewey      M
6     72   1.91 Louie      M
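
A quick way to see that each column has its own type is to ask for the class of every column. This is only a sketch: the class reported for the names column depends on the R version (character in R 4.0 and later, factor in older versions), so it is not listed here.

```r
ppl <- data.frame(
    weight=c(60, 72, 57, 90, 95, 72),
    height=c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91),
    names=c("Peter", "John", "Frank", "Huey", "Dewey", "Louie"),
    gender=factor(rep("M",6), levels=c("F","M")))

dim(ppl)            # 6 4: six rows, four columns
class(ppl$weight)   # "numeric"
class(ppl$gender)   # "factor"
sapply(ppl, class)  # the class of every column at once
```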

Connecting with the real world

  • Data frames are the natural way to read data from files
    • and to write data to files
  • Look for the documentation of read.table()

  • Read the file birth.txt
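
The layout of birth.txt is not shown in these slides, so the sketch below writes its own small file to a temporary location and reads it back. The tab separator and the header=TRUE option are assumptions that must be adapted to the real file.

```r
# A small data frame to save (values taken from the earlier slides)
ppl <- data.frame(
    weight=c(60, 72, 57),
    height=c(1.75, 1.80, 1.65),
    names=c("Peter", "John", "Frank"))

# Write to a temporary file, one row per line, columns separated by TAB
path <- tempfile(fileext = ".txt")
write.table(ppl, file = path, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Read it back; header=TRUE says the first line holds the column names
ppl2 <- read.table(path, header = TRUE, sep = "\t")
str(ppl2)
```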

Lists

  • Like vectors, but mixing different kinds of elements
people <- list(
    c(60,72,57,90,95, 72),
    c(1.75,1.80,1.65,1.90,1.74, 1.91),
    c("Peter", "John", "Frank",
      "Huey", "Dewey", "Louie"),
    TRUE,
    factor(rep("M",6),
          levels=c("M","F")))
  • Notice that elements can have different lengths

Result

[[1]]
[1] 60 72 57 90 95 72

[[2]]
[1] 1.75 1.80 1.65 1.90 1.74 1.91

[[3]]
[1] "Peter" "John"  "Frank" "Huey"  "Dewey" "Louie"

[[4]]
[1] TRUE

[[5]]
[1] M M M M M M
Levels: M F

Lists with Names

people <- list(
    weight=c(60,72,57,90,95, 72),
    height=c(1.75,1.80,1.65,1.90,1.74, 1.91),
    names=c("Peter", "John", "Frank",
            "Huey", "Dewey", "Louie"),
    valid=TRUE,
    gender=factor(rep("M",6),
           levels=c("M","F")))
  • How else can we assign names?

Lists with Names

$weight
[1] 60 72 57 90 95 72

$height
[1] 1.75 1.80 1.65 1.90 1.74 1.91

$names
[1] "Peter" "John"  "Frank" "Huey"  "Dewey" "Louie"

$valid
[1] TRUE

$gender
[1] M M M M M M
Levels: M F

Indexing Lists

  • Lists can be indexed the same way as vectors
  • The result is a sublist
>  people[1:2]
$weight
[1] 60 72 57 90 95 72

$height
[1] 1.75 1.80 1.65 1.90 1.74 1.91

Elements of Lists

 >  people[1]
$weight
[1] 60 72 57 90 95 72
  • It is a sublist
 >  people[[1]]
[1] 60 72 57 90 95 72
  • It is an element
  • Equivalent to people[["weight"]]
  • Also equivalent to people$weight
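
The difference matters when we pass the result to a function. A minimal sketch with a shortened version of the people list, where class() shows what each form returns:

```r
# Shortened list for illustration; same values as in the slides
people <- list(
    weight=c(60, 72, 57, 90, 95, 72),
    valid=TRUE)

class(people[1])      # "list": still a list, with one element
class(people[[1]])    # "numeric": the vector itself
length(people[1])     # 1
length(people[[1]])   # 6

mean(people[[1]])     # works, because it receives a numeric vector
identical(people[[1]], people$weight)   # TRUE
```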

Indexing Lists

  • List elements are indexed by [[]]
  • Sublists are indexed by []

Try these

people[[2]]
people[2]
people[[2]][3]
people[2][3]
people[[1:3]]
people[1:3]
people[["weight"]]
people$weight
people["weight"]

Exercise

Write a list with one element for each person, representing the name, weight, height and gender.