Welcome back

to “Computing for Molecular Biology 1”

Learning a new Language

A Computer Language

  • Programs are sets of instructions for the computer
  • We write them in a high-level language
    • humans can read it easily
    • the text file is called source code
    • it has to be transformed to machine code
      e.g. an EXE file
  • Two approaches to transform programs:
    • Compiler: all the source code is transformed to machine code at once
    • Interpreter: each line of code is transformed and executed one at a time

Basic Rules of a Language

Each phrase in a program is imperative.

It involves nouns, verbs and adverbs

Today we will focus on nouns

The only verb we need today is assignment, written <-
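
As a minimal sketch of the assignment verb, the lines below give names to values and reuse them. The names n_samples, volume_ul and total_ul are hypothetical, chosen only for illustration.

```r
# Assign values to names with <- (names are illustrative, not from the course)
n_samples <- 12
volume_ul <- 2.5

# A name can be used to build a new value
total_ul <- n_samples * volume_ul
print(total_ul)

# Assigning again makes the name refer to a new object
n_samples <- 24
print(n_samples)
```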

Basic Objects

Nouns are names of objects.

They only exist as references to objects.

The simplest objects in R are vectors

Vectors

  • All elements of the vector must have the same type
  • Basic types are
    • Character
    • Numeric
    • Factor
    • Logical

This order is important; keep it in mind.
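
A small sketch, with made-up values, of one vector of each basic type. The function class() reports the type; the variable names are assumptions for illustration.

```r
# One small vector of each basic type (values are made up for illustration)
v_chr <- c("alpha", "beta")
v_num <- c(1.5, 2, 10)
v_fac <- factor(c("red", "blue", "red"))
v_lgl <- c(TRUE, FALSE)

# class() reports the type of each vector
print(class(v_chr))  # "character"
print(class(v_num))  # "numeric"
print(class(v_fac))  # "factor"
print(class(v_lgl))  # "logical"
```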

Factors

Also known as categorical variables.

They are used for discrete values, especially values with no natural order, for example:

  • Color
  • Gender/Sex
  • Country of Origin

These are variables that you would never average

Creating vectors

  • Simple concatenation
 >  x <- c(1,2,3)
 >  y <- c(10,20)

These are two numeric vectors. We can concatenate them

 >  c(x, y, 5)
[1]  1  2  3 10 20  5

Notice that we use <- for assignment.

Creating vectors

Logical Vectors

 >  c(TRUE, TRUE, FALSE, TRUE)
[1]  TRUE  TRUE FALSE  TRUE

We can also write c(T,T,F,T), but T and F are ordinary variables that can be reassigned, so TRUE and FALSE are safer

A comparison creates a logical vector

 >  c(x, y, 5) > 15
[1] FALSE FALSE FALSE FALSE  TRUE FALSE

Character vectors

Same idea. Concatenation

Each element must be enclosed in single or double quotes

> c("alpha", 'beta', "gamma")
[1] "alpha" "beta"  "gamma"
> c('he said "yes"', "I don't know")
[1] "he said \"yes\"" "I don't know"

Special characters are written as two-symbol escape sequences:
\", \\, \n, \t
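
The difference between how an escape sequence is typed and what it stands for can be sketched as follows. The string s is made up for illustration.

```r
# A string holding a tab, a newline and two double quotes
s <- "name\tanswer\nAnn\t\"yes\""

print(s)      # print() shows the escape sequences as typed
cat(s, "\n")  # cat() writes the characters they stand for

# Each escape sequence counts as a single character
n_newline   <- nchar("\n")
n_backslash <- nchar("\\")
print(c(n_newline, n_backslash))  # 1 1
```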

Factor vectors

Easy. Any character vector can be transformed into a factor with factor()
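
A minimal sketch of that transformation, assuming a made-up character vector called colors:

```r
# A character vector with repeated values (made up for illustration)
colors <- c("red", "blue", "red", "green", "blue", "red")

# factor() keeps one level per distinct value
colors_f <- factor(colors)
print(colors_f)

# Levels are sorted alphabetically by default
print(levels(colors_f))      # "blue" "green" "red"

# table() counts how many times each level appears
print(table(colors_f))

# Internally each element is stored as the position of its level
print(as.integer(colors_f))  # 3 1 3 2 1 3
```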

Sequences

 >  4:9
[1] 4 5 6 7 8 9
 >  seq(4,9)
[1] 4 5 6 7 8 9
 >  seq(4,10,2)
[1]  4  6  8 10
 >  seq(from=4, by=2, length.out=4)
[1]  4  6  8 10
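
A few more variations on seq(), as a sketch. The values are chosen only for illustration.

```r
# Fractional steps
s1 <- seq(0, 1, by = 0.25)        # 0.00 0.25 0.50 0.75 1.00

# Descending sequences need a negative step
s2 <- seq(10, 4, by = -2)         # 10 8 6 4

# Fix the number of elements instead of the step
s3 <- seq(1, 10, length.out = 4)  # 1 4 7 10

# seq_len(n) is the same as 1:n when n is at least 1
s4 <- seq_len(5)                  # 1 2 3 4 5

print(s1); print(s2); print(s3); print(s4)
```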

Repetitions

 >  rep(1,3)
[1] 1 1 1
 >  rep(c(7,9,13), 3)
[1]  7  9 13  7  9 13  7  9 13
 >  rep(c(7,9,13), 1:3)
[1]  7  9  9 13 13 13

Repetitions

 >  rep(1:2,c(10,5))
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2
 >  rep(c(TRUE,FALSE),3)
[1]  TRUE FALSE  TRUE FALSE  TRUE FALSE
 >  rep(c(TRUE,FALSE),c(3,3))
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
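
The same repetitions can be written with named arguments, which some readers find easier to follow. A sketch with illustrative values:

```r
# each= repeats every element before moving on to the next one
r1 <- rep(c(7, 9, 13), each = 2)                 # 7 7 9 9 13 13

# times= with a vector gives one count per element
r2 <- rep(c("ctrl", "treat"), times = c(2, 3))   # "ctrl" "ctrl" "treat" "treat" "treat"

# length.out= recycles the vector until the requested length is reached
r3 <- rep(c(7, 9, 13), length.out = 5)           # 7 9 13 7 9

print(r1); print(r2); print(r3)
```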

Missing data

  • In practice there are cases when a datum is not present
  • It is not a good idea to use a fictitious value
  • The symbol NA is used in that case
  • You can use it on any vector, regardless of type
 >  c(NA,TRUE, FALSE)
[1]    NA  TRUE FALSE
 >  c(NA,1,2)
[1] NA  1  2
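
A sketch of how NA behaves in calculations, assuming a made-up vector called temps with one missing measurement:

```r
# Four measurements, one of them missing (values are made up)
temps <- c(36.5, NA, 37.1, 38.0)

# is.na() tells which elements are missing
print(is.na(temps))              # FALSE TRUE FALSE FALSE
n_missing <- sum(is.na(temps))   # 1

# Any calculation involving NA gives NA ...
m_all <- mean(temps)             # NA

# ... unless we ask to remove missing values first
m_obs <- mean(temps, na.rm = TRUE)
print(m_obs)                     # 37.2
```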

Mixing types inside a vector

  • When types are mixed, all values are converted to the most general type
 >  c(1, "tail")
[1] "1"     "tail"
 >  c(TRUE, "tail")
[1] "TRUE"  "tail"

Combining them

 >  c(2,TRUE, FALSE)
[1] 2 1 0
 >  c(factor(c("a","b")),"c")
[1] "1" "2" "c"
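
The conversions above can be checked with class(). The sketch below also shows one way to keep factor labels instead of their internal codes: convert the factor to character before combining.

```r
# Logical values become numbers: TRUE is 1 and FALSE is 0
cls_num <- class(c(2, TRUE, FALSE))          # "numeric"
n_true  <- sum(c(TRUE, TRUE, FALSE, TRUE))   # 3

# Numbers become text when mixed with text
cls_chr <- class(c(1, "tail"))               # "character"

# Convert a factor to character first to keep its labels
f <- factor(c("a", "b"))
labels_kept <- c(as.character(f), "c")       # "a" "b" "c"

print(cls_num); print(n_true); print(cls_chr); print(labels_kept)
```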

Names

  • Every element can have a name
 >  weight <- c(Peter=60, John=72, Frank=57, Huey=90, Dewey=95, Louie=72)
 >  weight
Peter  John Frank  Huey Dewey Louie
   60    72    57    90    95    72
 >  names(weight)
[1] "Peter" "John"  "Frank" "Huey"  "Dewey" "Louie"
 >  height <- c(1.75,1.80,1.65,1.90,1.74, 1.91)
 >  names(height) <- names(weight)
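
Names survive arithmetic. As a sketch, the body mass index (weight divided by height squared) computed from two named vectors keeps the names; the variable name bmi is an assumption for illustration.

```r
weight <- c(Peter=60, John=72, Frank=57, Huey=90, Dewey=95, Louie=72)
height <- c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91)
names(height) <- names(weight)

# Arithmetic works element by element and the result keeps the names
bmi <- weight / height^2
print(round(bmi, 2))

# unname() returns a copy without names
bmi_plain <- unname(bmi)
print(round(bmi_plain, 2))
```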

Accessing elements

  • To get the i-th element of a vector v we use v[i]
 >  weight[3]
Frank
   57
  • The index can be a vector of integers
 >  weight[c(1,3,5)]
Peter Frank Dewey
   60    57    95
 >  weight[2:4]
 John Frank  Huey
   72    57    90

Negative Indices

  • Used to indicate omitted elements
> weight
Peter  John Frank  Huey Dewey Louie
   60    72    57    90    95    72
> weight[c(-1,-3,-5)]
 John  Huey Louie
   72    90    72
  • Useful when nearly every element is used
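
Two common uses of negative indices, as a sketch. The last lines show that positive and negative indices cannot be mixed in one index.

```r
weight <- c(Peter=60, John=72, Frank=57, Huey=90, Dewey=95, Louie=72)

# Drop the last element, whatever the length of the vector
all_but_last <- weight[-length(weight)]
print(all_but_last)

# Drop a range: the parentheses matter, -(1:2) means c(-1, -2)
without_first_two <- weight[-(1:2)]
print(without_first_two)

# Mixing positive and negative indices is an error
mixed <- try(weight[c(-1, 2)], silent = TRUE)
print(inherits(mixed, "try-error"))  # TRUE
```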

Logical Indices

  • Vectors can be indexed by a logical vector
  • It must be of the same length as the vector
 >  weight>72
Peter  John Frank  Huey Dewey Louie
FALSE FALSE FALSE  TRUE  TRUE FALSE
 >  weight[weight>72]
 Huey Dewey
   90    95
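
Conditions can be combined before indexing. A sketch using & (and), together with which() and sum() to locate and count the matching elements:

```r
weight <- c(Peter=60, John=72, Frank=57, Huey=90, Dewey=95, Louie=72)

# Both conditions must hold: heavier than 60 and lighter than 95
mid_range <- weight[weight > 60 & weight < 95]
print(mid_range)   # John 72, Huey 90, Louie 72

# which() gives the positions where the condition is TRUE
heavy_pos <- which(weight > 72)
print(heavy_pos)   # Huey 4, Dewey 5

# TRUE counts as 1, so sum() counts the matching elements
n_heavy <- sum(weight > 72)
print(n_heavy)     # 2
```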

Names as Indices

  • If a vector has names, we can use them:
 >  weight[c("Peter","John","Frank")]
Peter  John Frank
   60    72    57
  • How do we know if a vector has names? names(v) is NULL when v has none
 >  is.null(names(weight))
[1] FALSE
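
A sketch comparing a vector without names and one with names. The vectors plain and named are made up for illustration.

```r
plain <- c(60, 72, 57)
named <- c(Peter=60, John=72, Frank=57)

# names() returns NULL when there are no names
print(is.null(names(plain)))   # TRUE
print(is.null(names(named)))   # FALSE

# %in% checks whether a particular name is present
has_john <- "John" %in% names(named)   # TRUE
has_huey <- "Huey" %in% names(named)   # FALSE

# Asking for a name that does not exist gives NA, not an error
missing_value <- named["Huey"]
print(missing_value)
```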

Matrices

  • Like vectors but in 2 dimensions
 >  matrix(weight, nrow=2, ncol=3)
     [,1] [,2] [,3]
[1,]   60   57   95
[2,]   72   90   72
 >  matrix(weight, nrow=2, ncol=3, byrow=TRUE)
     [,1] [,2] [,3]
[1,]   60   72   57
[2,]   90   95   72

Matrices

 >  M <- matrix(weight, nrow=2, ncol=3)
 >  dim(M)
[1] 2 3
  • See also nrow(M) and ncol(M)
 >  colnames(M) <- c("A","B","C")
 >  rownames(M) <- c("x","y")
 >  M
   A  B  C
x 60 57 95
y 72 90 72

Arrays

  • Like matrices but with more dimensions
 >  A <- array(0, dim=c(2,3,2))
 >  A
, , 1
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0

, , 2
     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0

Indexing Matrices

  • Objects of type matrix or array use an index for each dimension
  • If an index is omitted, the whole range of that dimension is returned
 >  M[2,]
 A  B  C
72 90 72
 >  M[,3]
 x  y
95 72
 >  M[,2:3]
   B  C
x 57 95
y 90 72
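
Row and column names can also be used as indices. The sketch below rebuilds the same matrix in one step and shows the drop=FALSE argument, which keeps a single row as a matrix instead of a vector.

```r
# Same values as M above, with names given at creation
M <- matrix(c(60, 72, 57, 90, 95, 72), nrow = 2, ncol = 3,
            dimnames = list(c("x", "y"), c("A", "B", "C")))

# Index by row and column name
print(M["y", "B"])   # 90
print(M[, "C"])      # x 95, y 72

# A single row is returned as a vector, so it has no dimensions
row_vec <- M[2, ]
print(is.null(dim(row_vec)))   # TRUE

# drop = FALSE keeps the result as a 1 x 3 matrix
row_mat <- M[2, , drop = FALSE]
print(dim(row_mat))            # 1 3
```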

Lists

  • Like vectors, but the elements can be of different kinds
people <- list(weight=c(60,72,57,90,95, 72),
height=c(1.75,1.80,1.65,1.90,1.74, 1.91),
names=c("Peter","John","Frank","Huey","Dewey", "Louie"),
valid=TRUE,
gender=factor(rep("M",6),levels=c("M","F")))

Lists

> people
$weight
[1] 60 72 57 90 95 72

$height
[1] 1.75 1.80 1.65 1.90 1.74 1.91

$names
[1] "Peter" "John"  "Frank" "Huey"  "Dewey" "Louie"

$valid
[1] TRUE

$gender
[1] M M M M M M
Levels: M F

Indexing Lists

  • Can be indexed in the same way as vectors
  • The result is a sublist
>  people[1:2]
$weight
[1] 60 72 57 90 95 72

$height
[1] 1.75 1.80 1.65 1.90 1.74 1.91

Elements of Lists

 >  people[1]
$weight
[1] 60 72 57 90 95 72
  • This is a sublist
 >  people[[1]]
[1] 60 72 57 90 95 72
  • This is an element
  • Equivalent to people[["weight"]]
  • Also equivalent to people$weight
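
The difference between a sublist and an element matters in practice: functions such as mean() need the element. A sketch with a shortened, made-up version of the list:

```r
# A shortened list, for illustration only
people <- list(weight = c(60, 72, 57), valid = TRUE)

# Single brackets give a list, double brackets give the element itself
cls_sub  <- class(people[1])     # "list"
cls_elem <- class(people[[1]])   # "numeric"
print(cls_sub); print(cls_elem)

# The element can be used in calculations
avg <- mean(people[["weight"]])
print(avg)                       # 63

# $ can be followed by ordinary vector indexing
second <- people$weight[2]
print(second)                    # 72
```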

Data Frames

  • Two-dimensional, similar to matrices
  • Each column can be of a different type
 >  BMI <- unname(weight/height^2)
 >  ppl <- data.frame(weight=c(60,72,57,90,95, 72),
height=c(1.75,1.80,1.65,1.90,1.74, 1.91),
names=c("Peter","John","Frank","Huey","Dewey", "Louie"),
BMI=BMI,
gender=factor(rep("M",6),levels=c("M","F")))

Data Frame

> ppl
  weight height names      BMI gender
1     60   1.75 Peter 19.59184      M
2     72   1.80  John 22.22222      M
3     57   1.65 Frank 20.93664      M
4     90   1.90  Huey 24.93075      M
5     95   1.74 Dewey 31.37799      M
6     72   1.91 Louie 19.73630      M
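
A sketch of common operations on a data frame, which combine the indexing rules of lists and matrices. Here the BMI column is added after creation; this is one possible way, not necessarily how the table above was built.

```r
ppl <- data.frame(weight = c(60, 72, 57, 90, 95, 72),
                  height = c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91),
                  names  = c("Peter", "John", "Frank", "Huey", "Dewey", "Louie"))

# Columns are accessed like list elements
avg_weight <- mean(ppl$weight)
print(avg_weight)

# A new column can be computed from existing ones
ppl$BMI <- ppl$weight / ppl$height^2

# Rows are selected like matrix rows, here with a logical index
over25 <- ppl[ppl$BMI > 25, ]
print(over25)

# dim() works as for matrices: rows, then columns
print(dim(ppl))   # 6 4
```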