Class 11: Indices

Computing in Molecular Biology and Genetics 1

Andrés Aravena, PhD

9 November 2020

Big picture

  • We can write structured documents
    • We can compile them into HTML, PDF, etc.
  • We can combine R and Markdown in the same document
  • We can handle numeric, character, and logical vectors
  • We store them in variables

Next Steps

In the rest of this course we will use real data

We will read data from files into R

We will analyze it and produce nice plots

But we need two final pieces of the puzzle

The missing pieces

In the last classes we talked about vectors

  • Numeric, logical and character vectors

We use vectors for two tasks

  • They will be used to contain real data
  • They allow us to look inside that data

In other words, we use them as indices

Indices

The hardest concept in this course

Vectors have several elements

We want to read, write, and modify some elements of a vector, independently of the other elements

This is the fundamental idea of this course

This is where some people get confused

Indices

Used to examine (and change) parts of a vector

To get the i-th element of a vector x we use x[i]

The value inside [] is called the index (plural: indices)

x[i] corresponds to the math expression $x_i$

We read it as “x sub i”

We will work with this vector

Let’s imagine that we measure the weight of some people

We store the values in a vector called weight

weight <- c(60, 72, 57, 90, 95, 72, 86, 65, 79)
weight
[1] 60 72 57 90 95 72 86 65 79

Accessing single elements

To get the i-th element of weight we use weight[i]

weight
[1] 60 72 57 90 95 72 86 65 79
weight[3]
[1] 57

Here the index i is a positive number

Indices are always inside square brackets []
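
The index does not have to be written as a literal number. A small sketch, assuming the same weight vector as above: the index can be stored in a variable (here called i, as in the text), and asking for a position beyond the end of the vector gives NA instead of an error.

```r
weight <- c(60, 72, 57, 90, 95, 72, 86, 65, 79)

i <- 3
third <- weight[i]     # the index can be stored in a variable
beyond <- weight[10]   # there is no 10th element, so R returns NA

print(third)
print(beyond)
```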

Printing a vector shows brackets

seq(from=10, by=10, length.out=100)
  [1]   10   20   30   40   50   60   70   80   90  100  110  120
 [13]  130  140  150  160  170  180  190  200  210  220  230  240
 [25]  250  260  270  280  290  300  310  320  330  340  350  360
 [37]  370  380  390  400  410  420  430  440  450  460  470  480
 [49]  490  500  510  520  530  540  550  560  570  580  590  600
 [61]  610  620  630  640  650  660  670  680  690  700  710  720
 [73]  730  740  750  760  770  780  790  800  810  820  830  840
 [85]  850  860  870  880  890  900  910  920  930  940  950  960
 [97]  970  980  990 1000

The number in brackets [] at the start of each row is the index of the first element in that row

The index can be a numeric vector

weight
[1] 60 72 57 90 95 72 86 65 79

We can create a vector using c() and use it as an index

weight[c(1, 3, 5)]
[1] 60 57 95

Most of the time we use a sequence, made with seq() or :

weight[2:4]
[1] 72 57 90
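
The example above uses : to build the index. As an illustration of the seq() alternative, this sketch takes every other element. The names odd_positions and every_other are made up for this example.

```r
weight <- c(60, 72, 57, 90, 95, 72, 86, 65, 79)

# positions 1, 3, 5, 7, 9
odd_positions <- seq(from = 1, to = length(weight), by = 2)

# the sequence is used as the index
every_other <- weight[odd_positions]

print(odd_positions)
print(every_other)   # 60 57 95 86 79
```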

Beginning of a vector

weight
[1] 60 72 57 90 95 72 86 65 79

The first element is

weight[1]
[1] 60

The first 3 elements are

weight[1:3]
[1] 60 72 57

End of a vector

What is the last element of a vector?

weight
[1] 60 72 57 90 95 72 86 65 79

We need to know the vector length

length(weight)
[1] 9
weight[ length(weight) ]
[1] 79
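
The same idea gives the last few elements. A minimal sketch for the last three, assuming the weight vector above. The variable n is introduced only for this example, and tail() is a standard R function that does the same job.

```r
weight <- c(60, 72, 57, 90, 95, 72, 86, 65, 79)
n <- length(weight)

# parentheses matter: n - 2:n would mean n - (2:n)
last_three <- weight[(n - 2):n]

# tail() returns the last elements without computing the indices by hand
same_thing <- tail(weight, 3)

print(last_three)   # 86 65 79
print(same_thing)   # 86 65 79
```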

Changing part of a vector

Many times we will need to change only part of a vector

weight
[1] 60 72 57 90 95 72 86 65 79

Let’s change the 4th value

weight[ 4 ] <- 88
weight
[1] 60 72 57 88 95 72 86 65 79

(maybe the person is on a diet)
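
Several elements can be changed in one step when the index is a vector. This is only a sketch: the new values 70, 74 and 80 are invented for illustration, and the work is done on a copy called new_weight so the original vector stays as it is.

```r
weight <- c(60, 72, 57, 88, 95, 72, 86, 65, 79)
new_weight <- weight                # work on a copy, keep the original

new_weight[c(2, 6)] <- c(70, 74)    # one new value for each index
new_weight[7:9] <- 80               # a single value is reused for every index

print(new_weight)   # 60 70 57 88 95 74 80 80 80
```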

We can use Negative Indices

Negative indices say what not to show

weight
[1] 60 72 57 88 95 72 86 65 79

Omit the 1st, 3rd and 5th element

weight[c(-1, -3, -5)]
[1] 72 88 72 86 65 79

Useful when you need almost all elements

Example

weight
[1] 60 72 57 88 95 72 86 65 79

All except the first

weight[ -1 ]
[1] 72 57 88 95 72 86 65 79

All except the last

weight[ -length(weight) ]
[1] 60 72 57 88 95 72 86 65
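
Negative indices also work with sequences, but the parentheses need care. A small sketch, assuming the weight vector above: -(1:3) omits the first three elements, while -1:3 is the sequence -1, 0, 1, 2, 3, which mixes negative and positive indices and is rejected by R. Here tryCatch() is used only to capture that error without stopping the script.

```r
weight <- c(60, 72, 57, 88, 95, 72, 86, 65, 79)

# same as weight[c(-1, -2, -3)]
without_first_three <- weight[-(1:3)]
print(without_first_three)   # 88 95 72 86 65 79

# -1:3 mixes negative and positive indices, so R raises an error
mixed <- tryCatch(weight[-1:3], error = function(e) "error")
print(mixed)                 # "error"
```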

Using logical vectors as indices

The most common use of logical vectors is to select the elements that match a condition

For example, values of vector weight that are greater than 75 can be found using this code

weight[ weight>75 ]
[1] 88 95 86 79

Step by step

Initial vector:

weight
[1] 60 72 57 88 95 72 86 65 79

Logical vector from comparison:

weight>75
[1] FALSE FALSE FALSE  TRUE  TRUE FALSE  TRUE FALSE  TRUE

Using the logical vector as index:

weight[ weight>75 ]
[1] 88 95 86 79
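
The condition does not have to be a single comparison. As a sketch, two comparisons can be combined with & (logical "and") before using the result as the index. The limits 70 and 90 and the names in_range and middle are chosen only for this example.

```r
weight <- c(60, 72, 57, 88, 95, 72, 86, 65, 79)

# TRUE only where both comparisons are TRUE
in_range <- weight > 70 & weight < 90
print(in_range)

# keep the elements where in_range is TRUE
middle <- weight[in_range]
print(middle)   # 72 88 72 86 79
```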

Both vectors must have the same length

We can use any logical vector as an index for another vector

Check that both vectors have the same length,
or that the logical vector has length 1.

length(weight)
[1] 9
length(weight>75)
[1] 9
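
A logical vector can also be inspected before it is used as an index. This sketch, assuming the weight vector above, counts the TRUE values with sum(), finds their positions with which(), and shows what a logical index of length 1 does. The variable names are made up for the example.

```r
weight <- c(60, 72, 57, 88, 95, 72, 86, 65, 79)
heavy <- weight > 75

how_many <- sum(heavy)       # TRUE counts as 1 and FALSE as 0
where <- which(heavy)        # positions of the TRUE values

everything <- weight[TRUE]   # a single TRUE is reused for every element
nothing <- weight[FALSE]     # a single FALSE selects no element

print(how_many)   # 4
print(where)      # 4 5 7 9
```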

Named vectors

Who is that person?

Vector weight represents the weight of several people

weight
[1] 60 72 57 88 95 72 86 65 79

It is hard to know who each person is.
We need another vector

people <- c("Ali", "Beyza", "Cem", "Deniz", "Emre", "Fatma",
            "Murat", "Volkan", "Onur")

We would have to handle both vectors at the same time.
It is better to put them together

Give a name to each element

Every element of a vector can have its own name

weight
[1] 60 72 57 88 95 72 86 65 79
names(weight) <- people
weight
   Ali  Beyza    Cem  Deniz   Emre  Fatma  Murat Volkan   Onur 
    60     72     57     88     95     72     86     65     79 

Now it is easier to understand who is who

How to know if a vector has names

We can use names() to get the character vector of the elements’ names

names(weight)
[1] "Ali"    "Beyza"  "Cem"    "Deniz"  "Emre"   "Fatma"  "Murat" 
[8] "Volkan" "Onur"  

We use names() to see names and to assign names
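
When a vector has no names, names() returns NULL. A minimal sketch of that check, using two short vectors invented for the example and the standard function is.null().

```r
unnamed <- c(60, 72, 57)
named <- c(Ali = 60, Beyza = 72, Cem = 57)

unnamed_has_no_names <- is.null(names(unnamed))
named_has_no_names <- is.null(names(named))

print(unnamed_has_no_names)   # TRUE
print(named_has_no_names)     # FALSE
```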

Creating a named vector

We can assign elements’ names when we create the vector

height <- c(Ali=1.75, Beyza=1.64, Cem=1.65, Deniz=1.80, Emre=1.90,
            Fatma=1.65, Murat=1.78, Volkan=1.74, Onur=1.91)
height
   Ali  Beyza    Cem  Deniz   Emre  Fatma  Murat Volkan   Onur 
  1.75   1.64   1.65   1.80   1.90   1.65   1.78   1.74   1.91 

It is easy to identify each person

Notice that, unlike before, there is no [] at the start of each line

Use one vector as filter for another vector

What is the height of people with weight over 75?

weight
   Ali  Beyza    Cem  Deniz   Emre  Fatma  Murat Volkan   Onur 
    60     72     57     88     95     72     86     65     79 
height
   Ali  Beyza    Cem  Deniz   Emre  Fatma  Murat Volkan   Onur 
  1.75   1.64   1.65   1.80   1.90   1.65   1.78   1.74   1.91 
height[ weight> 75 ]
Deniz  Emre Murat  Onur 
 1.80  1.90  1.78  1.91 
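
The same filter can be applied to the names themselves, which answers "who weighs more than 75?" instead of "how tall are they?". A sketch, assuming the named weight vector from the previous slides. The name heavy_names is made up for the example.

```r
weight <- c(Ali = 60, Beyza = 72, Cem = 57, Deniz = 88, Emre = 95,
            Fatma = 72, Murat = 86, Volkan = 65, Onur = 79)

# names(weight) is a character vector, so it can be indexed like any other
heavy_names <- names(weight)[weight > 75]

print(heavy_names)   # "Deniz" "Emre" "Murat" "Onur"
```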

Names as Indices

If a vector has names, we can use them as indices:

weight[ "Murat" ]
Murat 
   86 

We can also use a vector of names, in any order

weight[ c("Deniz", "Volkan", "Fatma") ]
 Deniz Volkan  Fatma 
    88     65     72 

This allows us to print elements in the order we need
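
For instance, the elements can be shown in alphabetical order of the names. This is only a sketch: it assumes the named weight vector from the previous slides and uses sort() on the names to build the index.

```r
weight <- c(Ali = 60, Beyza = 72, Cem = 57, Deniz = 88, Emre = 95,
            Fatma = 72, Murat = 86, Volkan = 65, Onur = 79)

# sort() puts the names in alphabetical order
alphabetical <- sort(names(weight))

# the sorted names are used as the index
by_name <- weight[alphabetical]

print(by_name)   # Onur now comes before Volkan
```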

Using names to change elements

weight
   Ali  Beyza    Cem  Deniz   Emre  Fatma  Murat Volkan   Onur 
    60     72     57     88     95     72     86     65     79 

We change one value and we see the result

weight[ "Murat" ] <- 84
weight
   Ali  Beyza    Cem  Deniz   Emre  Fatma  Murat Volkan   Onur 
    60     72     57     88     95     72     84     65     79 

You can update many elements

weight
   Ali  Beyza    Cem  Deniz   Emre  Fatma  Murat Volkan   Onur 
    60     72     57     88     95     72     84     65     79 

Let’s increase the weight of some people by 10%

weight[c("Volkan", "Fatma")] <- 1.1 * weight[c("Volkan", "Fatma")]
weight
   Ali  Beyza    Cem  Deniz   Emre  Fatma  Murat Volkan   Onur 
  60.0   72.0   57.0   88.0   95.0   79.2   84.0   71.5   79.0 
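
The same pattern works with a logical index. In this sketch everyone over 80 gets 1 subtracted from their weight. The threshold, the correction and the name corrected are invented for the example, and the change is made on a copy.

```r
weight <- c(Ali = 60, Beyza = 72, Cem = 57, Deniz = 88, Emre = 95,
            Fatma = 79.2, Murat = 84, Volkan = 71.5, Onur = 79)
corrected <- weight                 # work on a copy, keep the original

over_80 <- corrected > 80           # logical vector, TRUE for Deniz, Emre, Murat
corrected[over_80] <- corrected[over_80] - 1

print(corrected)
```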

In summary

  • Indices allow us to see and modify parts of a vector
  • Indices can be
    • positive integer vectors
    • negative integer vectors
    • logical vectors
    • character vectors
  • Index vectors can be of length 1 or longer
    • except logical indices, which should have the same length as the original vector (or length 1)
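
The four kinds of index in the summary can be compared side by side. A short sketch with a shortened version of the weight vector (only the first four people, to keep the output small). The variable names are made up for the example.

```r
weight <- c(Ali = 60, Beyza = 72, Cem = 57, Deniz = 88)

by_position <- weight[c(1, 3)]            # positive integers: Ali and Cem
by_exclusion <- weight[-2]                # negative integer: everyone except Beyza
by_condition <- weight[weight > 70]       # logical vector: Beyza and Deniz
by_name <- weight[c("Deniz", "Ali")]      # character vector: Deniz, then Ali

print(by_position)
print(by_exclusion)
print(by_condition)
print(by_name)
```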