October 8, 2019

About the homework

What is important, and what is not

There are two ideas here:

  • Separate important from secondary things
    • Content is important
    • Structure is important
    • Style is not important
  • Tell the computer the role of each part
    • Later you can decide the style
    • You can even change it later

Using R

Vectors

  • A vector is a group of values, all of the same type
  • Basic types are
    • Character
    • Numeric
    • Logical
    • Factor
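
As a quick illustration of the types listed above, class() reports the type of a vector. The variable names and values below are made up for this sketch and are not part of the homework.

```r
# One small vector of each basic type (names and values are illustrative only)
city   <- c("Ankara", "Izmir", "Bursa")   # character
age    <- c(21, 35, 48)                   # numeric
smoker <- c(TRUE, FALSE, FALSE)           # logical
group  <- factor(c("a", "b", "a"))        # factor

class(city)     # [1] "character"
class(age)      # [1] "numeric"
class(smoker)   # [1] "logical"
class(group)    # [1] "factor"

# All values in a vector share one type, so mixing them forces a common type
mixed <- c(1, "two", TRUE)
class(mixed)    # [1] "character"
```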

Making your own vector

Vectors are so useful that we have several ways to create them

For example, the call c(3, 1, 4) gives us a vector with those three values

We can also make vectors with incrementing numbers or repeating numbers

Sequences

A vector with the whole numbers from 4 to 9

4:9
[1] 4 5 6 7 8 9
seq(4, 9)
[1] 4 5 6 7 8 9

The two commands are equivalent. The first is easier to write

Sequences with more detail

We can go from 4 to 10 incrementing by 2

seq(4, 10, 2)
[1]  4  6  8 10

We can say how many numbers we want, instead of the last value

seq(from=4, by=2, length.out=4)
[1]  4  6  8 10
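
The different ways of calling seq() can be compared directly. This is only a sketch: the variable names are arbitrary, and the decreasing sequence is an extra case added for illustration.

```r
# Three ways of asking for 4, 6, 8, 10
a <- seq(4, 10, 2)                          # arguments by position
b <- seq(from = 4, to = 10, by = 2)         # same call with named arguments
d <- seq(from = 4, by = 2, length.out = 4)  # number of values instead of the last value

identical(a, b)   # [1] TRUE
all(a == d)       # [1] TRUE

# A negative step counts down
down <- seq(10, 4, by = -2)
down              # [1] 10  8  6  4
```

Naming the arguments makes the call longer, but it is easier to read later.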

Repetitions

rep(1, 3)
[1] 1 1 1
rep(c(7, 9, 13), 3)
[1]  7  9 13  7  9 13  7  9 13
rep(c(7, 9, 13), c(2,1,3))
[1]  7  7  9 13 13 13

Repetitions

rep(1:2, c(10, 5))
 [1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2
rep(c(TRUE, FALSE), 3)
[1]  TRUE FALSE  TRUE FALSE  TRUE FALSE
rep(c(TRUE, FALSE), c(3, 3))
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
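
The second argument of rep() is called times. rep() also has an argument called each, which repeats every element the same number of times. A short sketch, with x as an arbitrary example vector:

```r
x <- c(7, 9, 13)

rep(x, times = 3)            # the whole vector, three times
# [1]  7  9 13  7  9 13  7  9 13

rep(x, each = 2)             # every element twice, in order
# [1]  7  7  9  9 13 13

rep(x, times = c(2, 1, 3))   # a different count for each element
# [1]  7  7  9 13 13 13

# Giving the same count for every element is the same as using each
same <- identical(rep(c(TRUE, FALSE), c(3, 3)),
                  rep(c(TRUE, FALSE), each = 3))
same                         # [1] TRUE
```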

Missing data

  • In practice, a value is sometimes missing
  • It is not a good idea to use a fictitious value
  • The symbol NA is used in that case
  • You can use it in any vector, regardless of type

c(NA, TRUE, FALSE)
[1]    NA  TRUE FALSE
c(NA, 1, 2)
[1] NA  1  2

Missing data

  • We will use NA a lot
  • Literally means “not available”
  • Used when the experiment failed, or when the person did not answer the survey
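
A small sketch of how NA behaves in practice. The vector score is invented for this example; is.na() and the na.rm argument of mean() are standard R, but they are not covered in the slides above.

```r
# Four survey answers, one of them missing (made-up data)
score <- c(70, NA, 85, 90)

is.na(score)                # which values are missing?
# [1] FALSE  TRUE FALSE FALSE

sum(is.na(score))           # how many are missing?
# [1] 1

mean(score)                 # the result is unknown if any value is unknown
# [1] NA

mean(score, na.rm = TRUE)   # ignore the missing values
# [1] 81.66667
```

This is why a fictitious value such as 0 or -1 is a bad idea: it would silently change the mean, while NA makes the problem visible.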

Indices

The most important idea in this course

Vectors have several elements

  • We want to read, write, and modify some elements of a vector, independently of the other elements
  • We will do the same with other data structures in the following classes
  • This is the fundamental idea of this course
    • This is where some people get confused

Accessing elements

To get the i-th element of a vector v we use v[i]

weight <- c(60, 72, 57, 90, 95, 72)
weight
[1] 60 72 57 90 95 72
weight[3]
[1] 57

The value inside [] is called an index (plural: indices)

The index can be a numeric vector

weight[c(1,3,5)]
[1] 60 57 95
weight[2:4]
[1] 72 57 90
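
Indices also work on the left side of an assignment, which is how we modify some elements without touching the others. A minimal sketch using the same weight vector; the new values are invented.

```r
weight <- c(60, 72, 57, 90, 95, 72)

# Change only the third element
weight[3] <- 58

# Change the first two elements at once
weight[c(1, 2)] <- c(61, 73)

weight
# [1] 61 73 58 90 95 72
```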

Negative Indices

Negative indices indicate which elements to omit

weight
[1] 60 72 57 90 95 72
weight[c(-1,-3,-5)]
[1] 72 90 72

Useful when you need almost all elements
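
A few common cases, as a sketch with the same weight vector. Note the parentheses in -(2:4): without them, -2:4 is the sequence from -2 to 4, which mixes negative and positive indices and gives an error.

```r
weight <- c(60, 72, 57, 90, 95, 72)

weight[-1]                # everything except the first element
# [1] 72 57 90 95 72

weight[-length(weight)]   # everything except the last element
# [1] 60 72 57 90 95

weight[-(2:4)]            # everything except elements 2, 3 and 4
# [1] 60 95 72
```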

Logical Indices

Vectors can be indexed by a logical vector

The logical vector should have the same length as the vector being indexed (R recycles a shorter one, which is rarely what we want)

weight>72
[1] FALSE FALSE FALSE  TRUE  TRUE FALSE
weight[weight>72]
[1] 90 95

This is very useful for selecting the elements that satisfy a condition
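
The logical vector can be stored, negated, or built from several conditions. A short sketch with the same weight vector; the name heavy is arbitrary.

```r
weight <- c(60, 72, 57, 90, 95, 72)

heavy <- weight > 72      # one TRUE or FALSE per element

weight[heavy]             # elements where the condition holds
# [1] 90 95

weight[!heavy]            # ! negates the condition
# [1] 60 72 57 72

weight[weight >= 60 & weight <= 72]   # & combines two conditions
# [1] 60 72 72

sum(heavy)                # TRUE counts as 1, so this counts the matches
# [1] 2

which(heavy)              # positions of the TRUE values
# [1] 4 5
```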

Names of elements

Every element of a vector can have a name

names(weight) <- c("Ali", "Deniz", "Fatma", "Emre",
                   "Volkan", "Onur")
weight
   Ali  Deniz  Fatma   Emre Volkan   Onur 
    60     72     57     90     95     72 

Names of elements

We can assign names to the elements when we create the vector

height <- c(Ali=1.75, Deniz=1.80, Fatma=1.65, Emre=1.90,
            Volkan=1.74, Onur=1.91)
height
   Ali  Deniz  Fatma   Emre Volkan   Onur 
  1.75   1.80   1.65   1.90   1.74   1.91 

Names as Indices

If a vector has names, we can use them as indices:

weight[c("Deniz", "Volkan", "Fatma")]
 Deniz Volkan  Fatma 
    72     95     57 

  • How do we know if a vector has names? names() returns NULL when it has none

names(vector)
NULL
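
The check can be written with is.null(), and names also work on the left side of an assignment. A sketch with a shortened weight vector; the replacement value 58 is invented.

```r
weight <- c(60, 72, 57)

names(weight)             # no names yet
# NULL
is.null(names(weight))
# [1] TRUE

names(weight) <- c("Ali", "Deniz", "Fatma")
is.null(names(weight))
# [1] FALSE

# Read and modify elements by name
deniz <- weight["Deniz"]  # the element named Deniz, with value 72
weight["Fatma"] <- 58
weight
#   Ali Deniz Fatma 
#    60    72    58 
```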

In summary

  • Indices allow us to see and modify parts of a vector
  • Indices can be
    • positive integer vectors
    • negative integer vectors
    • logical vectors
    • character vectors
  • Index vectors can be of length 1 or longer
    • except logical indices, which should have the same length as the original vector
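
The four kinds of index from the summary, applied to one vector. This is a sketch using a shortened version of the height vector from the earlier slides.

```r
height <- c(Ali = 1.75, Deniz = 1.80, Fatma = 1.65, Emre = 1.90)

by_position <- height[c(2, 4)]             # positive integers: Deniz and Emre
by_omission <- height[-1]                  # negative integers: everyone except Ali
by_condition <- height[height > 1.78]      # logical vector: Deniz and Emre
by_name <- height[c("Ali", "Fatma")]       # character vector: Ali and Fatma

names(by_position)    # [1] "Deniz" "Emre"
names(by_omission)    # [1] "Deniz" "Fatma" "Emre"
names(by_condition)   # [1] "Deniz" "Emre"
names(by_name)        # [1] "Ali"   "Fatma"
```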