## A Computer Language

• Programs are sets of instructions for the computer
• We write them in a high-level language
  • Humans can read it easily
  • The text file is called source code
  • It has to be transformed into machine code, e.g. an EXE file
• Two approaches to transform programs:
  • Compiler: all the source code is transformed to machine code at once
  • Interpreter: the source code is transformed and executed one line at a time

## Basic Rules of a Language

Each phrase in a program is imperative.

A phrase involves nouns, verbs and adverbs.

Today we will focus on nouns.

The only verb we need today is assignment, written <-

## Basic Objects

Nouns are names of objects.

Names exist only as references to objects.

The simplest objects in R are vectors.
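
As a small sketch of what "a name refers to an object" means in practice, the following assigns a vector to one name and then to a second one. The names `a` and `b` are illustrative only and do not come from the slides.

```r
# Bind the name `a` to a numeric vector
a <- c(1, 2, 3)

# Bind a second name, `b`, to the same values
b <- a

# Modifying through `b` does not change what `a` refers to
b[1] <- 100

a_first <- a[1]   # value seen through `a`
b_first <- b[1]   # value seen through `b`
print(a)
print(b)
```

After the modification, `a` still holds its original values while `b` holds the changed ones.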

## Vectors

• All elements of a vector must have the same type
• Basic types are
  • Character
  • Numeric
  • Factor
  • Logical

This order is important. Keep it in mind.
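
One way to check which type a vector has is `class()`. The sketch below builds one small vector per type listed above; the variable names and values are made up for illustration.

```r
# One illustrative vector for each basic type
ch <- c("a", "b")               # character
nu <- c(1.5, 2)                 # numeric
fa <- factor(c("red", "blue"))  # factor
lo <- c(TRUE, FALSE)            # logical

# class() reports the type of each vector
types <- c(class(ch), class(nu), class(fa), class(lo))
print(types)
```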

## Factors

Also known as categorical variables.

They are used for discrete values, for example when there is no natural order

• Color
• Gender/Sex
• Country of Origin

These are variables that you would never average

## Creating vectors

• Simple concatenation
 >  x <- c(1,2,3)
>  y <- c(10,20)

These are two numeric vectors. We can concatenate them

 >  c(x, y, 5)
[1]  1  2  3 10 20  5

Notice that we use <- for assignment.

## Creating vectors

Logical Vectors

 >  c(TRUE, TRUE, FALSE, TRUE)
[1]  TRUE  TRUE FALSE  TRUE

We can also write c(T,T,F,T), but T and F are ordinary variables that can be overwritten, so TRUE and FALSE are safer.

A comparison creates a logical vector

 >  weight > 25
[1] FALSE FALSE FALSE FALSE  TRUE FALSE

## Character vectors

Same idea. Concatenation

Each element must be between single or double quotes

> c("alpha", 'beta', "gamma")
[1] "alpha" "beta"  "gamma"
> c('he said "yes"', "I don't know")
[1] "he said \"yes\"" "I don't know"


Special characters are written as two-symbol escape sequences that begin with a backslash:
\", \\, \n, \t

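Although an escape sequence is typed with two symbols, it is stored as a single character. A short sketch, with an illustrative string `s` that is not from the slides:

```r
# Each escape sequence counts as one character
n_newline   <- nchar("\n")
n_backslash <- nchar("\\")
n_quote     <- nchar("\"")

# print() shows the escapes; cat() shows the characters themselves
s <- "he said \"yes\"\n\tand left"
print(s)
cat(s, "\n")
```

`print()` is what the console uses by default, which is why the quotes appear escaped in the output of the previous slide.
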
## Factor vectors

Any character vector can be transformed into a factor with factor().
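
A minimal sketch of the conversion, using a made-up vector `colors`. By default the levels are the distinct values in sorted order, and each element is stored as an integer code pointing at its level.

```r
# An illustrative character vector
colors <- c("red", "blue", "red", "green")

# Convert it to a factor
f <- factor(colors)

lv    <- levels(f)      # the distinct categories
codes <- as.integer(f)  # the integer code of each element
print(lv)
print(codes)
print(table(f))         # how many elements fall in each category
```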

## Sequences

 >  4:9
[1] 4 5 6 7 8 9
>  seq(4,9)
[1] 4 5 6 7 8 9
>  seq(4,10,2)
[1]  4  6  8 10
>  seq(from=4, by=2, length=4)
[1]  4  6  8 10
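
The calls above can also be written with named arguments, which tends to be easier to read. This sketch uses `by` and `length.out` (the full name of the argument abbreviated as `length` above) plus `seq_len()`; the variable names are illustrative.

```r
# Step of 2 from 4 up to 10
s1 <- seq(4, 10, by = 2)

# Same sequence, described by start, step and number of elements
s2 <- seq(from = 4, by = 2, length.out = 4)

# Five equally spaced values between 0 and 1
s3 <- seq(0, 1, length.out = 5)

# The integers 1..5
s4 <- seq_len(5)

print(s1); print(s2); print(s3); print(s4)
```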

## Repetitions

 >  rep(1,3)
[1] 1 1 1
>  rep(c(7,9,13), 3)
[1]  7  9 13  7  9 13  7  9 13
>  rep(c(7,9,13), 1:3)
[1]  7  9  9 13 13 13

## Repetitions

 >  rep(1:2,c(10,5))
[1] 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2
>  rep(c(TRUE,FALSE),3)
[1]  TRUE FALSE  TRUE FALSE  TRUE FALSE
>  rep(c(TRUE,FALSE),c(3,3))
[1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
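
The second argument of `rep()` is called `times`. There is also an `each` argument that repeats every element before moving to the next one. A brief sketch with illustrative variable names:

```r
# Repeat the whole vector twice
r1 <- rep(c(7, 9, 13), times = 2)

# Repeat each element twice
r2 <- rep(c(7, 9, 13), each = 2)

# A different count for each element
r3 <- rep(c("a", "b"), times = c(2, 1))

print(r1); print(r2); print(r3)
```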

## Missing data

• In practice there are cases when a datum is not present
• It is not a good idea to use a fictitious value
• The symbol NA is used in that case
• You can use it on any vector, regardless of type
 >  c(NA,TRUE, FALSE)
[1]    NA  TRUE FALSE
>  c(NA,1,2)
[1] NA  1  2
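
A short sketch of how missing values behave in calculations, using a made-up vector `v`. `is.na()` locates them, and many summary functions accept `na.rm = TRUE` to skip them.

```r
# A numeric vector with one missing value
v <- c(60, NA, 57)

missing_flags <- is.na(v)            # which elements are missing
m_all  <- mean(v)                    # NA: the result is unknown
m_seen <- mean(v, na.rm = TRUE)      # mean of the values that are present

print(missing_flags)
print(m_all)
print(m_seen)
```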

## Mixing types inside a vector

• When types are mixed, the values are converted to the most generic type
 >  c(1, "tail")
[1] "1"     "tail"
>  c(TRUE, "tail")
[1] "TRUE"  "tail"

## Combining them

 >  c(2,TRUE, FALSE)
[1] 2 1 0
>  c(factor(c("a","b")),"c")
[1] "1" "2" "c"
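
The conversions shown above can be checked with `class()`. This sketch also shows a common use of the logical-to-numeric conversion: counting TRUE values with `sum()`. Variable names are illustrative.

```r
# Logical mixed with numeric becomes numeric
k1 <- class(c(TRUE, 2))

# Anything mixed with character becomes character
k2 <- class(c(2, "x"))
k3 <- class(c(TRUE, "x"))

# TRUE counts as 1 and FALSE as 0 in arithmetic
n_true <- sum(c(TRUE, TRUE, FALSE))

print(c(k1, k2, k3))
print(n_true)
```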

## Names

• Every element can have a name
 >  weight <- c(Peter=60, John=72, Frank=57, Huey=90, Dewey=95, Louie=72)
>  weight
Peter  John Frank  Huey Dewey Louie
   60    72    57    90    95    72
>  names(weight)
[1] "Peter" "John" "Frank" "Huey" "Dewey" "Louie"
>  height <- c(1.75,1.80,1.65,1.90,1.74, 1.91)
>  names(height) <- names(weight)
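
Names are carried along in arithmetic, which makes results easier to read. The sketch below reuses the `weight` and `height` vectors from this slide and computes a body mass index as weight divided by height squared; the variable `bmi` is introduced here only for illustration.

```r
weight <- c(Peter = 60, John = 72, Frank = 57, Huey = 90, Dewey = 95, Louie = 72)
height <- c(1.75, 1.80, 1.65, 1.90, 1.74, 1.91)

# Copy the names from weight onto height
names(height) <- names(weight)

# Element-wise arithmetic keeps the names
bmi <- weight / height^2
print(round(bmi, 2))
```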

## Accessing elements

• To get the i-th element of a vector v we use v[i]
> weight[3]
Frank
   57
> weight[c(1,3,5)]
Peter Frank Dewey
   60    57    95
> weight[2:4]
 John Frank  Huey
   72    57    90
• The index can be a vector of integers

## Negative Indices

• Used to indicate omitted elements
> weight
Peter  John Frank  Huey Dewey Louie
   60    72    57    90    95    72
> weight[c(-1,-3,-5)]
 John  Huey Louie
   72    90    72
• Useful when nearly every element is used
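
Two more patterns with negative indices, sketched on the same `weight` vector: dropping a single element and dropping a range. Note the parentheses in `-(1:3)`; the result names are illustrative.

```r
weight <- c(Peter = 60, John = 72, Frank = 57, Huey = 90, Dewey = 95, Louie = 72)

# Everything except the first element
no_first <- weight[-1]

# Everything except the first three elements
last_three <- weight[-(1:3)]

print(no_first)
print(last_three)
```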

## Logical Indices

• Can be indexed by a logical vector
• Must be of the same length as the vector
> weight>72
Peter  John Frank  Huey Dewey Louie
FALSE FALSE FALSE  TRUE  TRUE FALSE
> weight[weight>72]
 Huey Dewey
   90    95
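
Conditions can be combined with `&` (and) and `|` (or) before indexing, and `which()` returns the positions where a condition holds. A sketch on the same `weight` vector, with illustrative result names:

```r
weight <- c(Peter = 60, John = 72, Frank = 57, Huey = 90, Dewey = 95, Louie = 72)

# Elements above 72
heavy <- weight[weight > 72]

# Elements between 60 and 72, both limits included
middle <- weight[weight >= 60 & weight <= 72]

# Positions (not values) of the elements above 72
pos <- which(weight > 72)

print(heavy)
print(middle)
print(pos)
```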

## Names as Indices

• If a vector has names, we can use them:
> weight[c("Peter","John","Frank")]
Peter  John Frank
   60    72    57
• How do we know if a vector has names? names(v) returns NULL when v has none
> is.null(names(weight))
[1] FALSE
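
The check can be sketched side by side for a vector with names and one without. The vector `plain` is a made-up example.

```r
weight <- c(Peter = 60, John = 72)   # has names
plain  <- c(60, 72)                  # has no names

# names() returns NULL when no names are set
has_names_weight <- !is.null(names(weight))
has_names_plain  <- !is.null(names(plain))

print(has_names_weight)
print(has_names_plain)
```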

## Matrices

• Like vectors but in 2 dimensions
> matrix(weight, nrow=2, ncol=3)
     [,1] [,2] [,3]
[1,]   60   57   95
[2,]   72   90   72
> matrix(weight, nrow=2, ncol=3, byrow=T)
     [,1] [,2] [,3]
[1,]   60   72   57
[2,]   90   95   72

## Matrices

> M <- matrix(weight, nrow=2, ncol=3)
> dim(M)
[1] 2 3
• See also nrow(M) and ncol(M)
> colnames(M) <- c("A","B","C")
> rownames(M) <- c("x","y")
> M
   A  B  C
x 60 57 95
y 72 90 72

## Arrays

• Like matrices but with more dimensions
> A <- array(0, dim=c(2,3,2))
> A
, , 1

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0

, , 2

     [,1] [,2] [,3]
[1,]    0    0    0
[2,]    0    0    0
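
To see how an array is filled and indexed, it helps to use distinct values instead of zeros. In this sketch the array `B` (an illustrative name) is filled with 1 to 12; values go down the columns of the first layer, then on to the second layer.

```r
# A 2 x 3 x 2 array filled with the integers 1..12
B <- array(1:12, dim = c(2, 3, 2))

# One index per dimension: row, column, layer
e1 <- B[2, 3, 1]   # last cell of the first layer
e2 <- B[1, 1, 2]   # first cell of the second layer

# Omitting the first two indices returns a whole layer as a matrix
layer2 <- B[, , 2]

print(e1); print(e2); print(layer2)
```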

## Indexing Matrices

• Objects of type matrix or array use an index for each dimension
• If an index is omitted, the whole range of that dimension is returned
> M[2,]
 A  B  C
72 90 72
> M[,3]
 x  y
95 72
> M[,2:3]
   B  C
x 57 95
y 90 72
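
Two further details, sketched on the same matrix: row and column names can be used as indices, and selecting a single row returns a plain vector unless `drop = FALSE` is given. The matrix is rebuilt here so that the example runs on its own.

```r
weight <- c(60, 72, 57, 90, 95, 72)
M <- matrix(weight, nrow = 2, ncol = 3,
            dimnames = list(c("x", "y"), c("A", "B", "C")))

# Index by row and column name
cell <- M["y", "B"]

# A single row is returned as a vector (no dimensions)...
row_vec <- M[2, ]

# ...unless drop = FALSE keeps it as a 1 x 3 matrix
row_mat <- M[2, , drop = FALSE]

print(cell)
print(row_vec)
print(row_mat)
```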

## Lists

• Like vectors, but they can mix different kinds of elements
people <- list(weight=c(60,72,57,90,95, 72),
height=c(1.75,1.80,1.65,1.90,1.74, 1.91),
names=c("Peter","John","Frank","Huey","Dewey", "Louie"),
valid=TRUE,
gender=factor(rep("M",6),levels=c("M","F")))

## Lists

> people
$weight
[1] 60 72 57 90 95 72

$height
[1] 1.75 1.80 1.65 1.90 1.74 1.91

$names
[1] "Peter" "John"  "Frank" "Huey"  "Dewey" "Louie"

$valid
[1] TRUE

$gender
[1] M M M M M M
Levels: M F

## Indexing Lists

• Can be indexed the same way as vectors
• Returns a sub-list
> people[1:2]
$weight
[1] 60 72 57 90 95 72

$height
[1] 1.75 1.80 1.65 1.90 1.74 1.91

## Elements of Lists

> people[1]
$weight
[1] 60 72 57 90 95 72
• This is a sublist
 >  people[[1]]
[1] 60 72 57 90 95 72
• This is an element
• Equivalent to people[["weight"]]
• Also equivalent to people$weight
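
The difference between `[ ]` and `[[ ]]` can be checked with `class()`. This sketch uses a shortened, illustrative version of the `people` list.

```r
# A shortened version of the list, for illustration
people <- list(weight = c(60, 72), valid = TRUE)

sub <- people[1]     # single brackets: a list with one component
el  <- people[[1]]   # double brackets: the component itself

print(class(sub))
print(class(el))

# The three ways of extracting an element agree
same <- identical(people[[1]], people[["weight"]]) &&
        identical(people[["weight"]], people$weight)
print(same)
```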

## Data Frames

• Two-dimensional, similar to matrices
• Each column can be of a different type
> BMI <- unname(weight / height^2)
> ppl <- data.frame(weight=c(60,72,57,90,95, 72),
    height=c(1.75,1.80,1.65,1.90,1.74, 1.91),
    names=c("Peter","John","Frank","Huey","Dewey", "Louie"),
    BMI=BMI,
    gender=factor(rep("M",6),levels=c("M","F")))

## Data Frames

> ppl
  weight height names      BMI gender
1     60   1.75 Peter 19.59184      M
2     72   1.80  John 22.22222      M
3     57   1.65 Frank 20.93664      M
4     90   1.90  Huey 24.93075      M
5     95   1.74 Dewey 31.37799      M
6     72   1.91 Louie 19.73630      M
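
Because a data frame is a list of columns arranged like a matrix, both indexing styles apply to it. A minimal sketch on a three-row, illustrative version of the table: `$` extracts or creates a column, and `[rows, columns]` selects rows with a logical condition.

```r
# A shortened version of the data frame, for illustration
ppl <- data.frame(weight = c(60, 72, 57),
                  height = c(1.75, 1.80, 1.65),
                  names  = c("Peter", "John", "Frank"))

# Add a column computed from two existing columns
ppl$BMI <- ppl$weight / ppl$height^2

# Keep only the rows where height is above 1.70
tall <- ppl[ppl$height > 1.70, ]

print(ppl)
print(tall)
```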