November 14th, 2016

Telling stories

Descriptive Statistics

We have data, and we want to say something about it

We use some functions to analyze our data

  • Counting them
  • Locating them
  • Describing them

Functions

A function has a name, zero or more inputs, and one output

Inputs always go inside round parentheses ( )

Some inputs can be optional. They have default values

To learn what a function does and which inputs it takes, read its manual page using

help(function_name)

or, in short form,

?function_name
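
As a small illustration of optional inputs and default values (round() and head() are our choice of examples here, not taken from the slides), each function below is called once with its default and once with the optional input given explicitly:

```r
# round() has an optional input 'digits'; its default value is 0
round(3.14159)              # uses the default, digits = 0
round(3.14159, digits = 2)  # overrides the default

# head() has an optional input 'n'; its default value is 6
head(rivers)                # first 6 elements
head(rivers, n = 3)         # first 3 elements

# args() lists the inputs of a function together with their defaults
args(head)
```

The manual pages shown by help(round) and help(head) document these inputs and their defaults.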

Functions used to describe vectors

There are many. You have to explore and learn

  • length()
  • min(), max(), range()
  • head(), tail()
  • summary()
  • table()
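
A minimal sketch of the first few functions in this list, applied to the built-in numeric vector rivers (the variable name x is only for illustration):

```r
x <- rivers        # built-in numeric vector of river lengths, in miles

length(x)          # how many elements
min(x)             # smallest value
max(x)             # largest value
range(x)           # smallest and largest together, same as c(min(x), max(x))
head(x)            # first six elements
tail(x)            # last six elements
```

summary() and table() give different output depending on the type of vector they receive.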

Describing factor vectors

length(state.region)
[1] 50
summary(state.region)
    Northeast         South North Central          West 
            9            16            12            13 
table(state.region)
state.region
    Northeast         South North Central          West 
            9            16            12            13 

Describing character vectors

length(state.abb)
[1] 50
summary(state.abb)
   Length     Class      Mode 
       50 character character 
table(state.abb)
state.abb
AK AL AR AZ CA CO CT DE FL GA HI IA ID IL IN KS KY LA MA MD ME MI MN MO MS 
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 
MT NC ND NE NH NJ NM NV NY OH OK OR PA RI SC SD TN TX UT VA VT WA WI WV WY 
 1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1 

Describing numeric vectors

length(state.area)
[1] 50
summary(state.area)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   1214   37317   56222   72368   83234  589757 
table(state.area)
state.area
  1214   2057   5009   6450   7836   8257   9304   9609  10577  24181 
     1      1      1      1      1      1      1      1      1      1 
 31055  33215  36291  40395  40815  41222  42244  45333  47716  48523 
     1      1      1      1      1      1      1      1      1      1 
 49576  51609  52586  53104  56154  56290  56400  58216  58560  58876 
     1      1      1      1      1      1      1      1      1      1 
 68192  69686  69919  70665  77047  77227  82264  83557  84068  84916 
     1      1      1      1      1      1      1      1      1      1 
 96981  97914 104247 110540 113909 121666 147138 158693 267339 589757 
     1      1      1      1      1      1      1      1      1      1 

Data Visualization

“A picture is worth a thousand words”

Graphics

Sometimes the best way to tell the story of the data is with a graphic

plot(state.area)

Another example

plot(rivers)

Each element is drawn at its index on the x axis and at its value on the y axis

Plotting Factors

The previous graphics used numeric data. What about factors?

plot(state.region)

Barplots

  • Numeric vectors are shown element by element
  • Factor vectors are shown as a “table”
    • i.e. the frequency of each value
    • i.e. counting how many times each value appears
  • Can we do the same for a numeric vector?
    • all values are different
    • we have to group them in “similar” sets
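
One way to group a numeric vector into "similar" sets by hand is cut(), which turns numbers into a factor of intervals. The sketch below assumes classes 500 miles wide from 0 to 4000, an arbitrary choice that happens to cover every value in rivers:

```r
# Group river lengths into classes 500 miles wide (arbitrary choice)
groups <- cut(rivers, breaks = seq(0, 4000, by = 500))

# 'groups' is a factor, so table() now counts rivers per class
counts <- table(groups)
counts

# The counts can be drawn as a barplot, like any other factor
barplot(counts, las = 2)
```

This is roughly what a histogram does automatically.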

Histograms

plot(rivers)

hist(rivers)

Histograms

Numeric data is grouped into N classes (intervals), and the height of each bar is the number of values in that class

hist(rivers, col="grey")

hist(rivers, col="grey", nclass = 30)
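
hist() also returns the grouping it used, which can be inspected without drawing anything. In this sketch we pass the number of classes through breaks, which behaves like nclass when given a single number. R treats that number as a suggestion and picks "pretty" class limits, so the actual count of classes may differ from 30:

```r
# Compute the histogram without drawing it
h <- hist(rivers, breaks = 30, plot = FALSE)

h$breaks           # class limits actually used
h$counts           # number of rivers in each class
length(h$counts)   # number of classes actually used
```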

Making it beautiful

Choosing the color

plot(rivers)

plot(rivers, col="red")

Choosing the size of the symbol

plot(rivers, cex=2)

plot(rivers, cex=0.5)

Choosing the symbol

plot(rivers, pch=16)

plot(rivers, pch=".")

Choosing the type of plot

plot(rivers, type = "l")

plot(rivers, type = "o")

Zooming

plot(rivers, type = "l", xlim=c(1,50))

plot(rivers, type = "o", xlim=c(100,141))
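
With xlim alone, the y axis still covers the whole vector. A possible refinement, not shown in the original slides, is to zoom the y axis too by computing ylim from the elements in view (idx is a name introduced here for illustration):

```r
idx <- 100:141     # the elements we want to look at

# Zoom both axes: x to the chosen indices, y to the values at those indices
plot(rivers, type = "o",
     xlim = range(idx),
     ylim = range(rivers[idx]))
```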

More plot types

plot(rivers, type = "b")

plot(rivers, type = "n")

Full annotation

plot(rivers, main = "Length of Rivers", sub = "141 samples", ylab="Length [miles]")

Two or more variables

Two plots in parallel

plot(state.x77[,"Area"], ylab="Area [sq mi]")
points(state.x77[,"Population"]*10, pch=2)

The first call, plot(), defines the scale of the axes; points() only adds to the existing plot and does not rescale it

Two lines in parallel

plot(state.x77[,"Area"], type="l", ylab="Area [sq mi]")
lines(state.x77[,"Population"]*10, col="red")
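
Because only the first call sets the axes, values of the second variable that fall outside the first variable's range would not be visible. A minimal sketch of one way to guard against this is to compute ylim from both vectors before plotting (the names area and pop10 and the legend text are ours, for illustration):

```r
area  <- state.x77[, "Area"]
pop10 <- state.x77[, "Population"] * 10   # same scaling as above

# range() accepts several vectors and returns the overall min and max
plot(area, ylim = range(area, pop10), ylab = "Area [sq mi] / Population x 10")
points(pop10, pch = 2)

# A legend tells the reader which symbol is which variable
legend("topright", legend = c("Area", "Population x 10"), pch = c(1, 2))
```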