Class 26: Drawing beautiful plots

Computing for Molecular Biology 1

Andrés Aravena, PhD

11 January 2021

Grammar of Graphics

Grammar

Grammar is:

How to speak and make sense

For example, if we say

make how speak and to sense

we have the same words, but they do not make sense

English grammar

In English we say

John eats fruit

  • There is a subject: “John”
  • There is a verb: “eats”
    • There may be an adverb: “slowly”
  • There is an object: “fruit”
    • There may be an adjective: “fresh”

John slowly eats fresh fruit

Grammar of graphics

  • what the data is
  • which columns map to each graphic attribute
  • what statistical transformations are needed
  • which geometric object will be drawn
  • what last-minute position adjustment is needed

There are many explanations on the web; search for “grammar of graphics”

We use ggplot2

There was an old “ggplot” package. We use the second version.

library(ggplot2)

This package gives us many commands. The basic ones are

  • ggplot()
  • geom_point(), geom_line(), and other geom_… commands
  • Several scale_x_…, scale_y_…, and other scale_… commands
  • theme() and other theme_… commands
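
As a rough sketch of how these commands combine, the example below adds one geometry, two scales and one theme to a plot. The toy data frame and its values are made up for illustration; they are not the course data.

```r
library(ggplot2)

# Made-up data, for illustration only
toy <- data.frame(
  height_cm = c(150, 160, 170, 180, 190),
  weight_kg = c(50, 58, 66, 77, 90)
)

p <- ggplot(toy, aes(x = height_cm, y = weight_kg)) +  # data and aesthetic mapping
  geom_point() +                                        # geometry: one point per row
  scale_x_continuous(name = "Height (cm)") +            # scale: controls the x axis
  scale_y_continuous(name = "Weight (kg)") +            # scale: controls the y axis
  theme_minimal()                                       # theme: non-data appearance

print(p)
```

Each part is joined with +, so any of them can be removed or replaced without touching the others.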

First plot

ggplot(students, aes(x=height_cm, y=weight_kg)) + geom_point()
Warning: Removed 18 rows containing missing values (geom_point).
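
The warning means that some rows have NA in one of the plotted columns, so they cannot be drawn. A minimal sketch of two ways to handle this, using a made-up toy data frame (hypothetical values): drop the incomplete rows first, or pass na.rm = TRUE to the geometry so the rows are dropped silently.

```r
library(ggplot2)

# Made-up data with missing values, for illustration only
toy <- data.frame(
  height_cm = c(150, NA, 170, 180, 190),
  weight_kg = c(50, 58, NA, 77, 90)
)

# Option 1: keep only rows where both columns are present
complete <- toy[!is.na(toy$height_cm) & !is.na(toy$weight_kg), ]
p1 <- ggplot(complete, aes(x = height_cm, y = weight_kg)) + geom_point()

# Option 2: let geom_point() drop the incomplete rows without a warning
p2 <- ggplot(toy, aes(x = height_cm, y = weight_kg)) + geom_point(na.rm = TRUE)
```

Both options draw the same points. The first makes the decision visible in the code, which is usually easier to explain later.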

Parts of this command

ggplot(students, aes(x=height_cm, y=weight_kg)) + geom_point()
  • We start with ggplot(). It takes two inputs:
    • a data frame, in this case students
    • an aes() object, called the aesthetic mapping
  • aes() describes how data frame columns are mapped to visual properties (aesthetics)
    • in this case we set the x= and y= parts
  • Finally, we add the geometry to draw the points
    • In this case we use geom_point()
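
Because the parts are joined with +, the first part can be stored in a variable and reused with different geometries. A small sketch, assuming a made-up toy data frame (the names toy, base, p_points and p_lines are hypothetical):

```r
library(ggplot2)

# Made-up data, for illustration only
toy <- data.frame(
  height_cm = c(150, 160, 170, 180, 190),
  weight_kg = c(50, 58, 66, 77, 90)
)

# Data and aesthetic mapping only: this draws empty axes
base <- ggplot(toy, aes(x = height_cm, y = weight_kg))

# The same base with two different geometries
p_points <- base + geom_point()
p_lines  <- base + geom_line()
```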

Aesthetics: how to show each column

ggplot(students, aes(x=height_cm, y=weight_kg, color=sex)) + geom_point()

Point size can show some data

ggplot(students, aes(x=height_cm, y=weight_kg, size=hand_span)) +
    geom_point()

Transparent points using alpha

ggplot(students, aes(x=height_cm, y=weight_kg, size=hand_span, alpha=0.5)) +
    geom_point()

We can set fixed aesthetics in geom_point()

ggplot(students, aes(x=height_cm, y=weight_kg, size=hand_span)) +
    geom_point(alpha=0.5)
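
The difference between the last two plots can be sketched with made-up data (the toy data frame is hypothetical). Inside aes(), alpha=0.5 is treated as data and passed through a scale, so it gets a legend. As an argument of geom_point() it is a fixed setting applied to every point.

```r
library(ggplot2)

# Made-up data, for illustration only
toy <- data.frame(x = 1:5, y = c(2, 4, 3, 5, 1))

# Mapped: 0.5 is treated like a data value and passed through the alpha scale
p_mapped <- ggplot(toy, aes(x = x, y = y, alpha = 0.5)) + geom_point()

# Set: every point is drawn with 50% opacity, and no legend is added
p_set <- ggplot(toy, aes(x = x, y = y)) + geom_point(alpha = 0.5)

# layer_data() shows the values that are used for drawing
layer_data(p_set)$alpha
```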

Assigning multiple aesthetics

ggplot(students, aes(x=height_cm, y=weight_kg, size=hand_span,
                     color=sex)) + geom_point()

Some shapes have color and fill

ggplot(students, aes(x=height_cm, y=weight_kg,
                     size=hand_span, fill=sex)) +
    geom_point(alpha=0.5, shape="circle filled", color="black")

Boxplot

ggplot(students, aes(x=sex, y=weight_kg)) + 
    geom_boxplot()

Boxplot with color

ggplot(students, aes(x=sex, y=weight_kg, color=sex)) +
    geom_boxplot()

Boxplot with fill

ggplot(students, aes(x=sex, y=weight_kg, fill=sex)) + 
    geom_boxplot()
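
A boxplot is an example of a statistical transformation: geom_boxplot() does not draw the rows directly but a five-number summary for each group. A sketch with made-up numbers (the toy data frame is hypothetical), using layer_data() to look at the computed values:

```r
library(ggplot2)

# Made-up data, for illustration only
toy <- data.frame(
  group = rep(c("a", "b"), each = 5),
  value = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
)

p <- ggplot(toy, aes(x = group, y = value)) + geom_boxplot()

# One row per box: whisker ends (ymin, ymax) and quartiles (lower, middle, upper)
stats <- layer_data(p)[, c("ymin", "lower", "middle", "upper", "ymax")]
stats
```

The middle column is the median of each group, which is the thick line inside each box.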

Plotting a single column

Barplot counts values in one column

ggplot(students, aes(x=sex, fill=sex)) + 
    geom_bar()
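
The heights drawn by geom_bar() are the same numbers that table() gives. A quick check with made-up values (the toy data frame is hypothetical):

```r
library(ggplot2)

# Made-up data, for illustration only
toy <- data.frame(sex = c("F", "M", "F", "F", "M", "F"))

p <- ggplot(toy, aes(x = sex)) + geom_bar()

bar_heights <- layer_data(p)$count    # heights computed by geom_bar()
manual <- as.vector(table(toy$sex))   # the same counts computed by hand
```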

Histogram for numeric variables

ggplot(students, aes(x=height_cm)) + geom_histogram(color="black") 
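
geom_histogram() splits a numeric column into bins and counts the rows in each one. By default it uses 30 bins and prints a message asking you to pick a better value. A sketch with made-up heights (the toy data frame is hypothetical) that sets the bin width explicitly:

```r
library(ggplot2)

# Made-up data, for illustration only
toy <- data.frame(
  height_cm = c(152, 158, 161, 163, 165, 168, 170, 171, 175, 183)
)

p <- ggplot(toy, aes(x = height_cm)) +
  geom_histogram(binwidth = 5, color = "black")  # each bar covers 5 cm

counts <- layer_data(p)$count
sum(counts)  # every row falls in exactly one bin
```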

Line Plots

COVID-19 cases depend on time

library(dplyr)  # provides %>% and filter()
Turkey %>%
    ggplot(aes(x=Date_reported, y=New_cases)) + geom_line()

Drawing columns

Turkey %>%
    ggplot(aes(x=Date_reported, y=New_cases)) + geom_col()

Focusing on April

Turkey %>%
    filter(Date_reported>"2020-03-31", Date_reported< "2020-05-01") %>% 
    ggplot(aes(x=Date_reported, y=New_cases)) + geom_col()
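
The filter above compares a date column with text such as "2020-03-31". The sketch below assumes that Date_reported is stored as a Date and makes the conversion explicit with as.Date(). The toy daily series is made up for illustration.

```r
library(dplyr)

# Made-up daily series: one row per day from 1 March to 31 May 2020
toy <- data.frame(
  Date_reported = seq(as.Date("2020-03-01"), as.Date("2020-05-31"), by = "day")
)
toy$New_cases <- seq_len(nrow(toy))

# Keep only the rows for April
april <- toy %>%
  filter(Date_reported >= as.Date("2020-04-01"),
         Date_reported <= as.Date("2020-04-30"))

range(april$Date_reported)
```

Using >= and <= with the first and last day of the month reads more directly than comparing with the day before and the day after.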

Drawing all European Countries

covid %>% filter(WHO_region=="EURO") %>%
    ggplot(aes(x=Date_reported, y=Cumulative_cases, color=Country)) +
    geom_line()

Which region has few countries?

covid %>% 
    group_by(WHO_region) %>% 
    summarize(n_distinct(Country))
# A tibble: 7 x 2
  WHO_region `n_distinct(Country)`
  <chr>                      <int>
1 AFRO                          50
2 AMRO                          56
3 EMRO                          22
4 EURO                          62
5 Other                          1
6 SEARO                         11
7 WPRO                          35
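
The column name `n_distinct(Country)` is awkward to use in later steps. A sketch of the same summary with a named column and sorted output, on a made-up table (the toy data frame and the name n_countries are hypothetical):

```r
library(dplyr)

# Made-up data, for illustration only
toy <- data.frame(
  WHO_region = c("R1", "R1", "R1", "R1", "R2", "R2", "R3"),
  Country    = c("A",  "A",  "B",  "C",  "D",  "E",  "F")
)

counts <- toy %>%
  group_by(WHO_region) %>%
  summarize(n_countries = n_distinct(Country)) %>%  # give the new column a name
  arrange(n_countries)                              # fewest countries first

counts
```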

Ignoring “Other”, the region with the fewest countries is SEARO: South East Asia

South East Asia

covid %>% 
    filter(WHO_region=="SEARO") %>% 
    ggplot(aes(x=Date_reported, y=New_cases, color=Country)) +
    geom_line()

South East Asia in April

covid %>% filter(WHO_region=="SEARO") %>% 
    filter(Date_reported>"2020-03-31", Date_reported< "2020-05-01") %>% 
    ggplot(aes(x=Date_reported, y=New_cases, color=Country)) +
    geom_line()

South East Asia with columns

covid %>% filter(WHO_region=="SEARO", 
    Date_reported>"2020-03-31", Date_reported<"2020-05-01") %>% 
    ggplot(aes(x=Date_reported, y=New_cases, color=Country)) +
    geom_col()

Columns have color and fill

covid %>% filter(WHO_region=="SEARO", 
    Date_reported>"2020-03-31", Date_reported< "2020-05-01") %>% 
    ggplot(aes(x=Date_reported, y=New_cases, fill=Country)) +
    geom_col()

Putting columns side by side

covid %>% filter(WHO_region=="SEARO", 
    Date_reported>"2020-04-27", Date_reported< "2020-05-01") %>% 
    ggplot(aes(x=Date_reported, y=New_cases, fill=Country)) +
    geom_col(position=position_dodge())
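
This is the “position adjustment” part of the grammar. By default geom_col() stacks columns that share the same x value; position_dodge() places them next to each other instead. A sketch with made-up numbers (the toy data frame is hypothetical) comparing the two:

```r
library(ggplot2)

# Made-up data, for illustration only
toy <- data.frame(
  day     = rep(c("Mon", "Tue"), each = 2),
  country = rep(c("A", "B"), times = 2),
  cases   = c(3, 5, 4, 6)
)

p_stack <- ggplot(toy, aes(x = day, y = cases, fill = country)) +
  geom_col()                               # default: stacked
p_dodge <- ggplot(toy, aes(x = day, y = cases, fill = country)) +
  geom_col(position = position_dodge())    # side by side

# Stacked: the highest column top is the total for one day
max(layer_data(p_stack)$ymax)
# Dodged: every column starts at zero, so the highest top is a single value
max(layer_data(p_dodge)$ymax)
```

Stacking is useful for reading totals, dodging for comparing the groups with each other.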