Class 27: More beautiful plots

Computing for Molecular Biology 1

Andrés Aravena, PhD

11 January 2021

Multiple plots

Original plot

covid %>% filter(WHO_region=="SEARO") %>% 
    filter(Date_reported>"2020-03-31", Date_reported< "2020-05-01") %>% 
    ggplot(aes(x=Date_reported, y=New_cases, color=Country)) +
    geom_line()

Separate facets, same y axis

covid %>% filter(WHO_region=="SEARO") %>% 
    filter(Date_reported>"2020-03-31", Date_reported< "2020-05-01") %>% 
    ggplot(aes(x=Date_reported, y=New_cases)) +
    facet_wrap(vars(Country), ncol=3) + geom_line()

Independent y axis

covid %>% filter(WHO_region=="SEARO") %>% 
    filter(Date_reported>"2020-03-31", Date_reported< "2020-05-01") %>% 
    ggplot(aes(x=Date_reported, y=New_cases)) +
    facet_wrap(vars(Country), ncol=3, scales = "free_y") + geom_line()
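
The effect of scales = "free_y" is easiest to see on a tiny table. The following is a minimal sketch with made-up data; the names toy, country, day and cases are invented for illustration and are not part of the covid table.

```r
library(ggplot2)

# Made-up data: two countries whose case counts differ by a factor of 100
toy <- data.frame(
    country = rep(c("A", "B"), each = 5),
    day     = rep(1:5, times = 2),
    cases   = c(1, 2, 4, 3, 5, 100, 250, 400, 300, 500)
)

# Shared y axis: country A looks almost flat next to country B
p_fixed <- ggplot(toy, aes(x = day, y = cases)) +
    facet_wrap(vars(country)) + geom_line()

# Independent y axis: each panel chooses its own range
p_free <- ggplot(toy, aes(x = day, y = cases)) +
    facet_wrap(vars(country), scales = "free_y") + geom_line()
```

A shared axis makes it easy to compare magnitudes between countries, while independent axes make it easy to compare the shape of each curve.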

Dots and lines

Number of transistors in chips

We have up-to-date data about microprocessor manufacturing.

These are integrated circuits that contain many transistors, and they make the modern world possible

Transistors are one of the most important inventions

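# read the tab-separated table of transistor counts
transistors <- read_tsv("Transistor_count_2020.txt")
# fit a straight line to the logarithm of the count
model <- lm(log(count) ~ Date, data=transistors)
# undo the logarithm to get predictions in the original units
transistors$predicted <- exp(predict(model))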

We add a column with the predicted number of transistors per chip
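
Why do we take log() before fitting and exp() after predicting? If log(count) grows in a straight line with slope b, then the count is multiplied by exp(b) every time unit, and it doubles every log(2)/b time units. The following is a sketch on synthetic data, not the real transistor table; the names toy_chips, year and doubling_time are invented for illustration, and we assume the date is a numeric year.

```r
# Synthetic data: the count doubles exactly every 2 years,
# starting from 2300 in the year 1971
toy_chips <- data.frame(year = seq(1971, 2019, by = 4))
toy_chips$count <- 2300 * 2^((toy_chips$year - 1971) / 2)

# Fit a straight line to log(count), as we did with the real data
toy_model <- lm(log(count) ~ year, data = toy_chips)

# predict() gives values on the log scale; exp() undoes the logarithm
toy_chips$predicted <- exp(predict(toy_model))

# The slope is the growth rate per year,
# so log(2)/slope is the doubling time in years
slope <- unname(coef(toy_model)["year"])
doubling_time <- log(2) / slope
doubling_time
```

For this synthetic table the doubling time is 2 years, because that is how we built the data. With real data the points do not fall exactly on the line, and the value is only an estimate.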

Moore’s law

ggplot(transistors, aes(x=Date, y=count)) + 
    geom_point(alpha=0.5) +
    geom_line(aes(x=Date, y=predicted), color="red")

Logarithmic Scales

Kleiber’s law

kleiber %>% ggplot(aes(x=kg, y=kcal)) + geom_point(alpha=0.5)

Plotting logarithms

kleiber %>% ggplot(aes(x=log(kg), y=log(kcal))) + geom_point(alpha=0.5)

In the paper

Plotting in a logarithmic scale

kleiber %>% ggplot(aes(x=kg, y=kcal)) + 
    scale_x_log10() + scale_y_log10() + 
    geom_point(alpha=0.5)

Prediction using a linear model

We need to fit a log-log linear model and add a new column with the predicted values

model <- lm(log(kcal) ~ log(kg), data=kleiber)
kleiber %>% mutate(predicted=exp(predict(model))) -> kleiber
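
A straight line in log-log scale is a power law: if log(kcal) = a + b·log(kg), then kcal = exp(a)·kg^b. The slope of the model is therefore the exponent. The following sketch checks this on synthetic data built with exponent 0.75; the table toy_animals and its values are invented for illustration and are not the kleiber data.

```r
# Synthetic data following an exact power law: kcal = 70 * kg^0.75
toy_animals <- data.frame(kg = c(0.02, 0.2, 2, 20, 200, 2000))
toy_animals$kcal <- 70 * toy_animals$kg^0.75

# Same kind of model as before: a straight line in log-log scale
toy_model <- lm(log(kcal) ~ log(kg), data = toy_animals)

# The slope is the exponent of the power law
exponent <- unname(coef(toy_model)["log(kg)"])

# The intercept is the logarithm of the multiplying constant
constant <- exp(unname(coef(toy_model)["(Intercept)"]))

c(exponent = exponent, constant = constant)
```

The model recovers the exponent 0.75 and the constant 70 that we used to build the table. The same two lines with coef() can be applied to the model fitted on kleiber.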

We can store the plot in a variable

kleiber %>% ggplot(aes(x=kg, y=kcal)) + 
    scale_x_log10(labels = scales::label_number(accuracy=0.1)) +
    scale_y_log10() + geom_point(alpha=0.5) +
    geom_line(aes(x=kg, y=predicted)) -> p

Notice that we use labels = scales::label_number() to change the number format on the x axis.

The scales package offers many other label formats
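
To get a feeling for these label functions, we can try them outside a plot. This is a small sketch; the vector x is made up. Each label_*() call returns a function, and that function turns numbers into text.

```r
library(scales)

# Made-up values to format
x <- c(0.5, 2, 10)

# label_number() returns a function; we call it on x right away
one_decimal <- label_number(accuracy = 0.1)
one_decimal(x)

# Other label functions from the same package
label_percent()(x)       # values shown as percentages
label_scientific()(x)    # scientific notation
label_comma()(1234567)   # commas between thousands
```

Any of these functions can be passed to the labels= argument of scale_x_log10(), scale_y_log10(), and the other scale_ functions.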

To see the plot, we print() it

print(p)

In most cases, we just type the variable name, and R prints it for us

p
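
The explicit print() matters inside loops and functions, where R does not print automatically. The following is a sketch with a made-up table called toy; it draws into a temporary PDF file so it works even without a screen.

```r
library(ggplot2)

# Made-up data for illustration
toy <- data.frame(x = 1:10, y = (1:10)^2)
p <- ggplot(toy, aes(x = x, y = y)) + geom_point()

# Draw into a temporary PDF file, one page per plot
out_file <- tempfile(fileext = ".pdf")
pdf(out_file)
for (line_color in c("red", "blue")) {
    # Inside a loop, a line with only `p + ...` would draw nothing,
    # so we call print() explicitly
    print(p + geom_line(color = line_color))
}
invisible(dev.off())
```

If a plot is missing from your RMarkdown output and the code is inside a for loop, a forgotten print() is a likely cause.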

Now we can add more attributes to p

We can choose a theme for the general look

p + theme_linedraw()

We can add labels and titles

p + theme_linedraw(base_line_size = 2) +
    labs(x="Body Weight, Kg.", y="Heatprod. p. day, kcal.")

We can change the look of each part

p + theme_linedraw() + theme(axis.title = element_text(color = "red")) +
    labs(x="Body Weight, Kg.", y="Heatprod. p. day, kcal.")

Exercise

  • Add a title
  • Change the colors of the axes
  • Change the font family for the labels
  • Add the animals’ names as in the original graphic

Publishing plots

Image format

If you take a picture with your camera, it usually gets saved as JPG

When we make a plot in RMarkdown, by default it is saved as a PNG file

If you are writing a paper, you may want to save it as PDF or SVG

How do we choose the format?

What is the difference?

  • PNG is a bitmap format: a 2D array of pixels
    • other examples are JPG, TIFF, GIF
  • PDF is a vector format: a mathematical description of lines and shapes
    • another example is SVG

The difference is seen when you zoom in: a bitmap shows its pixels, a vector image stays sharp

  • PDF is good for printing on paper
  • PNG is better for screen and presentations

RMarkdown device

To store the image as SVG, use dev="svg" in the RMarkdown code chunk 

```{r dev="svg"}
ggplot(students, aes(x=height_cm, y=weight_kg)) + geom_point()
```

You can set the default device at the beginning of the RMarkdown file, with the command

knitr::opts_chunk$set(dev="svg")

Saving a ggplot

We can save any ggplot image with the command

ggsave(filename, plot, device, width, height)

Only filename is mandatory. The other arguments are optional, and it is safer to pass them by name, as in width=6

If we omit the option plot=, it saves the last plot that was displayed

Option device= can be “pdf”, “tiff”, “jpeg”, “png”, “svg”, or several others

For more details, use

help(ggsave)
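
As an example, the following sketch saves the same plot in a vector format and in a bitmap format. The table toy and the file names are made up, and the files go to a temporary folder; in your own work you would choose your own folder and file name.

```r
library(ggplot2)

# Made-up data for illustration
toy <- data.frame(x = 1:10, y = (1:10)^2)
p <- ggplot(toy, aes(x = x, y = y)) + geom_point()

pdf_file <- file.path(tempdir(), "toy_plot.pdf")
png_file <- file.path(tempdir(), "toy_plot.png")

# Vector output. Without device=, ggsave() guesses the format
# from the file extension. Width and height are in inches by default
ggsave(pdf_file, plot = p, width = 6, height = 4)

# Bitmap output. dpi= sets the resolution in pixels per inch,
# so this image is 1800 by 1200 pixels
ggsave(png_file, plot = p, width = 6, height = 4, dpi = 300)
```

The PDF stays sharp at any zoom level. For the PNG, a larger dpi gives a sharper image and a bigger file.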

To learn more

Cheat sheets

In particular

Vignettes

R packages usually include examples and explanatory documents, called vignettes

Try the command

vignette("ggplot2-specs")
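
If you do not know the name of a vignette, you can ask R for the list. This is a small sketch, assuming that ggplot2 is installed; the variable name available is just for illustration.

```r
# List the vignettes that come with an installed package
available <- vignette(package = "ggplot2")

# The result contains a table with the name and title of each vignette
available$results[, c("Item", "Title")]
```

The names in the Item column are the ones you pass to vignette(). The command browseVignettes("ggplot2") shows the same list in the web browser.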