Class 9: More about Rmarkdown

Computing in Molecular Biology and Genetics 1

Andrés Aravena, PhD

2 November 2020

This is an answer to several good questions asked in the forum

Congratulations on the good work

Structure in text files

Markdown is a method to write the structure of a text document

  • The text file can be read and understood easily

  • It can be transformed into other formats

    • PDF, Word, Webpage (HTML)

Compiling means transforming Markdown into another format

Flavors of Markdown

There are many different Markdown compilers

Many people make their own compiler, and they expand the original idea

Unfortunately, they are not always 100% compatible

We use the Rmarkdown flavor

Rmarkdown uses a compiler called Pandoc

If you want to know more

There are two websites that tell all the details

Metadata

This is all data about the document
beyond the document itself

RMarkdown uses a header for metadata
written in a different format called YAML

YAML is a widely used format with a public specification

There can be more than one YAML block

Interpretation depends on a template

Markdown rules

Metadata
  • Data about the data: date, bibliography, etc.
  • Uses YAML format
  • Use --- (three hyphens) before and after
  • There can be many blocks throughout the document
Header
  • Metadata at the beginning of the file
  • Uses YAML format
  • Declares title, author, and date

Example of Header Metadata

---
title: "Midterm Exam"
subtitle: "Computing in Molecular Biology 1"
author: "Put your name here"
number: STUDENT_NUMBER
date: "November 24, 2020"
output: html_document
---

ALWAYS write your name and student number

Rules for YAML files

  • Each top-level element starts in the first column, with no spaces before it

  • The inner list elements are indented with 2 spaces

  • You can have lists inside lists inside lists…

  • Name and values are separated by :

  • Always space after :

  • --- before and after the YAML code

Google “YAML” for more info. See also https://en.wikipedia.org/wiki/YAML
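
The rules above can be seen together in one small header. This is only a sketch: the title, date, and keywords are placeholders invented for illustration, and how each key is displayed depends on the template.

```yaml
---
# Top-level keys start in the first column
# Name and value are separated by a colon followed by a space
title: "My report"
date: "2 November 2020"
# A list: each element is indented and starts with a dash and a space
keywords:
  - markdown
  - metadata
# Elements inside elements: each level is indented 2 more spaces
output:
  html_document:
    toc: true
---
```

Lines starting with # are YAML comments and are ignored by the compiler.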

Templates

Visual organization is done with Templates

The document organization is defined in a template document

It defines how to show title, author, date

Here we use only templates already defined

but you can make your own templates if needed

Same structure can have many styles

The easiest way is to change the theme

  • Several HTML styles

You can also change the output format

  • Word, PDF

All this is written in the header

Click on this “gear” ⚙ and “output options”

Then select a theme

Theme and output format are chosen in the header

Please notice how the YAML header changes when we change theme

It also changes when we choose another output format

You do not need to use the menu

Instead you can write directly in the header
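
As a sketch of what the menu writes for you, the header below selects the HTML output with one of its built-in themes. The title and author are placeholders, and flatly is just one possible theme name.

```yaml
---
title: "Midterm Exam"
author: "Put your name here"
output:
  html_document:
    theme: flatly
---
```

To change the output format, the output line can be replaced by output: word_document or output: pdf_document. PDF output usually needs a LaTeX installation on your computer.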

Authors

How to write many authors

Authors are metadata, so we write them in the header

We should write them in a structured way

Different templates handle authors in different ways

Easy way

---
author: "Melissa A. Wilson Sayres"
---

Sometimes you can omit the quotes ("), but it is safer to use them.
Otherwise the compiler may get confused

Author with details

---
author:
  name: "Melissa A. Wilson Sayres"
  affiliation: "School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America"
  role: "Conceptualization, Data curation, Formal analysis"
---

The first line has a key but no value

The details have 2 spaces before the key name

Several authors

---
author:
  - "Melissa A. Wilson Sayres"
  - "Charles Hauser"
  - "Michael Sierk"
  - "Srebrenka Robic"
---

The first line has a key but no value

It is followed by a list of values, with two spaces and a dash

Combined

---
author:
  - name: "Melissa A. Wilson Sayres"
    affiliation: "School of Life Sciences, Arizona State University, Tempe, Arizona, United States of America"
  - name: "Charles Hauser"
    affiliation: "Department of Biological Sciences, St. Edward’s University, Austin, Texas, United States of America"
  - name: "Michael Sierk"
    affiliation: "Bioinformatics Program, Saint Vincent College, Latrobe, Pennsylvania, United States of America"
  - name: "Srebrenka Robic"
    affiliation: "Department of Biology, Agnes Scott College, Decatur, Georgia, United States of America"
---

Example: Distill format

---
author:
  - first_name: "Yihui"
    last_name: "Xie"
    url: https://github.com/yihui
    affiliation: RStudio
    affiliation_url: https://www.rstudio.com
    orcid_id: 0000-0003-0645-5666
  - name: "JJ Allaire"
    url: https://github.com/jjallaire
    affiliation: RStudio
    affiliation_url: https://www.rstudio.com
  - name: "Rich Iannone"
    url: https://github.com/rich-iannone
    affiliation: RStudio
    affiliation_url: https://www.rstudio.com
output: distill::distill_article
---

Bibliography

Use YAML for bibliography

references:
- type: article-journal
  id: WatsonCrick1953
  title: 'Molecular structure of nucleic acids: a structure for
    deoxyribose nucleic acid'
  author:
  - family: Watson
    given: J. D.
  - family: Crick
    given: F. H. C.
  container-title: Nature
  volume: 171
  issue: 4356
  page: 737-738
  issued:
    date-parts:
    - - 1953
      - 4
      - 25

How to use it

  • [@WatsonCrick1953] becomes (Watson and Crick 1953)
  • [@WatsonCrick1953, pp. 33-35, 38-39] becomes (Watson and Crick 1953, 33–35, 38–39).
  • [@WatsonCrick1953; @Collado-Vides2009a] becomes (Watson and Crick 1953; Collado-Vides et al. 2009).
  • @WatsonCrick1953 [p. 33] says … becomes Watson and Crick (1953, 33) says …

External bibliographies

Put all the references somewhere in the document, with --- before and after.

If you have a long list of papers and you use it in several documents, you can put them all in a separate file

Then you write

bibliography: references.yml

in the document metadata
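
A minimal sketch of a complete header using an external bibliography could look like this. The title and author are placeholders, and the file name is the one used above. The table of formats lists .yaml as the extension for CSL YAML, so that spelling may be the safer choice if the file is not recognized.

```yaml
---
title: "Midterm Exam"
author: "Put your name here"
output: html_document
# The bibliography file is assumed to be in the same folder as the document
bibliography: references.yml
---
```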

Other formats for references

Format        File extension
BibLaTeX      .bib
BibTeX        .bibtex
CSL JSON      .json
CSL YAML      .yaml
EndNote       .enl
EndNote XML   .xml
ISI           .wos
MEDLINE       .medline
MODS          .mods
RIS           .ris

Tools for managing the bibliography

It is good that RMarkdown uses all these formats

There are many tools to manage your paper collection

It is not enough to download PDFs and store them in a folder. They need to be organized and have a structure

Two good and free programs are Mendeley and Zotero

Bibliography at the end of document

Bibliographies will be placed at the end of the document. Normally, you will want to end your document like this:

last paragraph...

# References

The bibliography will be inserted after this header

Superscripts and Footnotes

Subscripts and superscripts

Wrap text in a single ~ to make a subscript (below the line)
Wrap text in a single ^ to make a superscript (above the line)

H~2~O and E=mc^2^

H₂O and E=mc²

In this case these are structural because they have a meaning

Footnotes

Footnotes are used to add details or comments[^1] that
are not essential to the main text, but that are good to know

[^1]: Like this comment here

Footnotes are used to add details or comments¹ that are not essential to the main text, but that are good to know

(the footnote is at the end of the presentation)

Footnotes

There are two parts

  • In the main text, write [^label]

  • Below (anywhere) write [^label]: footnote text

  • label can be any text, but should not be repeated

  • Footnote numbers are automatic

Equations

Other Structured text formats

Markdown files are text files with structure
There are other standards with similar goals
One very common is HTML, used for web pages

<h1>Section Header</h1>
<p>Paragraph with H<sub>2</sub>O
and E=mc<sup>2</sup></p>

Section Header

Paragraph with H₂O and E=mc²

Complete HTML file

<html>
<head>
  <title>Home</title>
  <link rel="stylesheet" href="style.css">
</head>
<body>
  <h1>Section Header</h1>
  <p>paragraph</p>
</body>
</html>

HTML is hard to write

HTML is text, so we can read and write it

But it is not easy to write

The good text editors help you to write HTML

But Markdown is easier to write, and then compile to HTML

HTML can have different styles

Remember this part in the HTML page

<link rel="stylesheet" href="style.css">

LaTeX: High quality documents

In Physics, Mathematics, Computer Science, and other disciplines, most people use another system: LaTeX

It builds on TeX, which was created in 1978 to typeset math

Many good biologists in Europe know how to use LaTeX

I wrote my whole thesis in LaTeX

LaTeX looks like this

\documentclass{article}
\usepackage{style}
\title{Document title}
\author{Author Name}
\begin{document}
\section{Section Header}
Paragraph

Paragraph
\end{document}

The \, {, } symbols are special

With great power comes great responsibility

Disadvantages of LaTeX

  • Like HTML, it is hard to write without good tools
  • Tables are hard to describe
    • but they are super flexible

Advantages of LaTeX

  • Floating figures
  • Professional look
  • Cross-references
  • Bibliographies
  • Math

RMarkdown math uses LaTeX math

Displayed formulas are wrapped with $$

$$
\sum_{i=1}^n i = \frac{n(n+1)}{2}
$$

\[ \sum_{i=1}^n i = \frac{n(n+1)}{2} \]

In most cases it is just “reading the formula in English”
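
As an illustration (this formula is not from the original slides), the average of n values can be read aloud as “x bar equals one over n, times the sum from i equals 1 to n of x sub i”, and written almost word by word:

```latex
$$
\bar{x} = \frac{1}{n} \sum_{i=1}^n x_i
$$
```

Here \bar puts a bar over the letter, \frac takes the numerator and the denominator, _ starts a subscript, and ^ starts a superscript.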

Math has different format

If we write mathematical formulas, it is good to see the difference between x and \(x\)

The visual style has meaning

In this course we do not write math in text

If you need to write math, look for LaTeX

How do I write LaTeX

I used to write LaTeX directly

but it takes a lot of time

Today I only write Markdown, including math
and translate it to LaTeX

Sharing your error messages

Cellphone photo

Hard to read, cannot be edited, 7 megabytes (7 MB)

Copy & paste only text

Can be translated. Can be edited. 338 bytes (about 20,000 times smaller)

Error in eval(parse(text = name)) : object 'html_documen' not found
Calls: <Anonymous> ... create_output_format -> create_output_format_function -> eval -> eval
Execution halted

Other topics not discussed here

  • Captions of Tables
  • Links to document anchors (#like-this)
  • How to define anchors
  • Creating new blocks with structural meaning

Conclusion

Markdown is not the goal

Some of you will not use RMarkdown

The important idea is Structure

Think about the document structure

You can use Word® better if you know your structural elements

Markdown is good for collaboration

  • GitHub
  • Lightweight
  • focus on structure
  • Can be combined with R, Python, etc
  • One of the most used lightweight markup languages today
  • Can produce great documents

Appendix

References

Collado-Vides, J, H Salgado, E Morett, S Gama-Castro, V Jiménez-Jacinto, I Martínez-Flores, A Medina-Rivera, L Muñiz-Rascado, M Peralta-Gil, and A Santos-Zavaleta. 2009. “Bioinformatics Resources for the Study of Gene Regulation in Bacteria.” Journal of Bacteriology 191 (1): 23–31.

Watson, J. D., and F. H. C. Crick. 1953. “Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid.” Nature 171 (4356): 737–38. https://doi.org/10.1038/171737a0.


  1. Like this comment here