Class 5: Structured text files with Markdown

Computing for Molecular Biology 1

Andrés Aravena, PhD

26 October 2020

Text documents are good

Text files are for humans and computers

  • Binary files are hard to read
    • unless you have the correct program
  • Text files can be read by humans
    • Each byte is a letter
  • Text files can be read by computers
    • Data must be recyclable
    • The output of one program may be the input of another program
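
A minimal sketch of the idea that a text file is only bytes that stand for letters. It is written in Python, which is an assumption of this example (the course itself uses R), and the three-letter text is only an illustration. For plain ASCII text, each letter is stored as one byte.

```python
# A short piece of text, as it would appear in a text file
text = "ATG"

# Encode the text into the bytes that would be stored on disk
data = text.encode("ascii")

print(len(text), "letters,", len(data), "bytes")
print(list(data))            # the numeric value of each byte
print(data.decode("ascii"))  # any program can turn the bytes back into letters
```

Because the bytes follow a public standard, any program on any computer can read them back as the same letters.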

Text files are for ever

Free

  • nothing to pay
  • you can do whatever you want

They never become obsolete

But they do not have structure

Structured Documents

We want to identify the meaning, not the shapes

  • Title
  • Sections
    • Subsections
      • Lists
      • Figures
      • Tables
  • References to other works

Documents have two components

  • visual and design aspects (presentation and style)
  • core material and structure (content) of a document

This is called “Separation of content and presentation”

We can do it in Word

Once you have identified the structure of the document, you have to describe it to the computer

You can also do it in Word, using the mouse

(but the keyboard is faster)

But Word files are not text

Text files can be read with any computer,
and will be accessible for ever.

We want to use text files

We need a way to mark up structure in plain text files

Text editors instead of Word processors

The easiest way to handle text files is to use a text editor

These are programs to view and edit text files

They use a font like Courier

Each letter has the same width

Text editors have syntax coloring

Since every letter has the same size and shape, text editors use color to tell elements apart

The color depends on the role of each piece of text

For example, headings can be shown in red

The color is not in the file. The editor adds it on the screen

Let’s see one example

I will show you my screen

Markdown

An alternative to ordinary Word Processors

Text files with a few rules to mark elements’ roles

The Structured Text format most often used is Markdown

Here we show some of the rules

Marking format

The basic idea is to give some symbols a special meaning

Moreover, they are special only in some contexts

Markdown files are not just plain text

Markdown files are text files with structure

Paragraphs

  • Consecutive lines of text are one paragraph.
  • They are separated by an empty line

The first paragraph.

Another paragraph

The first paragraph.

Another paragraph

Headers

First level header
==================

Second level header
-------------------

Normal text

First level header
Second level header

Normal text

Headers

alternative format

# Header 1
## Header 2
### Header 3
#### Header 4

Header 1
Header 2
Header 3
Header 4
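
Because the `#` marks follow a fixed rule, a program can recover the outline of a document from them. The sketch below is in Python (an assumption; the course itself uses R) and the function name `outline` is hypothetical. It is a simplification: it only handles `#`-style headers and does not skip code blocks.

```python
def outline(markdown_text):
    """Return (level, title) pairs for every '#'-style header."""
    headers = []
    for line in markdown_text.splitlines():
        stripped = line.strip()
        # Count the '#' marks at the start of the line
        marks = len(stripped) - len(stripped.lstrip("#"))
        # A header has 1 to 6 marks followed by a space and a title
        if 1 <= marks <= 6 and stripped[marks:marks + 1] == " ":
            title = stripped[marks:].strip()
            if title:
                headers.append((marks, title))
    return headers


example = """# Header 1
## Header 2
### Header 3
#### Header 4
Normal text"""

# Print the outline, indenting each header by its level
for level, title in outline(example):
    print("  " * (level - 1) + title)
```

The line of normal text is ignored, since it does not start with `#`.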

Let’s see an example

Text editors handling Markdown

These work with Markdown and other formats

All are good. We use RStudio

Markdown Text editors

Online Markdown editors

Today we will use the last one

Practice

More structural elements

Unordered Lists

+ Item 1
+ Item 2
    + Item 2a
    + Item 2b

  • Item 1
  • Item 2
    • Item 2a
    • Item 2b

Sub-lists are indented by 4 spaces

Ordered Lists

1. Item 1
1. Item 2
1. Item 3
    1. Item 3a
    1. Item 3b

  1. Item 1
  2. Item 2
  3. Item 3
    1. Item 3a
    2. Item 3b

Important paragraph

Quotation

Use it to highlight something remarkable, for example when someone important said something interesting.

> "The limits of my language mean
> the limits of my world"
> 
> *Ludwig Wittgenstein*

“The limits of my language mean the limits of my world”

Ludwig Wittgenstein

Images

You have to indicate the web address of the image

![optional text](http://example.com/logo.png)

or the path of a file, relative to the directory of the document

![optional text](images/logo.png)

Tables

|        | sample   | dose | time   | agent            |
|--------|----------|------|--------|------------------|
| 1      | GSM91440 | low  | 5 min  | caffeine         |
| 2      | GSM91893 | low  | 5 min  | caffeine         |
| 3      | GSM91428 | low  | 5 min  | calcofluor white |
| 4      | GSM91881 | low  | 5 min  | calcofluor white |

    sample    dose  time   agent
1   GSM91440  low   5 min  caffeine
2   GSM91893  low   5 min  caffeine
3   GSM91428  low   5 min  calcofluor white
4   GSM91881  low   5 min  calcofluor white
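
Since a Markdown table is only text, a program can write one. The sketch below is in Python (an assumption; the course itself uses R), the function name `markdown_table` is hypothetical, and the rows are copied from the table above. It does not pad the columns, which Markdown does not require.

```python
def markdown_table(header, rows):
    """Build a Markdown table from a header and a list of rows."""
    def line(cells):
        # Cells are separated by '|' and the line starts and ends with '|'
        return "| " + " | ".join(str(cell) for cell in cells) + " |"

    # The second line of hyphens separates the header from the data
    separator = "|" + "|".join("---" for _ in header) + "|"
    return "\n".join([line(header), separator] + [line(row) for row in rows])


header = ["", "sample", "dose", "time", "agent"]
rows = [
    [1, "GSM91440", "low", "5 min", "caffeine"],
    [3, "GSM91428", "low", "5 min", "calcofluor white"],
]

print(markdown_table(header, rows))
```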

Computer code

Programs are usually written in a monospaced font.
That is, all letters have the same width.

```
this <- is.computer(code)
```

this <- is.computer(code)

This will be very important in the rest of the course

Format inside a paragraph

Emphasis

Use it only when strictly necessary

Inside the paragraph we can have *italics*
and **bold** text

Inside the paragraph we can have italics and bold text

Inline code

We can compare `x` and `data`

We can compare x and data

Markdown in RStudio

How to use RStudio

You have to install R and RStudio on your computer

Start RStudio. You will see a screen like this

Today we will focus only on one part

Click on File → New File → R Markdown

A text editor

You will get a new window with an example text

It is a text file. Each plain English letter takes one byte

Colors are only a guide for you. They are not part of the text

Today we will learn how to write text files for our course

This is not Microsoft Word

Header and metadata

At the beginning of the file

---
title: "Title"
author: "Author's name"
date: "4 October 2016"
output: html_document
---

Notice that the block is wrapped by --- (three hyphens)
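
The `---` lines make it easy for a program to separate the metadata from the rest of the text. The sketch below is in Python (an assumption; the course itself uses R) and the function name `split_header` is hypothetical. It only understands simple `key: value` lines, so it is an illustration and not a full parser for this kind of header.

```python
def split_header(text):
    """Split a document into its metadata (a dict) and its body (a string)."""
    lines = text.splitlines()
    # The header must start on the very first line
    if not lines or lines[0].strip() != "---":
        return {}, text
    metadata = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Closing line found: everything after it is the body
            return metadata, "\n".join(lines[i + 1:])
        key, _, value = line.partition(":")
        # Remove spaces and the optional double quotes around the value
        metadata[key.strip()] = value.strip().strip('"')
    # No closing '---': treat the whole text as body
    return {}, text


document = """---
title: "Title"
author: "Author's name"
date: "4 October 2016"
output: html_document
---

Some text
"""

metadata, body = split_header(document)
print(metadata["title"])
print(metadata["output"])
```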

Practice

Write in Markdown

How to solve it

by G. Polya

  • You have to understand the problem.
  • Find the connection between the data and the question. You should obtain a plan of the solution.
  • Carry out your plan.
  • Examine the solution obtained.

Quiz 2

Describe the structure of a paper

You must identify the structural elements of a document

Something like

  • Title: “Ten simple rules for online learning”
  • Author: David B. Searls
  • Section: “Rule 1: Make a Plan”

Do it now

Go to our Google Sheet (the same one where we show attendance)

Look for your name in “Documents”

Open the document assigned to you

Describe the document structure

Go to Quiz 2 and answer there

You have 30 minutes for this

Homework

Mandatory for everybody

Homework: Markdown

Copy and paste the document text into an R Markdown file

You must identify the structural elements and mark them in R Markdown

The important part is the Structure

It must have the same structure as the original paper

Homework is mandatory. Answers are individual

Deadline: before the next class

Send your answers to

andres.aravena+cmb@istanbul.edu.tr

even if you did not finish

Everybody should deliver on time

Do not send questions to this address

Ask your questions in the forum

More information

Read and learn