September 17, 2019

Using R and RStudio

Analyzing Data

for fun and profit

Many disciplines, including Molecular Biology and Genetics, have become increasingly data-driven.

Starting now, we will use RStudio, a free program for data analysis with R

R is popular among molecular biologists, and it is also used by economists, psychologists and marketing specialists

How to use RStudio

You have to install R and RStudio on your computer

Then start RStudio. You will see a screen like this

Today we will focus only on one part

Click on File → New File → R Markdown

A text editor

You will get a new window with an example text

It is a plain text file. Each ASCII character (unaccented letters, digits and common punctuation) takes one byte
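The size of a piece of text can be checked directly in R. The following is a minimal sketch, not part of the original slides; the two strings are made-up examples. It compares the number of characters with the number of bytes, and shows that an accented letter stored as UTF-8 needs more than one byte.

```r
# Plain ASCII text: one byte per character.
plain <- "Hello"
nchar(plain, type = "chars")   # number of characters
nchar(plain, type = "bytes")   # number of bytes, the same value here

# An accented letter, written with a Unicode escape so the example
# does not depend on how this script file is saved.
accented <- "caf\u00e9"
nchar(accented, type = "chars")            # number of characters
nchar(enc2utf8(accented), type = "bytes")  # one byte more: the accented letter takes two
```

For the text files used in this course the difference rarely matters, but it explains why a file can be slightly larger than its character count.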

Colors are only a guide for you. They are not part of the text

Today we will learn how to write text files for our course

This is not Microsoft Word

Structure

Structure in Data

Today we will focus on a key idea.

To understand the data we need structure

For example, the folders on a disk form a hierarchical structure.

Structured documents

Text documents also have a logical structure

  • Letters form words
  • Several words become phrases and paragraphs
  • Paragraphs are contained in sections and chapters
  • Sometimes we have lists of elements
  • Sometimes we have tabular data
  • Figures
  • References to other works

The problem

Ordinary word processors are based on the WYSIWYG (What You See Is What You Get) philosophy

Users are encouraged to change fonts, sizes, colors and other visual attributes

Separation of form and content

Writing and formatting at the same time is distracting.

The idea is to write first, and format later, as close as possible to the time of publication.

  • WYSIWYG: What You See Is What You Get
    • Microsoft Word
  • Alternative: What You Mean Is What You Get
    • The information you enter defines the meaning of the document
    • The program generates beautiful output

Structural elements

  • sections
  • subsections
  • paragraphs
  • lists
    • numbered
    • unnumbered
  • computer code

  • figures
    • captions
  • tables
    • headers
    • columns
  • references
  • metadata

Style elements

  • color
  • font family
  • font size
  • centering
  • section numbering

How to write a structured document

Markdown

An alternative to ordinary word processors is to use text files with a few rules to mark the role of each element.

Text files can be read on any computer, and will remain accessible forever.

Today, the most widely used structured text format is Markdown

Here we show some of the rules

Paragraphs

Consecutive lines of text form one paragraph. Paragraphs are separated by an empty line

The first paragraph.

Another paragraph

Headers

First level header
==================

Second level header
-------------------

Normal text

This renders as a first-level heading, a second-level heading, and a paragraph of normal text.

Headers: alternative format

# Header 1
## Header 2
### Header 3
#### Header 4

Each line renders as a heading of the corresponding level, from 1 to 4.

Unordered Lists

+ Item 1
+ Item 2
    + Item 2a
    + Item 2b

Result:

  • Item 1
  • Item 2
    • Item 2a
    • Item 2b

Sub-lists are indented by 4 spaces

Ordered Lists

1. Item 1
1. Item 2
1. Item 3
    1. Item 3a
    1. Item 3b

Result (the numbers are assigned automatically, so every item can be written as 1.):

  1. Item 1
  2. Item 2
  3. Item 3
    1. Item 3a
    2. Item 3b

Important paragraph

Quotation

Use a block quotation to highlight something remarkable, for example when someone important said something interesting.

> "The limits of my language mean
> the limits of my world"
> 
> *Ludwig Wittgenstein*

Result:

“The limits of my language mean the limits of my world”

Ludwig Wittgenstein

Images

You have to indicate the web address of the image

![optional text](http://example.com/logo.png)

or the name of a file in the same directory

![optional text](images/logo.png)

Tables

|        | sample   | dose | time   | agent            |
|--------|----------|------|--------|------------------|
| 1      | GSM91440 | low  | 5 min  | caffeine         |
| 2      | GSM91893 | low  | 5 min  | caffeine         |
| 3      | GSM91428 | low  | 5 min  | calcofluor white |
| 4      | GSM91881 | low  | 5 min  | calcofluor white |

Result:

       sample    dose  time   agent
  1    GSM91440  low   5 min  caffeine
  2    GSM91893  low   5 min  caffeine
  3    GSM91428  low   5 min  calcofluor white
  4    GSM91881  low   5 min  calcofluor white
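
Typing the bars and hyphens by hand gets tedious for larger tables. As a hedged sketch (not part of the original slides), the same data can be stored in an R data frame and printed as a Markdown table with `kable()` from the knitr package. The variable name `experiments` is invented for this example, and `format = "pipe"` assumes a reasonably recent version of knitr.

```r
# Build the table from this slide as a data frame.
# Single values such as "low" are repeated for every row.
experiments <- data.frame(
  sample = c("GSM91440", "GSM91893", "GSM91428", "GSM91881"),
  dose   = "low",
  time   = "5 min",
  agent  = c("caffeine", "caffeine", "calcofluor white", "calcofluor white")
)

# Print the data frame as a Markdown table with | separators.
# format = "pipe" is an assumption about the installed knitr version.
knitr::kable(experiments, format = "pipe")
```

The printed lines can be pasted into a Markdown document, or the call can be placed in a code chunk so that the table is generated when the document is built.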

Computer code

Programs are usually written in a monospaced font.
That is, all letters have the same width.

```
this <- is.computer(code)
```

This will be very important in the rest of the course

Header and metadata

The metadata block goes at the beginning of the file

---
title: "Title"
author: "Author's name"
date: "4 October 2016"
output: html_document
---

Notice that the block is delimited by two lines of three hyphens (---)
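
R can read this block back, which is a handy check that the header is well formed. The following is a minimal sketch, assuming the rmarkdown package is installed (RStudio normally offers to install it the first time you create an R Markdown file). The temporary file and the variable names `path` and `meta` are made up for illustration.

```r
# Write a small R Markdown file to a temporary location.
path <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  'title: "Title"',
  'author: "Author\'s name"',
  'date: "4 October 2016"',
  "output: html_document",
  "---",
  "",
  "The first paragraph."
), path)

# Read the metadata block back as a named list.
meta <- rmarkdown::yaml_front_matter(path)
meta$title    # the title field of the header
meta$output   # the output format requested in the header
```

If the hyphen lines are missing or a field is badly written, this call fails or returns an incomplete list, which points to the problem in the header.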

Format inside a paragraph

Links

Inline code

We can speak about `x` and `data`

Result: the same sentence, with x and data shown in a monospaced font.

Exercise: Write in Markdown

How to solve it

by G. Polya

  • You have to understand the problem.
  • Find the connection between the data and the question. You should obtain a plan of the solution.
  • Carry out your plan.
  • Examine the solution obtained.

More information

read and learn it