October 25, 2018

Comments about Quiz 2

The most frequently asked question

Find the name of the person with the median age

x$age
[1] 20 23 26 29 32 35 38 41 44

Step 1: find the median age

median(x$age)
[1] 32

The name of the person with the median age

Step 2: find which ages are equal to the median age

x$age==median(x$age)
[1] FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE
[9] FALSE

Step 3: find who has the age equal to the median age

x$name[x$age==median(x$age)]
[1] "Elif"
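To try these three steps yourself, here is a small sketch that rebuilds a data frame x with the same ages. Only the name "Elif" comes from the quiz; the other names are placeholders.

```r
# Rebuild a quiz-like data frame: ages 20, 23, ..., 44 (9 people)
# Names are placeholders, except "Elif", who has the median age
x <- data.frame(name = paste0("person", 1:9),
                age  = seq(20, 44, by = 3),
                stringsAsFactors = FALSE)
x$name[5] <- "Elif"

m <- median(x$age)        # Step 1: the median value
is_m <- x$age == m        # Step 2: TRUE where the age equals the median
x$name[is_m]              # Step 3: logical indexing picks the matching name(s)
```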

There is no shorter way

Some people tried x$name[median(x$age)], which is the same as

x$name[32]

That is, the element of x$name at position 32. There are only 9 people, so R returns NA

The value of the median is not the same as the position of the median
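A quick way to see the difference, in a sketch with the same ages and placeholder names: the median is a value (32), so using it as an index asks for position 32, which does not exist in a 9-element vector.

```r
# Same ages as in the quiz; names are placeholders
x <- data.frame(name = paste0("person", 1:9),
                age  = seq(20, 44, by = 3),
                stringsAsFactors = FALSE)

x$name[median(x$age)]             # position 32 does not exist: NA
x$name[x$age == median(x$age)]    # logical index: "person5"
```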

Same happens with min()

What if we look for the name of the youngest person?

The minimal age is

min(x$age)
[1] 20

Which element has the minimal age?

which.min(x$age)
[1] 1
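Once we have the position, we can use it as an index to get the name of the youngest person. A sketch with the same ages and placeholder names:

```r
# Same ages as in the quiz; names are placeholders
x <- data.frame(name = paste0("person", 1:9),
                age  = seq(20, 44, by = 3),
                stringsAsFactors = FALSE)

pos <- which.min(x$age)   # position of the youngest person (1)
x$name[pos]               # the name at that position: "person1"
```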

Same happens with max()

The maximal age is

max(x$age)
[1] 44

Which element has the maximal age?

which.max(x$age)
[1] 9

This difference is important. We will use it later
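One consequence of this difference shows up with ties: which.max() and which.min() return only the first matching position, while comparing with max() or min() finds every match. A sketch with hypothetical ages and names:

```r
ages <- c(20, 44, 31, 44)        # hypothetical ages: two people share the maximum
who  <- c("A", "B", "C", "D")    # hypothetical names

which.max(ages)                  # 2: only the FIRST position of the maximum
who[which.max(ages)]             # "B"
who[ages == max(ages)]           # "B" "D": everyone with the maximal age
```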

Sorting

Sorting a vector

We have our own data

survey <- read.table("survey1-tidy.txt")
weight <- survey$weight_kg
weight
 [1]  67.0  58.0  56.0  94.0  60.0  77.0  56.0  75.0
 [9]  80.0 105.0  59.0  70.0  57.0  50.0  78.0  55.0
[17] 106.0  68.0  68.0  65.0  76.0  42.5  55.0  69.0
[25]  60.0  58.0  52.0  47.0  65.0  67.0  68.0  74.0
[33]  55.0  55.0  60.0  50.0  55.0  58.0  75.0  53.0
[41]  81.0  54.0  55.0  72.0  65.0  64.0  54.0  85.0
[49]  63.0  75.0  77.0

How can we sort weight?

sort(weight)
 [1]  42.5  47.0  50.0  50.0  52.0  53.0  54.0  54.0
 [9]  55.0  55.0  55.0  55.0  55.0  55.0  56.0  56.0
[17]  57.0  58.0  58.0  58.0  59.0  60.0  60.0  60.0
[25]  63.0  64.0  65.0  65.0  65.0  67.0  67.0  68.0
[33]  68.0  68.0  69.0  70.0  72.0  74.0  75.0  75.0
[41]  75.0  76.0  77.0  77.0  78.0  80.0  81.0  85.0
[49]  94.0 105.0 106.0

How to sort from large to small?

sort(weight, decreasing=TRUE)
 [1] 106.0 105.0  94.0  85.0  81.0  80.0  78.0  77.0
 [9]  77.0  76.0  75.0  75.0  75.0  74.0  72.0  70.0
[17]  69.0  68.0  68.0  68.0  67.0  67.0  65.0  65.0
[25]  65.0  64.0  63.0  60.0  60.0  60.0  59.0  58.0
[33]  58.0  58.0  57.0  56.0  56.0  55.0  55.0  55.0
[41]  55.0  55.0  55.0  54.0  54.0  53.0  52.0  50.0
[49]  50.0  47.0  42.5

How to sort the complete data frame?

The command sort() works only for vectors

To sort a data frame, we first need to choose the column to sort by

We already know how to find the positions of the smallest and the largest values

Position of the smallest and largest

which.min(weight)
[1] 22
which.max(weight)
[1] 17

We also need the positions of all the values in between

For that we use the order() command

Let’s verify if this is correct

weight
 [1]  67.0  58.0  56.0  94.0  60.0  77.0  56.0  75.0
 [9]  80.0 105.0  59.0  70.0  57.0  50.0  78.0  55.0
[17] 106.0  68.0  68.0  65.0  76.0  42.5  55.0  69.0
[25]  60.0  58.0  52.0  47.0  65.0  67.0  68.0  74.0
[33]  55.0  55.0  60.0  50.0  55.0  58.0  75.0  53.0
[41]  81.0  54.0  55.0  72.0  65.0  64.0  54.0  85.0
[49]  63.0  75.0  77.0

Using order() to sort a data frame

order(weight)
 [1] 22 28 14 36 27 40 42 47 16 23 33 34 37 43  3  7 13
[18]  2 26 38 11  5 25 35 49 46 20 29 45  1 30 18 19 31
[35] 24 12 44 32  8 39 50 21  6 51 15  9 41 48  4 10 17

This gives us the positions of the smallest, the second smallest, and so on up to the largest value
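To see what order() returns, it helps to try it on a short vector. A minimal sketch with made-up values:

```r
v <- c(30, 10, 20)

order(v)                          # 2 3 1: where the smallest, middle and largest values are
v[order(v)]                       # 10 20 30: indexing with order() sorts the vector
identical(v[order(v)], sort(v))   # TRUE
order(v, decreasing = TRUE)       # 1 3 2: positions from largest to smallest
```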

Then we use survey[order(weight),] to reorder the rows. The empty index after the comma keeps all the columns

     Gender birth_day birth_month birth_year height_cm weight_kg handness
st22 Female        13          10       1997       155      42.5    Right
st28 Female         7           7       1997       166      47.0    Right
st14 Female         3           7       1997       160      50.0    Right
st36 Female        24           3       1998       167      50.0    Right
st27 Female        13          10       1997       171      52.0    Right
st40 Female         5           2       1998       157      53.0    Right
st42 Female        18           5       1997       165      54.0    Right
st47 Female        29           7       1997       160      54.0    Right
st16 Female         3           9       2018       164      55.0    Right
st23 Female         2          10       1998       172      55.0    Right
st33 Female        21           5       1998       168      55.0    Right
st34 Female         3           9       1998       174      55.0    Right
st37 Female        17           9       1998       173      55.0    Right
st43 Female        23           5       1999       178      55.0    Right
st3  Female        28           1       1995       170      56.0     Left
st7  Female         5           4       1996       173      56.0    Right
st13 Female         9           6       1998       158      57.0    Right
st2  Female         9          10       1995       167      58.0    Right
st26 Female        17           5       1998       165      58.0    Right
st38 Female         2           1       1999       162      58.0    Right
st11   Male        26          12       1997       176      59.0    Right
st5  Female         1           1       1991       160      60.0    Right
st25 Female        17           8       1998       163      60.0    Right
st35 Female         1           9       1998       174      60.0    Right
st49 Female         2           5       1999       165      63.0     Left
st46   Male         6          11       1998       163      64.0    Right
st20 Female        30           6       1997       158      65.0    Right
st29   Male        28           7       1998       185      65.0     Left
st45   Male         6          12       1997       166      65.0    Right
st1    Male         1           2       1993       179      67.0    Right
st30   Male         5           1       1997       178      67.0    Right
st18 Female        16          11       1998       163      68.0    Right
st19 Female         3           5       1998       162      68.0    Right
st31   Male        27          11       1997       180      68.0    Right
st24 Female        10           6       1998       159      69.0    Right
st12   Male         9           2       1997       183      70.0    Right
st44 Female        19           9       1997       174      72.0    Right
st32   Male        29           8       1998       170      74.0    Right
st8  Female        14           1       1997       162      75.0     Left
st39   Male        19          11       1998       175      75.0    Right
st50   Male        31          10       1998       184      75.0    Right
st21   Male        15           1       2018       175      76.0    Right
st6    Male        26           9       1996       175      77.0    Right
st51   Male         9           3       1996       177      77.0    Right
st15   Male        13          10       1998       182      78.0    Right
st9    Male         1           5       1997       173      80.0    Right
st41   Male        18           5       1997       181      81.0    Right
st48   Male        14           3       1993       195      85.0    Right
st4    Male        11           8       1992       180      94.0    Right
st10   Male        25           6       1997       188     105.0    Right
st17   Male        10           1       1998       175     106.0    Right
     hand_span_cm
st22           20
st28           20
st14           15
st36           30
st27           25
st40           20
st42           18
st47           20
st16           20
st23           20
st33           14
st34           22
st37            8
st43           12
st3            18
st7            21
st13           19
st2            18
st26           19
st38           19
st11           24
st5            19
st25           15
st35           24
st49           17
st46           15
st20            8
st29           22
st45           15
st1            15
st30           24
st18           13
st19           13
st31           19
st24           18
st12           20
st44           16
st32           25
st8            18
st39           20
st50           22
st21           20
st6            18
st51           23
st15           21
st9            16
st41           20
st48           30
st4            25
st10           20
st17           15
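The same idea works for other orderings. This sketch types in the first four rows of survey by hand, keeping only two of its columns, to show sorting from heaviest to lightest and sorting by two columns. Inside order(), a minus sign reverses a numeric column.

```r
# First four rows of survey, with only two columns
df <- data.frame(Gender    = c("Male", "Female", "Female", "Male"),
                 weight_kg = c(67, 58, 56, 94),
                 row.names = c("st1", "st2", "st3", "st4"),
                 stringsAsFactors = FALSE)

df[order(df$weight_kg), ]                       # lightest first
df[order(df$weight_kg, decreasing = TRUE), ]    # heaviest first
df[order(df$Gender, -df$weight_kg), ]           # Female before Male, heaviest first in each group
```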

The “App Store”

Packages and Libraries

  • All interesting things in R are done using functions
    • We recognize them because they use ()
  • Several functions on the same subject are grouped in a package
  • To use the functions of a package, we first load the package with library()
  • If the package is not installed on your computer, install it once with install.packages()
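A typical pattern, sketched here for knitr: install the package once if it is missing, then load it in every session. The :: form calls a single function without loading the package. Installing needs an internet connection the first time.

```r
# Install only if knitr is not already on this computer
if (!requireNamespace("knitr", quietly = TRUE)) {
  install.packages("knitr")
}

library(knitr)              # load the package: its functions can now be used directly
kable(head(mtcars, 3))      # mtcars is a data set included with R

knitr::kable(head(mtcars, 3))   # package::function works even without library()
```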

knitr: a package for Rmarkdown

knitr is the system that merges R code and Markdown text to produce documents whose results are computed from the data

It has many functions. We used two of them:

  • knitr::kable() is a function to produce nicer tables
    • The mandatory input is a table, usually a data.frame (a matrix also works)
    • It is similar to the function pander() from the pander package
  • knitr::opts_chunk$set() to set the default options for each chunk
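For example, the first chunk of an Rmarkdown file often sets the options for all the chunks that follow. The options below (echo, warning, message) are real chunk options, but which ones to set here is only an illustration:

```r
# Show the code of each chunk, but hide warnings and messages
knitr::opts_chunk$set(echo = TRUE, warning = FALSE, message = FALSE)

knitr::opts_chunk$get("warning")   # FALSE: the new default for later chunks
```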

Without kable()

survey[1:5,]
    Gender birth_day birth_month birth_year height_cm weight_kg handness
st1   Male         1           2       1993       179        67    Right
st2 Female         9          10       1995       167        58    Right
st3 Female        28           1       1995       170        56     Left
st4   Male        11           8       1992       180        94    Right
st5 Female         1           1       1991       160        60    Right
    hand_span_cm
st1           15
st2           18
st3           18
st4           25
st5           19

With kable()

knitr::kable(survey[1:5,])
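|    |Gender | birth_day| birth_month| birth_year| height_cm| weight_kg|handness | hand_span_cm|
|:---|:------|---------:|-----------:|----------:|---------:|---------:|:--------|------------:|
|st1 |Male   |         1|           2|       1993|       179|        67|Right    |           15|
|st2 |Female |         9|          10|       1995|       167|        58|Right    |           18|
|st3 |Female |        28|           1|       1995|       170|        56|Left     |           18|
|st4 |Male   |        11|           8|       1992|       180|        94|Right    |           25|
|st5 |Female |         1|           1|       1991|       160|        60|Right    |           19|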
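kable() also accepts optional arguments. A sketch using the built-in mtcars data instead of survey: caption adds a title to the table and digits rounds the numbers.

```r
tab <- knitr::kable(head(mtcars[, 1:4], 3),
                    caption = "First three cars in mtcars",
                    digits  = 1)   # round numeric columns to 1 decimal place
tab
```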

Not all data is a data frame

When data has no structure

So far, all the files we have used are structured

That is, they have rows and columns

We use read.table() and write.table() to read and write data frames

Sometimes the data is not a table

Example

people <- list(Ali=list(age=18, sex='M'), Bahar=list(age=19, sex='F'), valid=c(TRUE,FALSE))
people
$Ali
$Ali$age
[1] 18

$Ali$sex
[1] "M"


$Bahar
$Bahar$age
[1] 19

$Bahar$sex
[1] "F"


$valid
[1]  TRUE FALSE
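Elements of a list are accessed by name, with $ or [[ ]], one level at a time. A short sketch with the same people list:

```r
people <- list(Ali = list(age = 18, sex = 'M'),
               Bahar = list(age = 19, sex = 'F'),
               valid = c(TRUE, FALSE))

people$Ali$age               # 18: go into Ali, then take age
people[["Bahar"]][["sex"]]   # "F": the same idea with [[ ]]
people$valid[2]              # FALSE: a normal vector inside the list
str(people)                  # compact view of the nested structure
```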

How can we read and write lists?

YAML: format for lists

There are several options to store lists in files.
A good one is YAML, which looks like this:

Ali:
  age: 18.0
  sex: M
Bahar:
  age: 19.0
  sex: F
valid:
- yes
- no

Rules for YAML files

  • Top-level list elements start in the first column, with no leading spaces
  • The inner list elements are indented with 2 spaces
  • You can have lists inside lists inside lists…
  • Names and values are separated by a colon and a space, as in age: 18
  • Vector elements are marked with - (a dash and a space)
  • When used inside Rmarkdown, put --- before and after the YAML code

Google “YAML” for more info
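To check these rules, a YAML text can be parsed directly from a string with yaml.load() from the yaml package. A sketch, assuming the yaml package is installed:

```r
library(yaml)

txt <- "
Ali:
  age: 18.0
  sex: M
Bahar:
  age: 19.0
  sex: F
valid:
- yes
- no
"

people <- yaml.load(txt)   # nested names become nested lists
people$Bahar$sex           # "F"
people$valid               # yes/no are read as TRUE/FALSE
```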

You have seen YAML before

We use YAML for the Rmarkdown metadata. For example

---
title: "Midterm Exam"
subtitle: "Computing in Molecular Biology 1"
author: "Put your name here"
number: STUDENT_NUMBER
date: "October 25, 2018"
output: html_document
---
ALWAYS write your name and student number

Reading and writing YAML in R

library(yaml)
write_yaml(people, "datafile.yml")
persons <- read_yaml("datafile.yml")
persons
$Ali
$Ali$age
[1] 18

$Ali$sex
[1] "M"


$Bahar
$Bahar$age
[1] 19

$Bahar$sex
[1] "F"


$valid
[1]  TRUE FALSE
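Before relying on a file format, it is worth checking that writing and then reading give back the same list. A sketch that uses tempfile(), so no file is left in the project folder:

```r
library(yaml)

people <- list(Ali = list(age = 18, sex = 'M'),
               Bahar = list(age = 19, sex = 'F'),
               valid = c(TRUE, FALSE))

path <- tempfile(fileext = ".yml")   # temporary file name
write_yaml(people, path)
persons <- read_yaml(path)

all.equal(persons, people)   # TRUE when nothing was lost in the round trip
```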

Use YAML for bibliography

references:
- type: article-journal
  id: WatsonCrick1953
  title: 'Molecular structure of nucleic acids: a structure for
    deoxyribose nucleic acid'
  author:
  - family: Watson
    given: J. D.
  - family: Crick
    given: F. H. C.
  container-title: Nature
  volume: 171
  issue: 4356
  page: 737-738
  issued:
    date-parts:
    - - 1953
      - 4
      - 25

How to use it

Put all the references somewhere in the document, with --- before and after.

  • [@WatsonCrick1953] produces (Watson and Crick 1953)
  • [@WatsonCrick1953, pp. 33-35, 38-39] becomes (Watson and Crick 1953, 33–35, 38–39).
  • [@WatsonCrick1953; @Collado-Vides2009a] becomes (Watson and Crick 1953; Collado-Vides et al. 2009).
  • @WatsonCrick1953 [p. 33] says blah becomes Watson and Crick (1953, 33) says blah

External bibliographies

If you have a long list of papers that you use in several documents, put the references in a separate file

Then you write

bibliography: references.yml

in the document metadata

Other formats for references

Format         File extension
BibLaTeX       .bib
BibTeX         .bibtex
Copac          .copac
CSL JSON       .json
CSL YAML       .yaml
EndNote        .enl
EndNote XML    .xml
ISI            .wos
MEDLINE        .medline
MODS           .mods
RIS            .ris

Tools for managing the bibliography

It is convenient that Rmarkdown accepts all these formats

There are many tools to manage your paper collection

It is not enough to download PDFs and store them in a folder. They need to be organized and have a structure

Two good and free programs are Mendeley and Zotero

Bibliography at the end of document

Bibliographies will be placed at the end of the document. Normally, you will want to end your document like this:

last paragraph...

# References

The bibliography will be inserted after this header. More info at

http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html

References

Collado-Vides, J., H. Salgado, E. Morett, S. Gama-Castro, V. Jiménez-Jacinto, I. Martínez-Flores, A. Medina-Rivera, L. Muñiz-Rascado, M. Peralta-Gil, and A. Santos-Zavaleta. 2009. “Bioinformatics Resources for the Study of Gene Regulation in Bacteria.” Journal of Bacteriology 191 (1): 23–31.

Watson, J. D., and F. H. C. Crick. 1953. “Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid.” Nature 171 (4356): 737–38. https://doi.org/10.1038/171737a0.