October 25, 2018
Find the name of the person with the median age
x$age
[1] 20 23 26 29 32 35 38 41 44
Step 1: find the median age
median(x$age)
[1] 32
Step 2: find which ages are equal to the median age
x$age==median(x$age)
[1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE
[9] FALSE
Step 3: find who has the age equal to the median age
x$name[x$age==median(x$age)]
[1] "Elif"
Some people tried x$name[median(x$age)], which is the same as
x$name[32]
That is, the element of x$name at position 32
The value of the median is not the same as the position of the median
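The difference between a value and a position can be checked on a small example. This is only a sketch: the slides do not show how x was built, so the data frame below is hypothetical, and every name except "Elif" is invented for illustration.

```r
# Hypothetical data: only the ages and the name "Elif" come from the slides;
# the other names are made up for this sketch.
x <- data.frame(
  name = c("Ali", "Bahar", "Can", "Deniz", "Elif",
           "Fatma", "Gizem", "Hakan", "Irem"),
  age  = seq(20, 44, by = 3),
  stringsAsFactors = FALSE
)

# Wrong: the median VALUE (32) is used as a POSITION.
# x has only 9 rows, so there is nothing at position 32.
wrong <- x$name[median(x$age)]

# Right: a logical vector selects the rows whose age equals the median.
right <- x$name[x$age == median(x$age)]

is.na(wrong)  # TRUE
right         # "Elif"
```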
min()
What if we look for the name of the youngest person?
The minimal age is
min(x$age)
[1] 20
Which is the element of minimal age?
which.min(x$age)
[1] 1
max()
The maximal age is
max(x$age)
[1] 44
Which is the element of maximal age?
which.max(x$age)
[1] 9
This difference is important. We will use it later
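A short sketch of how the position can be used to find a name. The data frame is hypothetical (the slides do not show how x was built), and all names except "Elif" are invented.

```r
# Hypothetical data frame with the ages shown in the slides
x <- data.frame(
  name = c("Ali", "Bahar", "Can", "Deniz", "Elif",
           "Fatma", "Gizem", "Hakan", "Irem"),
  age  = c(20, 23, 26, 29, 32, 35, 38, 41, 44),
  stringsAsFactors = FALSE
)

# min() and max() return VALUES; which.min() and which.max() return POSITIONS
youngest_age <- min(x$age)        # 20
youngest_pos <- which.min(x$age)  # 1

# A position can be used directly as an index
youngest <- x$name[which.min(x$age)]
oldest   <- x$name[which.max(x$age)]
```

With these invented names, youngest is "Ali" and oldest is "Irem".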
We have our own data
survey <- read.table("survey1-tidy.txt")
weight <- survey$weight
weight
[1] 67.0 58.0 56.0 94.0 60.0 77.0 56.0 75.0
[9] 80.0 105.0 59.0 70.0 57.0 50.0 78.0 55.0
[17] 106.0 68.0 68.0 65.0 76.0 42.5 55.0 69.0
[25] 60.0 58.0 52.0 47.0 65.0 67.0 68.0 74.0
[33] 55.0 55.0 60.0 50.0 55.0 58.0 75.0 53.0
[41] 81.0 54.0 55.0 72.0 65.0 64.0 54.0 85.0
[49] 63.0 75.0 77.0
How do we sort weight?
sort(weight)
[1] 42.5 47.0 50.0 50.0 52.0 53.0 54.0 54.0
[9] 55.0 55.0 55.0 55.0 55.0 55.0 56.0 56.0
[17] 57.0 58.0 58.0 58.0 59.0 60.0 60.0 60.0
[25] 63.0 64.0 65.0 65.0 65.0 67.0 67.0 68.0
[33] 68.0 68.0 69.0 70.0 72.0 74.0 75.0 75.0
[41] 75.0 76.0 77.0 77.0 78.0 80.0 81.0 85.0
[49] 94.0 105.0 106.0
sort(weight, decreasing=TRUE)
[1] 106.0 105.0 94.0 85.0 81.0 80.0 78.0 77.0
[9] 77.0 76.0 75.0 75.0 75.0 74.0 72.0 70.0
[17] 69.0 68.0 68.0 68.0 67.0 67.0 65.0 65.0
[25] 65.0 64.0 63.0 60.0 60.0 60.0 59.0 58.0
[33] 58.0 58.0 57.0 56.0 56.0 55.0 55.0 55.0
[41] 55.0 55.0 55.0 54.0 54.0 53.0 52.0 50.0
[49] 50.0 47.0 42.5
The command sort() works only for vectors
To sort a data frame, we first need to choose which column we use to order
We know the position of the smallest and the largest
which.min(weight)
[1] 22
which.max(weight)
[1] 17
We need the positions in between
For that we use the order() command
weight
[1] 67.0 58.0 56.0 94.0 60.0 77.0 56.0 75.0
[9] 80.0 105.0 59.0 70.0 57.0 50.0 78.0 55.0
[17] 106.0 68.0 68.0 65.0 76.0 42.5 55.0 69.0
[25] 60.0 58.0 52.0 47.0 65.0 67.0 68.0 74.0
[33] 55.0 55.0 60.0 50.0 55.0 58.0 75.0 53.0
[41] 81.0 54.0 55.0 72.0 65.0 64.0 54.0 85.0
[49] 63.0 75.0 77.0
order() to sort a data frame
order(weight)
[1] 22 28 14 36 27 40 42 47 16 23 33 34 37 43 3 7 13
[18] 2 26 38 11 5 25 35 49 46 20 29 45 1 30 18 19 31
[35] 24 12 44 32 8 39 50 21 6 51 15 9 41 48 4 10 17
This gives us the position of the smallest, the second smallest, and so on up to the largest
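The relation between order(), sort(), which.min() and which.max() can be checked on a short vector. This sketch uses only the first five values of weight; the variable names w and idx are chosen here for illustration.

```r
# First five values of weight, copied from the output above
w <- c(67, 58, 56, 94, 60)

# order() returns positions, not values
idx <- order(w)   # 3 2 5 1 4

# Indexing with these positions gives the sorted vector
identical(w[idx], sort(w))            # TRUE

# The first position is the smallest element, the last is the largest
idx[1] == which.min(w)                # TRUE
idx[length(idx)] == which.max(w)      # TRUE
```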
survey[order(weight),]
Gender birth_day birth_month birth_year height_cm weight_kg handness
st22 Female 13 10 1997 155 42.5 Right
st28 Female 7 7 1997 166 47.0 Right
st14 Female 3 7 1997 160 50.0 Right
st36 Female 24 3 1998 167 50.0 Right
st27 Female 13 10 1997 171 52.0 Right
st40 Female 5 2 1998 157 53.0 Right
st42 Female 18 5 1997 165 54.0 Right
st47 Female 29 7 1997 160 54.0 Right
st16 Female 3 9 2018 164 55.0 Right
st23 Female 2 10 1998 172 55.0 Right
st33 Female 21 5 1998 168 55.0 Right
st34 Female 3 9 1998 174 55.0 Right
st37 Female 17 9 1998 173 55.0 Right
st43 Female 23 5 1999 178 55.0 Right
st3 Female 28 1 1995 170 56.0 Left
st7 Female 5 4 1996 173 56.0 Right
st13 Female 9 6 1998 158 57.0 Right
st2 Female 9 10 1995 167 58.0 Right
st26 Female 17 5 1998 165 58.0 Right
st38 Female 2 1 1999 162 58.0 Right
st11 Male 26 12 1997 176 59.0 Right
st5 Female 1 1 1991 160 60.0 Right
st25 Female 17 8 1998 163 60.0 Right
st35 Female 1 9 1998 174 60.0 Right
st49 Female 2 5 1999 165 63.0 Left
st46 Male 6 11 1998 163 64.0 Right
st20 Female 30 6 1997 158 65.0 Right
st29 Male 28 7 1998 185 65.0 Left
st45 Male 6 12 1997 166 65.0 Right
st1 Male 1 2 1993 179 67.0 Right
st30 Male 5 1 1997 178 67.0 Right
st18 Female 16 11 1998 163 68.0 Right
st19 Female 3 5 1998 162 68.0 Right
st31 Male 27 11 1997 180 68.0 Right
st24 Female 10 6 1998 159 69.0 Right
st12 Male 9 2 1997 183 70.0 Right
st44 Female 19 9 1997 174 72.0 Right
st32 Male 29 8 1998 170 74.0 Right
st8 Female 14 1 1997 162 75.0 Left
st39 Male 19 11 1998 175 75.0 Right
st50 Male 31 10 1998 184 75.0 Right
st21 Male 15 1 2018 175 76.0 Right
st6 Male 26 9 1996 175 77.0 Right
st51 Male 9 3 1996 177 77.0 Right
st15 Male 13 10 1998 182 78.0 Right
st9 Male 1 5 1997 173 80.0 Right
st41 Male 18 5 1997 181 81.0 Right
st48 Male 14 3 1993 195 85.0 Right
st4 Male 11 8 1992 180 94.0 Right
st10 Male 25 6 1997 188 105.0 Right
st17 Male 10 1 1998 175 106.0 Right
hand_span_cm
st22 20
st28 20
st14 15
st36 30
st27 25
st40 20
st42 18
st47 20
st16 20
st23 20
st33 14
st34 22
st37 8
st43 12
st3 18
st7 21
st13 19
st2 18
st26 19
st38 19
st11 24
st5 19
st25 15
st35 24
st49 17
st46 15
st20 8
st29 22
st45 15
st1 15
st30 24
st18 13
st19 13
st31 19
st24 18
st12 20
st44 16
st32 25
st8 18
st39 20
st50 22
st21 20
st6 18
st51 23
st15 21
st9 16
st41 20
st48 30
st4 25
st10 20
st17 15
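The same idea works on any data frame: choose a column, get its order, and use that as the row index. The sketch below builds a small data frame from the first five rows of survey (heights and weights only); the names small, by_weight and by_weight_desc are chosen here for illustration.

```r
# Heights and weights of st1 to st5, copied from the survey data
small <- data.frame(
  height_cm = c(179, 167, 170, 180, 160),
  weight_kg = c(67, 58, 56, 94, 60),
  row.names = c("st1", "st2", "st3", "st4", "st5")
)

# Rows from lightest to heaviest; the empty slot after the comma keeps all columns
by_weight <- small[order(small$weight_kg), ]

# Rows from heaviest to lightest
by_weight_desc <- small[order(small$weight_kg, decreasing = TRUE), ]

rownames(by_weight)       # "st3" "st2" "st5" "st1" "st4"
rownames(by_weight_desc)  # "st4" "st1" "st5" "st2" "st3"
```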
library()
install.packages()
knitr: a package for Rmarkdown
Knitr is the system that merges R code and Markdown to produce documents that depend on data
It has many functions. We used two of them:
knitr::kable() is a function to produce nicer tables
An alternative is pander() from the pander package
knitr::opts_chunk$set() to set the default options for each chunk
Without kable()
survey[1:5,]
Gender birth_day birth_month birth_year height_cm weight_kg handness
st1 Male 1 2 1993 179 67 Right
st2 Female 9 10 1995 167 58 Right
st3 Female 28 1 1995 170 56 Left
st4 Male 11 8 1992 180 94 Right
st5 Female 1 1 1991 160 60 Right
hand_span_cm
st1 15
st2 18
st3 18
st4 25
st5 19
With kable()
knitr::kable(survey[1:5,])
| | Gender | birth_day | birth_month | birth_year | height_cm | weight_kg | handness | hand_span_cm |
|---|---|---|---|---|---|---|---|---|
| st1 | Male | 1 | 2 | 1993 | 179 | 67 | Right | 15 |
| st2 | Female | 9 | 10 | 1995 | 167 | 58 | Right | 18 |
| st3 | Female | 28 | 1 | 1995 | 170 | 56 | Left | 18 |
| st4 | Male | 11 | 8 | 1992 | 180 | 94 | Right | 25 |
| st5 | Female | 1 | 1 | 1991 | 160 | 60 | Right | 19 |
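A minimal sketch of the two knitr functions mentioned above, assuming the knitr package is installed. The data frame small is built here from the first three rows of survey, and echo = FALSE is only one example of a chunk option.

```r
# Small data frame copied from the first rows of survey
small <- data.frame(
  height_cm = c(179, 167, 170),
  weight_kg = c(67, 58, 56),
  row.names = c("st1", "st2", "st3")
)

if (requireNamespace("knitr", quietly = TRUE)) {
  # Default option for every chunk that follows: hide the code, show only results
  knitr::opts_chunk$set(echo = FALSE)

  # kable() returns the table as Markdown text
  tbl <- knitr::kable(small, format = "markdown")
  print(tbl)
}
```

In an Rmarkdown document the call to opts_chunk$set() usually goes in the first chunk, so that it applies to all the others.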
So far all the files we have used are structured
That is, they have rows and columns
We use read.table and write.table to read and write a data frame
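A short sketch of a round trip with write.table() and read.table(). It writes to a temporary file so nothing is left on disk; the data frame small and the variable names are chosen here for illustration.

```r
# Small data frame copied from the first rows of survey
small <- data.frame(
  height_cm = c(179, 167, 170),
  weight_kg = c(67, 58, 56),
  row.names = c("st1", "st2", "st3")
)

# Write the data frame to a temporary text file
path <- tempfile(fileext = ".txt")
write.table(small, path)

# Read it back. The first line has one field fewer than the others,
# so read.table() takes it as the header and uses the first column as row names.
back <- read.table(path)

rownames(back)             # "st1" "st2" "st3"
back["st2", "weight_kg"]   # 58

unlink(path)  # remove the temporary file
```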
Sometimes the data is not a table
people <- list(Ali=list(age=18, sex='M'),
               Bahar=list(age=19, sex='F'),
               valid=c(TRUE,FALSE))
people
$Ali
$Ali$age
[1] 18

$Ali$sex
[1] "M"


$Bahar
$Bahar$age
[1] 19

$Bahar$sex
[1] "F"


$valid
[1]  TRUE FALSE
How can we read and write lists?
There are several options to store lists into files.
A good one is YAML, which looks like this:
Ali:
  age: 18.0
  sex: M
Bahar:
  age: 19.0
  sex: F
valid:
- yes
- no
Each name is followed by :
Write --- before and after the YAML code
Google “YAML” for more info
We use YAML for the Rmarkdown metadata. For example
---
title: "Midterm Exam"
subtitle: "Computing in Molecular Biology 1"
author: "Put your name here"
number: STUDENT_NUMBER
date: "October 25, 2018"
output: html_document
---
library(yaml)
write_yaml(people, "datafile.yml")
persons <- read_yaml("datafile.yml")
persons
$Ali
$Ali$age
[1] 18

$Ali$sex
[1] "M"


$Bahar
$Bahar$age
[1] 19

$Bahar$sex
[1] "F"


$valid
[1]  TRUE FALSE
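The round trip can be tried without leaving files behind. This sketch assumes the yaml package is installed and uses a temporary file instead of "datafile.yml".

```r
if (requireNamespace("yaml", quietly = TRUE)) {
  people <- list(Ali   = list(age = 18, sex = "M"),
                 Bahar = list(age = 19, sex = "F"),
                 valid = c(TRUE, FALSE))

  # Write the list to a temporary YAML file
  path <- tempfile(fileext = ".yml")
  yaml::write_yaml(people, path)

  # Show the text that was written
  cat(readLines(path), sep = "\n")

  # Read the file back into a new list
  persons <- yaml::read_yaml(path)
  str(persons)

  unlink(path)  # remove the temporary file
}
```

The list that comes back has the same names and the same nesting as the list that was written.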
references:
- type: article-journal
  id: WatsonCrick1953
  title: 'Molecular structure of nucleic acids: a structure for
    deoxyribose nucleic acid'
  author:
  - family: Watson
    given: J. D.
  - family: Crick
    given: F. H. C.
  container-title: Nature
  volume: 171
  issue: 4356
  page: 737-738
  issued:
    date-parts:
    - - 1953
      - 4
      - 25
Put all the references somewhere in the document, with --- before and after.
[@WatsonCrick1953] produces (Watson and Crick 1953)
[@WatsonCrick1953, pp. 33-35, 38-39] becomes (Watson and Crick 1953, 33–35, 38–39).
[@WatsonCrick1953; @Collado-Vides2009a] becomes (Watson and Crick 1953; Collado-Vides et al. 2009).
@WatsonCrick1953 [p. 33] says blah becomes Watson and Crick (1953, 33) says blah
If you have a long list of all papers, and you use it on several documents, then you should put the references in a separate file
Then you write
bibliography: references.yml
in the document metadata
| Format | File extension |
|---|---|
| BibLaTeX | .bib |
| BibTeX | .bibtex |
| Copac | .copac |
| CSL JSON | .json |
| CSL YAML | .yaml |
| EndNote | .enl |
| EndNote XML | .xml |
| ISI | .wos |
| MEDLINE | .medline |
| MODS | .mods |
| RIS | .ris |
It is good that RMarkdown uses all these formats
There are many tools to manage your paper collection
It is not enough to download PDF files and store them in a folder. They need to be organized and have a structure
Two good and free programs are Mendeley and Zotero
Bibliographies will be placed at the end of the document. Normally, you will want to end your document like this:
last paragraph...

# References
The bibliography will be inserted after this header. More info at
http://rmarkdown.rstudio.com/authoring_bibliographies_and_citations.html
Collado-Vides, J, H Salgado, E Morett, S Gama-Castro, V Jiménez-Jacinto, I Martínez-Flores, A Medina-Rivera, L Muñiz-Rascado, M Peralta-Gil, and A Santos-Zavaleta. 2009. “Bioinformatics Resources for the Study of Gene Regulation in Bacteria.” Journal of Bacteriology 191 (1): 23–31.
Watson, J. D., and F. H. C. Crick. 1953. “Molecular Structure of Nucleic Acids: A Structure for Deoxyribose Nucleic Acid.” Nature 171 (4356): 737–38. https://doi.org/10.1038/171737a0.