November 26, 2019

## Sorting a vector

We have our own data

survey <- read.table("survey1-tidy.txt")
survey$weight
 [1]  67.0  58.0  56.0  94.0  60.0  77.0  56.0  75.0
 [9]  80.0 105.0  59.0  70.0  57.0  50.0  78.0  55.0
[17] 106.0  68.0  68.0  65.0  76.0  42.5  55.0  69.0
[25]  60.0  58.0  52.0  47.0  65.0  67.0  68.0  74.0
[33]  55.0  55.0  60.0  50.0  55.0  58.0  75.0  53.0
[41]  81.0  54.0  55.0  72.0  65.0  64.0  54.0  85.0
[49]  63.0  75.0  77.0

## How can we sort weight?

sort(survey$weight)
 [1]  42.5  47.0  50.0  50.0  52.0  53.0  54.0  54.0
 [9]  55.0  55.0  55.0  55.0  55.0  55.0  56.0  56.0
[17]  57.0  58.0  58.0  58.0  59.0  60.0  60.0  60.0
[25]  63.0  64.0  65.0  65.0  65.0  67.0  67.0  68.0
[33]  68.0  68.0  69.0  70.0  72.0  74.0  75.0  75.0
[41]  75.0  76.0  77.0  77.0  78.0  80.0  81.0  85.0
[49]  94.0 105.0 106.0

sort(survey$weight, decreasing=TRUE)
 [1] 106.0 105.0  94.0  85.0  81.0  80.0  78.0  77.0
 [9]  77.0  76.0  75.0  75.0  75.0  74.0  72.0  70.0
[17]  69.0  68.0  68.0  68.0  67.0  67.0  65.0  65.0
[25]  65.0  64.0  63.0  60.0  60.0  60.0  59.0  58.0
[33]  58.0  58.0  57.0  56.0  56.0  55.0  55.0  55.0
[41]  55.0  55.0  55.0  54.0  54.0  53.0  52.0  50.0
[49]  50.0  47.0  42.5

## How to sort the complete data frame?

The command sort() works only for vectors

To sort a data frame, we first need to choose which column we use to order

We know the position of the smallest and the largest

## Position of the smallest and largest

which.min(survey$weight)
[1] 22
which.max(survey$weight)
[1] 17

We need the positions in between

For that we use the order() command

## Let’s verify if this is correct

survey$weight
 [1]  67.0  58.0  56.0  94.0  60.0  77.0  56.0  75.0
 [9]  80.0 105.0  59.0  70.0  57.0  50.0  78.0  55.0
[17] 106.0  68.0  68.0  65.0  76.0  42.5  55.0  69.0
[25]  60.0  58.0  52.0  47.0  65.0  67.0  68.0  74.0
[33]  55.0  55.0  60.0  50.0  55.0  58.0  75.0  53.0
[41]  81.0  54.0  55.0  72.0  65.0  64.0  54.0  85.0
[49]  63.0  75.0  77.0

order(survey$weight)
 [1] 22 28 14 36 27 40 42 47 16 23 33 34 37 43  3  7 13
[18]  2 26 38 11  5 25 35 49 46 20 29 45  1 30 18 19 31
[35] 24 12 44 32  8 39 50 21  6 51 15  9 41 48  4 10 17

This gives us the position of the smallest, the second smallest, and so on up to the largest

## Then we do

survey[order(survey$weight),]

     Gender birth_day birth_month birth_year height_cm weight_kg handness
st22 Female        13          10       1997       155      42.5    Right
st28 Female         7           7       1997       166      47.0    Right
st14 Female         3           7       1997       160      50.0    Right
st36 Female        24           3       1998       167      50.0    Right
st27 Female        13          10       1997       171      52.0    Right
st40 Female         5           2       1998       157      53.0    Right
st42 Female        18           5       1997       165      54.0    Right
st47 Female        29           7       1997       160      54.0    Right
st16 Female         3           9       2018       164      55.0    Right
st23 Female         2          10       1998       172      55.0    Right
st33 Female        21           5       1998       168      55.0    Right
st34 Female         3           9       1998       174      55.0    Right
st37 Female        17           9       1998       173      55.0    Right
st43 Female        23           5       1999       178      55.0    Right
st3  Female        28           1       1995       170      56.0     Left
st7  Female         5           4       1996       173      56.0    Right
st13 Female         9           6       1998       158      57.0    Right
st2  Female         9          10       1995       167      58.0    Right
st26 Female        17           5       1998       165      58.0    Right
st38 Female         2           1       1999       162      58.0    Right
st11   Male        26          12       1997       176      59.0    Right
st5  Female         1           1       1991       160      60.0    Right
st25 Female        17           8       1998       163      60.0    Right
st35 Female         1           9       1998       174      60.0    Right
st49 Female         2           5       1999       165      63.0     Left
st46   Male         6          11       1998       163      64.0    Right
st20 Female        30           6       1997       158      65.0    Right
st29   Male        28           7       1998       185      65.0     Left
st45   Male         6          12       1997       166      65.0    Right
st1    Male         1           2       1993       179      67.0    Right
st30   Male         5           1       1997       178      67.0    Right
st18 Female        16          11       1998       163      68.0    Right
st19 Female         3           5       1998       162      68.0    Right
st31   Male        27          11       1997       180      68.0    Right
st24 Female        10           6       1998       159      69.0    Right
st12   Male         9           2       1997       183      70.0    Right
st44 Female        19           9       1997       174      72.0    Right
st32   Male        29           8       1998       170      74.0    Right
st8  Female        14           1       1997       162      75.0     Left
st39   Male        19          11       1998       175      75.0    Right
st50   Male        31          10       1998       184      75.0    Right
st21   Male        15           1       2018       175      76.0    Right
st6    Male        26           9       1996       175      77.0    Right
st51   Male         9           3       1996       177      77.0    Right
st15   Male        13          10       1998       182      78.0    Right
st9    Male         1           5       1997       173      80.0    Right
st41   Male        18           5       1997       181      81.0    Right
st48   Male        14           3       1993       195      85.0    Right
st4    Male        11           8       1992       180      94.0    Right
st10   Male        25           6       1997       188     105.0    Right
st17   Male        10           1       1998       175     106.0    Right
     hand_span_cm
st22           20
st28           20
st14           15
st36           30
st27           25
st40           20
st42           18
st47           20
st16           20
st23           20
st33           14
st34           22
st37            8
st43           12
st3            18
st7            21
st13           19
st2            18
st26           19
st38           19
st11           24
st5            19
st25           15
st35           24
st49           17
st46           15
st20            8
st29           22
st45           15
st1            15
st30           24
st18           13
st19           13
st31           19
st24           18
st12           20
st44           16
st32           25
st8            18
st39           20
st50           22
st21           20
st6            18
st51           23
st15           21
st9            16
st41           20
st48           30
st4            25
st10           20
st17           15
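
The same steps can be tried on a small example. The sketch below uses a hypothetical four-row data frame called mini (its names and values are invented for illustration, they are not part of the survey). It checks that order() agrees with which.min() and sort(), and shows two options of order() that are not used above: decreasing=TRUE, and a second column to break ties.

```r
# Hypothetical mini-survey; row names and values are invented for illustration
mini <- data.frame(
  height_cm = c(170, 160, 182, 165),
  weight_kg = c(60, 50, 78, 50),
  row.names = c("a", "b", "c", "d")
)
w <- mini$weight_kg

# The first element of order() is the position of the smallest value
which.min(w)
order(w)[1]

# Indexing a vector with its own order() gives the same result as sort()
identical(w[order(w)], sort(w))

# Heaviest person first
heaviest_first <- mini[order(mini$weight_kg, decreasing = TRUE), ]
heaviest_first

# Equal weights are ordered by height
sorted <- mini[order(mini$weight_kg, mini$height_cm), ]
rownames(sorted)
```

Rows "b" and "d" have the same weight, so the second argument of order() decides which one comes first.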

## Packages and Libraries

• All interesting things in R are done using functions
• We recognize them because they use ()
• Functions on the same subject are grouped in a package
• To use functions from a package we need to load it using library()
• If the package is not on your computer, you first need to install it with install.packages()
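
As a minimal sketch of the last two points, the helper load_package() below installs a package only when it is missing and then loads it. The helper name is an assumption made for this illustration; requireNamespace(), install.packages() and library() are standard R functions. The tools package is used here only because it ships with R, so nothing has to be downloaded.

```r
# Hypothetical helper: install a package only if it is missing, then load it
load_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

load_package("tools")

# After loading, the function can be called directly...
ext1 <- file_ext("survey1-tidy.txt")

# ...or with the package::function() form, which works without library()
ext2 <- tools::file_ext("survey1-tidy.txt")
ext1
```

The package::function() form is the one used in the next slides, for example knitr::kable().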

## knitr: a package for Rmarkdown

Knitr is the system that merges R code and Markdown to produce documents that depend on data

It has many functions. We used two of them:

• knitr::kable() is a function to produce nicer tables
  • The mandatory input is a data.frame
  • It is similar to the function pander() from the pander package
• knitr::opts_chunk$set() sets the default options for each chunk

## Without kable()

survey[1:5,]
    Gender birth_day birth_month birth_year height_cm weight_kg handness
st1   Male         1           2       1993       179        67    Right
st2 Female         9          10       1995       167        58    Right
st3 Female        28           1       1995       170        56     Left
st4   Male        11           8       1992       180        94    Right
st5 Female         1           1       1991       160        60    Right
    hand_span_cm
st1           15
st2           18
st3           18
st4           25
st5           19

## With kable()

library(knitr)
kable(survey[1:5,])

|     | Gender | birth_day | birth_month | birth_year | height_cm | weight_kg | handness | hand_span_cm |
|:----|:-------|----------:|------------:|-----------:|----------:|----------:|:---------|-------------:|
| st1 | Male   | 1  | 2  | 1993 | 179 | 67 | Right | 15 |
| st2 | Female | 9  | 10 | 1995 | 167 | 58 | Right | 18 |
| st3 | Female | 28 | 1  | 1995 | 170 | 56 | Left  | 18 |
| st4 | Male   | 11 | 8  | 1992 | 180 | 94 | Right | 25 |
| st5 | Female | 1  | 1  | 1991 | 160 | 60 | Right | 19 |

## Some hints

• Use library(knitr) before using any function of the package
• Remember that the RMarkdown document is independent of the Console
• Save your document on the X: drive (when using lab computers)

## Data frame shapes

## Wide data has a column for each variable

Using the data from the exam

world
  income population    area
1   1810   31700000  653000
2  10500    2920000   28800
3  13300   38300000 2380000
4   6190   26000000 1250000
5  18900      97800     440
6  19500   42500000 2780000

## Long Data has one value column

     variable    value
1      income     1810
2      income    10500
3      income    13300
4      income     6190
5      income    18900
6      income    19500
7  population 31700000
8  population  2920000
9  population 38300000
10 population 26000000
11 population    97800
12 population 42500000
13       area   653000
14       area    28800
15       area  2380000
16       area  1250000
17       area      440
18       area  2780000

## Changing the shape

We use the reshape2 library

• melt takes wide-format data and melts it into long-format data.
• cast takes long-format data and casts it into wide-format data.
Think of working with metal:

• if you melt metal, it drips and becomes long
• if you cast it into a mould, it becomes wide

## Melting

library(reshape2)
melt(world, id=NULL)
     variable    value
1      income     1810
2      income    10500
3      income    13300
4      income     6190
5      income    18900
6      income    19500
7  population 31700000
8  population  2920000
9  population 38300000
10 population 26000000
11 population    97800
12 population 42500000
13       area   653000
14       area    28800
15       area  2380000
16       area  1250000
17       area      440
18       area  2780000

## Melting with text columns

Consider this case

countries
              country income population    area
1         Afghanistan   1810   31700000  653000
2             Albania  10500    2920000   28800
3             Algeria  13300   38300000 2380000
4              Angola   6190   26000000 1250000
5 Antigua and Barbuda  18900      97800     440
6           Argentina  19500   42500000 2780000

## The country is the identifier

library(reshape2)
melt(countries, id="country")
               country   variable    value
1          Afghanistan     income     1810
2              Albania     income    10500
3              Algeria     income    13300
4               Angola     income     6190
5  Antigua and Barbuda     income    18900
6            Argentina     income    19500
7          Afghanistan population 31700000
8              Albania population  2920000
9              Algeria population 38300000
10              Angola population 26000000
11 Antigua and Barbuda population    97800
12           Argentina population 42500000
13         Afghanistan       area   653000
14             Albania       area    28800
15             Algeria       area  2380000
16              Angola       area  1250000
17 Antigua and Barbuda       area      440
18           Argentina       area  2780000

## Long- to wide-format

• going from wide- to long-format data is easy
• going from long- to wide-format data needs more care
• reshape2 has several cast functions, for different structures
  • For data frames we use dcast
  • There is also acast for vector, matrix, or array
    • but we will not use it in this course

## We start with long format

long <- melt(countries, id="country")
long
               country   variable    value
1          Afghanistan     income     1810
2              Albania     income    10500
3              Algeria     income    13300
4               Angola     income     6190
5  Antigua and Barbuda     income    18900
6            Argentina     income    19500
7          Afghanistan population 31700000
8              Albania population  2920000
9              Algeria population 38300000
10              Angola population 26000000
11 Antigua and Barbuda population    97800
12           Argentina population 42500000
13         Afghanistan       area   653000
14             Albania       area    28800
15             Algeria       area  2380000
16              Angola       area  1250000
17 Antigua and Barbuda       area      440
18           Argentina       area  2780000

## Now we transform it

dcast(long, country~variable)
              country income population    area
1         Afghanistan   1810   31700000  653000
2             Albania  10500    2920000   28800
3             Algeria  13300   38300000 2380000
4              Angola   6190   26000000 1250000
5 Antigua and Barbuda  18900      97800     440
6           Argentina  19500   42500000 2780000

## Not all data is a data frame

## When data has no structure

So far all the files we have used are structured

That is, they have rows and columns

We use read.table and write.table to read and write a data frame

Sometimes the data is not a table

## Example

people <- list(Ali=list(age=18, sex='M'),
               Bahar=list(age=19, sex='F'),
               valid=c(TRUE,FALSE))
people
$Ali
$Ali$age
[1] 18

$Ali$sex
[1] "M"


$Bahar
$Bahar$age
[1] 19

$Bahar$sex
[1] "F"


$valid
[1]  TRUE FALSE
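
The printed output shows the path to each value, and the same paths can be used to get the values back. A minimal sketch, rebuilding the people list from the example so the block runs on its own (the variable ages is introduced only for this illustration):

```r
# Same list as in the example above
people <- list(Ali = list(age = 18, sex = "M"),
               Bahar = list(age = 19, sex = "F"),
               valid = c(TRUE, FALSE))

# Names of the top-level elements
names(people)

# $ and [[ ]] both go one level down; they can be combined
people$Ali$age
people[["Bahar"]]$sex

# A vector inside the list is indexed as usual
people$valid[2]

# Collect the age of each person into a named vector
ages <- sapply(people[c("Ali", "Bahar")], function(p) p$age)
ages
```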

How can we read and write lists?

## YAML: format for lists

There are several options to store lists into files.
A good one is YAML, which looks like this:

Ali:
  age: 18.0
  sex: M
Bahar:
  age: 19.0
  sex: F
valid:
- yes
- no

## Rules for YAML files

• Each top-level list element starts in the first column, with no spaces before it
• The inner list elements are indented with 2 spaces
• You can have lists inside lists inside lists…
• Names and values are separated by :
• Vector elements are marked with -
• When used inside Rmarkdown, put --- before and after the YAML code
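
To see how these rules map to R, the sketch below parses a short YAML text with yaml.load() from the yaml package, which must be installed. The text is hypothetical: the courses element and its value are invented to show a list inside a list inside a list.

```r
library(yaml)

# Hypothetical YAML text written by hand following the rules above
yaml_text <- "
Ali:
  age: 18.0
  sex: M
  courses:
    first: CMB1
valid:
- yes
- no
"

parsed <- yaml.load(yaml_text)

# Each level of indentation becomes one more level in the R list
parsed$Ali$age
parsed$Ali$courses$first

# Elements marked with - become a vector
parsed$valid
```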

## You have seen YAML before

We use YAML for the Rmarkdown metadata. For example

---
title: "Midterm Exam"
subtitle: "Computing in Molecular Biology 1"
number: STUDENT_NUMBER
date: "October 25, 2018"
output: html_document
---
ALWAYS write your name and student number

## Reading and writing YAML in R

library(yaml)
write_yaml(people, "datafile.yml")
persons <- read_yaml("datafile.yml")
persons
$Ali
$Ali$age
[1] 18

$Ali$sex
[1] "M"


$Bahar
$Bahar$age
[1] 19

$Bahar$sex
[1] "F"


$valid
[1]  TRUE FALSE
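
One way to check that nothing is lost is a round trip: write the list, read it back, and compare. This is only a sketch. It assumes the yaml package is installed, and it writes to a temporary file created with tempfile() instead of datafile.yml so that no file in your folder is overwritten.

```r
library(yaml)

# Same list as in the example above
people <- list(Ali = list(age = 18, sex = "M"),
               Bahar = list(age = 19, sex = "F"),
               valid = c(TRUE, FALSE))

# Temporary file used only for this illustration
path <- tempfile(fileext = ".yml")
write_yaml(people, path)

# Show the text that was written to the file
cat(readLines(path), sep = "\n")

# Read it back and compare with the original list
persons <- read_yaml(path)
same <- isTRUE(all.equal(people, persons))
same
```

all.equal() is used instead of identical() because a number may come back as an integer instead of a double, depending on how it was written to the file.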

## Use YAML for bibliography

references:
- type: article-journal
  id: WatsonCrick1953
  title: 'Molecular structure of nucleic acids: a structure for
    deoxyribose nucleic acid'
  author:
  - family: Watson
    given: J. D.
  - family: Crick
    given: F. H. C.
  container-title: Nature
  volume: 171
  issue: 4356
  page: 737-738
  issued:
    date-parts:
    - - 1953
      - 4
      - 25

## How to use it

Put all the references somewhere in the document, with --- before and after.

• [@WatsonCrick1953] produces (Watson and Crick 1953)
• [@WatsonCrick1953, pp. 33-35, 38-39] becomes (Watson and Crick 1953, 33–35, 38–39).
• [@WatsonCrick1953; @Collado-Vides2009a] becomes (Watson and Crick 1953; Collado-Vides et al. 2009).
• @WatsonCrick1953 [p. 33] says blah becomes Watson and Crick (1953, 33) says blah

## External bibliographies

If you have a long list of papers and you use it in several documents, you should put the references in a separate file

Then you write this line in the YAML metadata of each document:

bibliography: references.yml

## Bibliography at the end of document

Bibliographies will be placed at the end of the document. Normally, you will want to end your document like this:

last paragraph...

# References