Class 19: Practice, practice, practice

Computing in Molecular Biology and Genetics 1

Andrés Aravena, PhD

30 November 2020

Right now you should be able to

  • Write a structured document in Markdown
  • Combine R and Markdown
  • Use vectors as indices of vectors
  • Read data from text files
    • Using readr
  • Select single columns as vectors
  • Install new R packages
  • Use filter(), select(), and %>%

About Homework 4

The solution looks like this

# many lines like this
students$birthplace[ valid_value &
    first_3_letters=="ADA"] <- "Adana/Turkey"

# then the special cases
students$birthplace[valid_value &
    students$birthplace=="Turkmenistan"] <- "-/Turkmenistan"
students$birthplace[valid_value &
    students$birthplace=="Turkey"] <- "-/Turkey"

Not-so-good solutions

Some people forgot to use valid_value and first_3_letters

students$birthplace[!is.na(students$birthplace) &
 substr(students$birthplace, start = 1, stop=3)=="MUG"]<-"MUGLA/TURKEY"

These variables are there to help you.

Moreover, the new names should not be ALL CAPS
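As a reminder of what the two helper variables stand for, here is a minimal sketch. The definitions are inferred from the longer expression in the not-so-good example above, so the ones given in the homework may differ in detail. The tiny `students` data frame is made up for illustration.

```r
# Hypothetical miniature of the survey column, for illustration only
students <- data.frame(
  birthplace = c("ADANA", "ADANA/TR", NA, "Turkey"),
  stringsAsFactors = FALSE
)

# TRUE where a birthplace was actually given (assumed definition)
valid_value <- !is.na(students$birthplace)
# First three letters of each birthplace (assumed definition)
first_3_letters <- substr(students$birthplace, start = 1, stop = 3)

# With the helpers, each correction fits in one short, readable statement
students$birthplace[valid_value & first_3_letters == "ADA"] <- "Adana/Turkey"
students$birthplace
```

The `valid_value &` part guarantees that the index never contains `NA`, which R does not accept in this kind of assignment.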

Unexpected problem

Some students found that

students$birthplace[ valid_value &
   first_3_letters =="İST"] <- "Istanbul/Turkey"

did not work.

This happens only in Microsoft Windows® with non-English symbols

Why does it happen?

Non-English letters can be encoded in different ways

There used to be several alternatives

Today there is a universal standard, called Unicode or UTF-8

Most modern systems use UTF-8 by default

Microsoft Windows® still uses an older encoding by default, but this is changing.

Unicode is a general idea. UTF-8 is an implementation of it.

More details at “UTF-8 Support in Windows”
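To make "encoded in different ways" concrete, the sketch below stores the same letter, İ, in two encodings and compares the bytes. ISO-8859-9 is used only as an example of an older Turkish encoding. It is an assumption that your system's `iconv` knows that name, and this does not claim to reproduce exactly what Windows does.

```r
# Capital dotted I, written as a Unicode escape so this script itself
# contains only ASCII characters
dotted_I <- "\u0130"

# Bytes of the letter in UTF-8
utf8_bytes <- charToRaw(enc2utf8(dotted_I))

# Bytes of the same letter in ISO-8859-9 (example of an older encoding)
latin5_bytes <- iconv(dotted_I, from = "UTF-8", to = "ISO-8859-9",
                      toRaw = TRUE)[[1]]

utf8_bytes
latin5_bytes

# Same letter, different bytes: comparing text stored in one encoding
# with text stored in the other can fail even when both look like "İST"
identical(utf8_bytes, latin5_bytes)
```

If the text in the file and the text typed in the script end up in different encodings, a test like `first_3_letters == "İST"` can be FALSE for every row.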

What will we do?

The combination of “Windows® + R + Non-English” is bad

Choose two of three

I made a clean version for us. Let’s use this file:

http://www.dry-lab.org/static/2020/cmb1/students2018-2020-tidy.tsv

Today’s Goal

Answer some interesting questions

What interesting questions can be answered using the “student survey” data?

  • I don’t know
  • weight, mean, range, min, max, handiness, sex, dim.
  • english level, birthplace, birthdate, sex, height, weight, handiness, hand spam
  • You can have so many different stories by making different kind of tables. It depends on the features that table contains.
  • This “student survey data” is quite complicated for me. I have to repeat last week’s lesson to understand.

What interesting questions can be answered using the “student survey” data?

  • Some information about data for example student information etc.
  • sociocultural situations of students
  • Students born before the year 1990
  • How many of students were not born in Istanbul?
  • right-handers live outside of Istanbul left-handers at living Istanbul

What interesting questions can be answered using the “student survey” data?

  • We can sort the answers by years and observe the changing so we can see the results and generations together so we can observe changing in trends
  • students’ body mass index.
  • Relation between the effect of the city students live in on their English level.
  • left-handed students at living Istanbul
  • Do left handed people tend to learn english than the right ones?

What interesting questions can be answered using the “student survey” data?

  • What gender does each student have?
  • How old are all students?
  • where exactly does all students come from?
  • What do all students English levels are?
  • Right-handed person/people from Istanbul with the specific birthday

What interesting questions can be answered using the “student survey” data?

  • How many students names begin with “zey”?
  • Where is the birthplace of the students whose names begin with “ze”?
  • What are the last letters names of the students whose birthplace is Istanbul?
  • We can find out if there are students who were born in Istanbul and also takes the course a second time.
  • Or, we can find out if there are students who speak fluently English which is also female.

What interesting questions can be answered using the “student survey” data?

  • What is the average height of female students living in Istanbul?
  • What is the average height of male students living in Istanbul?
  • What is the average weight of female students living in Istanbul?
  • What is the average weight of male students living in Istanbul?

Which ones are interesting questions?

Which ones can be answered with our data?

Answer in the Quiz

Let’s answer some questions

We load the data

library(readr)
students <- read_tsv("students2018-2020-tidy.tsv")
students
# A tibble: 117 x 10
   answer_date id    english_level sex   birthdate  birthplace height_cm
   <date>      <chr> <chr>         <chr> <date>     <chr>          <dbl>
 1 2018-09-17  3e50… I can speak … Male  1993-02-01 -/Turkey         179
 2 2018-09-17  479d… I can unders… Fema… 1998-05-21 Kahramanm…       168
 3 2018-09-17  39df… I can read a… Fema… 1998-01-18 Batman/Tu…        NA
 4 2018-09-17  d2b0… I can read a… Male  1998-08-29 Antalya/T…       170
 5 2018-09-17  f22b… I can read a… Fema… 1998-05-03 Izmir/Tur…       162
 6 2018-09-17  849c… İngilizce bi… Fema… 1995-10-09 Yalova/Tu…       167
 7 2018-09-17  8381… I can speak … Fema… 1997-09-19 Adıyaman/…       174
 8 2018-09-17  b0dd… I can read a… Male  1997-11-27 Bursa/Tur…       180
 9 2018-09-17  2972… I can read a… Fema… 1999-01-02 Istanbul/…       162
10 2018-09-17  72c0… I can read a… Fema… 1998-10-02 Istanbul/…       172
# … with 107 more rows, and 3 more variables: weight_kg <dbl>,
#   handedness <chr>, hand_span <dbl>

These are our tools

From the dplyr package

  • filter(): choose rows
  • select(): choose columns
  • arrange(): sort
  • mutate(): change or add columns
  • summarize(): calculate on all rows
  • group_by(): separate into many tibbles

Combine different tools with pipe %>%
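A small sketch of how the tools chain together with the pipe. The `toy` tibble is invented for illustration and only mimics a few of the survey columns.

```r
library(dplyr)

# Hypothetical rows that mimic the survey columns (not the real data)
toy <- tibble(
  sex        = c("Female", "Male", "Female", "Male"),
  birthplace = c("Istanbul/Turkey", "Izmir/Turkey",
                 "Bursa/Turkey", "Istanbul/Turkey"),
  height_cm  = c(162, 180, 170, 175)
)

result <- toy %>%
  filter(sex == "Female") %>%        # keep only some rows
  select(birthplace, height_cm) %>%  # keep only some columns
  arrange(height_cm)                 # sort by height, shortest first

result
```

Each step receives the tibble produced by the previous step, so the pipeline reads from top to bottom in the order the work is done.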

Extra tools

  • distinct(): eliminate duplicates
  • slice_head(): a modern version of head()
  • slice_min(): keep only the “best” rows
  • Inside summarize():
    • n(): count
    • n_distinct(): count without repetitions
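The extra tools can be tried on a made-up tibble as follows. This is a sketch with invented data, and it assumes dplyr version 1.0.0 or later, where `slice_head()` and `slice_min()` are available.

```r
library(dplyr)

# Hypothetical data, for illustration only
toy <- tibble(
  birthplace = c("Istanbul/Turkey", "Izmir/Turkey",
                 "Istanbul/Turkey", "Bursa/Turkey"),
  height_cm  = c(162, 180, 172, 170)
)

places   <- toy %>% distinct(birthplace)         # each birthplace once
first_2  <- toy %>% slice_head(n = 2)            # first two rows
shortest <- toy %>% slice_min(height_cm, n = 1)  # row with the smallest height

# n() counts rows, n_distinct() counts different values
counts <- toy %>%
  summarize(n_students = n(), n_places = n_distinct(birthplace))

counts
```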

From the knitr package

  • kable(): prints nicer tables in the document

Who are the students born before 1999?

Sort the result by birth date
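One possible way to answer this, sketched on invented rows. The column name `birthdate` comes from the table above, while the `toy` values are hypothetical. With the real data, replace `toy` with `students`.

```r
library(dplyr)

# Hypothetical rows, for illustration only
toy <- tibble(
  id        = c("a1", "b2", "c3", "d4"),
  birthdate = as.Date(c("1998-05-21", "1999-01-02",
                        "1993-02-01", "1997-11-27"))
)

older <- toy %>%
  filter(birthdate < as.Date("1999-01-01")) %>%  # born before 1999
  arrange(birthdate)                             # oldest first

older
```

Comparing against `as.Date("1999-01-01")` works because the `birthdate` column is stored as a date, not as text.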

How many students were not born in Istanbul?
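A minimal sketch of one way to count them. It assumes that Istanbul is always written exactly as "Istanbul/Turkey", as in Homework 4. The `toy` rows are invented.

```r
library(dplyr)

# Hypothetical rows, for illustration only
toy <- tibble(
  birthplace = c("Istanbul/Turkey", "Izmir/Turkey", NA,
                 "Bursa/Turkey", "Istanbul/Turkey")
)

not_istanbul <- toy %>%
  filter(birthplace != "Istanbul/Turkey") %>%  # rows with NA are dropped too
  summarize(n = n())                           # count the remaining rows

not_istanbul
```

Note that `filter()` discards rows where the condition is `NA`, so students with an unknown birthplace are not counted on either side.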

Averages

  • What is the average height?
  • What is the average height of female students?
  • What is the average height of male students?
  • What is the average height of female students living in Istanbul?
  • What is the average height of male students living in Istanbul?

Same questions with weight
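The averages can be sketched as below. Two assumptions are made here: the survey records birthplace rather than current city, so "living in Istanbul" is approximated by `birthplace`, and `sex` is assumed to take the values "Female" and "Male". The `toy` rows are invented.

```r
library(dplyr)

# Hypothetical rows, for illustration only
toy <- tibble(
  sex        = c("Female", "Male", "Female", "Male", "Female"),
  birthplace = c("Istanbul/Turkey", "Istanbul/Turkey", "Izmir/Turkey",
                 "Bursa/Turkey", "Istanbul/Turkey"),
  height_cm  = c(162, 180, 170, NA, 172)
)

# Average over all students; na.rm = TRUE skips missing heights
overall <- toy %>%
  summarize(mean_height = mean(height_cm, na.rm = TRUE))

# One average per sex
by_sex <- toy %>%
  group_by(sex) %>%
  summarize(mean_height = mean(height_cm, na.rm = TRUE))

# Same, restricted to Istanbul (approximated by birthplace)
by_sex_istanbul <- toy %>%
  filter(birthplace == "Istanbul/Turkey") %>%
  group_by(sex) %>%
  summarize(mean_height = mean(height_cm, na.rm = TRUE))

by_sex_istanbul
```

For the weight questions, the same pipelines should work with `weight_kg` in place of `height_cm`.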

Body mass index

Sort the table by body mass index, with weight in kilograms and height in meters \[\mathrm{BMI}=\frac{\mathrm{weight}}{\mathrm{height}^2}\]

Show the top three and bottom three
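A possible sketch of the whole exercise on invented rows. Since the survey stores height in centimeters (`height_cm`), it is divided by 100 before squaring. `slice_max()` is the counterpart of `slice_min()` and, like it, assumes dplyr 1.0.0 or later.

```r
library(dplyr)

# Hypothetical rows, for illustration only
toy <- tibble(
  id        = c("a", "b", "c", "d", "e", "f", "g"),
  height_cm = c(160, 170, 180, 150, 200, 175, 190),
  weight_kg = c(64, 57.8, 71.28, 40.5, 120, 73.5, 75.81)
)

with_bmi <- toy %>%
  mutate(bmi = weight_kg / (height_cm / 100)^2) %>%  # cm to m, then square
  arrange(desc(bmi))                                 # largest BMI first

top_3    <- with_bmi %>% slice_max(bmi, n = 3)  # three largest values
bottom_3 <- with_bmi %>% slice_min(bmi, n = 3)  # three smallest values

top_3
bottom_3
```

With the real data, rows with missing height or weight get a BMI of `NA`. It is worth checking how they appear in the sorted table before reading off the top and bottom three.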