November 7th, 2016

Welcome back

to “Computing for Molecular Biology 1”

Digital Signature

a parenthesis

Do I have your file?

At the end of the exam you sent me a file

How can we verify that we have the same file?

How can we be sure that nobody changed it?

How to be sure without showing the content of the file?

Digital signature

An answer to these questions is given by digital signatures

They are not digital pictures of a handwritten signature

Instead, a digital signature is a number that identifies the exact document

This number is called a digest. It is produced by a cryptographic hash function

Cryptographic hash function

MD5 hash function

  • The input is a file (all the characters)
  • The output is the digest
  • The same input always produces the same digest
  • Different inputs produce different digests (two different files sharing a digest by accident is practically impossible)
  • If the input changes, the digest changes
  • If the input changes a little, the digest changes a lot
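
These properties can be checked in R itself. The following is a minimal sketch, assuming the function md5sum() from the tools package that ships with R. The file names and their contents are made up for illustration.

```r
library(tools)  # provides md5sum()

# Three small temporary files: two identical, one slightly different
file_a <- tempfile(fileext = ".txt")
file_b <- tempfile(fileext = ".txt")
file_c <- tempfile(fileext = ".txt")
writeLines("The quick brown fox", file_a)
writeLines("The quick brown fox", file_b)
writeLines("The quick brown fax", file_c)  # one letter changed

# md5sum() returns a named vector; unname() keeps only the digest
digest_a <- unname(md5sum(file_a))
digest_b <- unname(md5sum(file_b))
digest_c <- unname(md5sum(file_c))

print(digest_a)
print(digest_c)
digest_a == digest_b  # same input, same digest
digest_a == digest_c  # a small change gives a different digest
```

Printing digest_a and digest_c side by side shows how much the digest changes when a single letter changes.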

How do you validate the file?

Go to http://onlinemd5.com/ or any other service you find on Google

The digest is calculated on your computer. The file is not sent over the internet

You can take the file you attached, calculate its digest, and compare it with the one I created

If they are the same, we can be sure that I have your file

And we do not need to show the content
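
The same comparison can also be done in R, without a web site. This is a minimal sketch, assuming tools::md5sum(). The function name same_digest and the example file are hypothetical.

```r
# Returns TRUE when the file at `path` has the digest `expected`
same_digest <- function(path, expected) {
  actual <- unname(tools::md5sum(path))
  # compare in lower case, since digests are sometimes written in capitals
  tolower(actual) == tolower(expected)
}

# Stand-in for the exam file you sent
my_file <- tempfile(fileext = ".R")
writeLines("answer <- 6 * 7", my_file)

# Stand-in for the digest that I calculated on my copy
published <- unname(tools::md5sum(my_file))

same_digest(my_file, published)   # TRUE: we have the same file

# If anybody changes the file, the check fails
cat("# edited later\n", file = my_file, append = TRUE)
same_digest(my_file, published)   # FALSE
```

Only the digest is exchanged. The content of the file is never shown.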

Application: Intellectual Property

Imagine you are working on a project

  • You have an idea, a draft or some data that is confidential
  • You do not want to make it public (yet)
  • But you want to be able to show later that you had this document today

You can get the MD5 digest and publish it

  • in a newspaper
  • on Facebook
  • or anywhere you can look back and show the date and the digest

Functions

end of parenthesis

Functions in R

We have seen that R has several data types

  • vectors
  • matrices
  • lists
  • data frames

There are many others. For example functions

Functions

In Math and Informatics, a function is a “black box”

A rule to transform the input elements into an output

  • e.g. logarithm of a number, length of a vector

The same input should always produce the same output

Notice that there may be more than one input element
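
A short illustration in R, using the two functions named above. The numbers are made up.

```r
log(100, base = 10)    # two inputs: the number and the base; gives 2
length(c(5, 8, 13))    # one input: a vector; gives 3

# The same input gives the same output every time
log(100, base = 10) == log(100, base = 10)
```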

Syntax

A function has one name and can have several inputs

Inputs are always inside parentheses ( )

Some inputs can be optional. They have default values

list.files(path = ".", pattern = NULL, all.files = FALSE,
           full.names = FALSE, recursive = FALSE,
           ignore.case = FALSE, include.dirs = FALSE, no.. = FALSE)

Here all inputs are optional. Their default values are shown
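
A minimal sketch of how default values work, using list.files() on a temporary folder. The folder and the file names are made up for illustration.

```r
# A temporary folder with two made-up files
folder <- tempfile("demo")
dir.create(folder)
file.create(file.path(folder, c("notes.txt", "data.csv")))

# Only the first input is given; the others keep their default values
list.files(folder)

# Inputs can be given by name, in any order
list.files(pattern = "\\.txt$", path = folder)

# full.names = TRUE replaces the default value FALSE
list.files(folder, full.names = TRUE)
```

Any input that is not written in the call takes the default value shown in the function description.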

Getting help

help(topic, package = NULL, lib.loc = NULL,
     verbose = getOption("verbose"),
     try.all.packages = getOption("help.try.all.packages"),
     help_type = getOption("help_type"))

Here topic is a mandatory input. The rest are optional

Really getting help

help(function_name)

or, in the short version,

?function_name

Some examples

dir()
getwd()
setwd(dir)
c(...)
factor(...)
list(...)
data.frame(...)
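
A minimal sketch that calls some of these functions. All values and names (ages, groups, people, course) are made up for illustration.

```r
getwd()   # the current working directory
dir()     # the files in it

ages   <- c(21, 25, 30)                            # a vector
groups <- factor(c("a", "b", "a"))                 # a factor with two levels
people <- data.frame(age = ages, group = groups)   # one row per person
course <- list(name = "CMB1", students = people)   # a list can mix data types

nrow(people)
levels(groups)
names(course)
```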

Telling stories

Descriptive Statistics

We have data and we want to say something about them

How can we summarize all the values in a few numbers?

Let’s use the vector rivers.

Standard Data Descriptors

  • Number of elements
  • Location
  • Dispersion
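
A minimal sketch with one descriptor of each kind for rivers. The choice of median() and mean() for location, and of range() and sd() for dispersion, is an assumption made here for illustration. Other choices are possible.

```r
length(rivers)   # number of elements
median(rivers)   # one possible measure of location
mean(rivers)     # another measure of location
range(rivers)    # dispersion: smallest and largest value
sd(rivers)       # dispersion: standard deviation
```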

Counting

  • length(rivers)
  • nrow(state.x77)
  • dim(state.x77)
  • table(state.region)

Counting

length(rivers)
[1] 141
nrow(state.x77)
[1] 50
dim(state.x77)
[1] 50  8
table(state.region)
state.region
    Northeast         South North Central          West 
            9            16            12            13 

Location

If you have to describe the vector v with a single number x, which would it be?

If we have to replace every v[i] with a single number, which number is “the best”?

Better to choose the one that is the “least wrong”

How can x be wrong?
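
One way to make the question concrete: when every v[i] is replaced by x, the error for element i is v[i] - x. The sketch below compares two candidate values of x on a small made-up vector. Adding up the squared errors or the absolute errors are two possible ways to measure the total error, assumed here for illustration. The function names are hypothetical.

```r
v <- c(1, 2, 3, 10)   # made-up data

total_squared_error  <- function(v, x) sum((v - x)^2)
total_absolute_error <- function(v, x) sum(abs(v - x))

v - 4                          # the errors when x = 4
total_squared_error(v, 4)      # 50
total_squared_error(v, 2.5)    # 59
total_absolute_error(v, 4)     # 12
total_absolute_error(v, 2.5)   # 10
```

Which x is the least wrong depends on how the error is measured.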