September 24th, 2018

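| Name                 | Storage capacity | Memory size | Disk size |
|----------------------|------------------|-------------|-----------|
| Şeyma Asyalı         | 465 GB           | 3.9 GB      | 449 GB    |
| Nursima Mutlu        | 146 GB           | 8 GB        | 151 GB    |
| Sevda Aydın          | 200 GB           | 8 GB        | 62 GB     |
| Aslıhan Gizem Bilgin |                  | 8 GB        | 1 TB      |
| Ceren Çolak          | 119 GB           | 4 GB        | 62 GB     |
| Bahadır Kasap        | 465 GB           | 3.9 GB      | 449 GB    |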

• The first question was rhetorical. You do not answer rhetorical questions.
• 3.9 GB is probably 4 GB rounded down. Nobody makes 3.9 GB memory chips.
• Do not confuse disk size with available disk space. A 20 L water bottle is still a 20 L bottle even after you drink half the water.
• Send all your homework to andres.aravena+cmb@istanbul.edu.tr

## Analyzing Data

### for fun and profit

Many disciplines, including Molecular Biology and Genetics, have become more and more data-driven.

Starting now, we will use RStudio, a free program for data analysis with R.

R is widely used by molecular biologists, and also by economists, psychologists and marketing specialists.

## How to use RStudio

You have to install R and RStudio on your computer.

Then start RStudio. You will see a screen like this:

## Today we will focus only on one part

Click on File → New File → R Markdown

## A text editor

You will get a new window with an example text

It is a text file. Each plain English (ASCII) character takes one byte; letters such as ç or ş may take more, depending on the encoding.
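The size of a piece of text can be checked by counting its bytes. The sketch below uses Python purely for illustration and assumes the text is stored as UTF-8; the function name `size_in_bytes` and the sample strings are made up for this example. Under that assumption, plain English letters take one byte each, while a letter such as Ç takes two.

```python
# Count characters and bytes of short texts, assuming UTF-8 encoding.
def size_in_bytes(text):
    """Return how many bytes `text` occupies when stored as UTF-8."""
    return len(text.encode("utf-8"))

plain = "Hello"        # only ASCII letters
name = "\u00c7olak"    # "Çolak": starts with one non-ASCII letter

print(len(plain), size_in_bytes(plain))  # 5 characters, 5 bytes
print(len(name), size_in_bytes(name))    # 5 characters, 6 bytes
```

With a different encoding the byte counts can change, so the numbers above hold only for UTF-8.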

Colors are only a guide for you. They are not part of the text

Today we will learn how to write text files for our course

## Structure in Data

Today we will focus on a key idea.

To understand data, we need structure.

For example, the folders on a disk form a hierarchical structure: each folder can contain other folders.
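One way to picture a hierarchy is as containers nested inside containers. The following is a minimal sketch in Python, used only for illustration; the folder names and the helper `list_folders` are hypothetical. Each folder is a dictionary whose entries are the folders it contains, and the depth in the hierarchy is shown by indentation.

```python
# A folder tree as nested dictionaries: each key is a folder name and
# each value holds the folders inside it. All names are made up.
folders = {
    "Documents": {
        "Course": {"Homework": {}, "Slides": {}},
        "Personal": {},
    }
}

def list_folders(tree, depth=0):
    """Return one line per folder, indented by its depth in the tree."""
    lines = []
    for name, children in tree.items():
        lines.append("    " * depth + name)
        lines.extend(list_folders(children, depth + 1))
    return lines

print("\n".join(list_folders(folders)))
```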

## Structured documents

Text documents also have a logical structure

• Letters form words
• Several words become phrases and paragraphs
• Paragraphs are contained in sections and chapters
• Sometimes we have lists of elements
• Sometimes we have tabular data
• Figures
• References to other works

## The problem

Ordinary word processors are based on the WYSIWYG (What You See Is What You Get) philosophy

Users are encouraged to change fonts, sizes, colors and other visual attributes

## Separation of form and content

Writing and formatting at the same time is distracting.

The idea is to write first, and format later, as close as possible to the time of publication.

• WYSIWYG: What You See Is What You Get
    • Microsoft Word
• WYMIWYG: What You Mean Is What You Get
    • The information you enter defines the meaning of the document
    • The program generates beautiful output

## Markdown

An alternative to ordinary word processors is to use text files with a few rules to mark the role of each element.

Text files can be read on any computer, and will be accessible forever.

Today, the most widely used structured text format is Markdown.

Here we show some of the rules.

## Paragraphs

Consecutive lines of text form one paragraph. Paragraphs are separated by an empty line.

    The first paragraph.

    Another paragraph

The first paragraph.

Another paragraph

## Headers

    First level header
    ==================

    Second level header
    -------------------

    Normal text

### alternative format

    # Header 1
    #### Header 4

## Unordered Lists

    + Item 1
    + Item 2
        + Item 2a
        + Item 2b

• Item 1
• Item 2
    • Item 2a
    • Item 2b

Sub-lists are indented by 4 spaces

## Ordered Lists

    1. Item 1
    1. Item 2
    1. Item 3
        1. Item 3a
        1. Item 3b

1. Item 1
2. Item 2
3. Item 3
    1. Item 3a
    2. Item 3b

## Important paragraph

### Quotation

Use a quotation to highlight something remarkable, for example when someone important said something interesting.

    > "The limits of my language mean
    > the limits of my world"
    >
    > *Ludwig Wittgenstein*

> “The limits of my language mean the limits of my world”
>
> *Ludwig Wittgenstein*

## Images

You have to give the web address of the image

    ![optional text](http://example.com/logo.png)

or the path of a file, relative to the directory of the document

    ![optional text](images/logo.png)

## Tables

    |        | sample   | dose | time   | agent            |
    |--------|----------|------|--------|------------------|
    | 1      | GSM91440 | low  | 5 min  | caffeine         |
    | 2      | GSM91893 | low  | 5 min  | caffeine         |
    | 3      | GSM91428 | low  | 5 min  | calcofluor white |
    | 4      | GSM91881 | low  | 5 min  | calcofluor white |

|   | sample   | dose | time  | agent            |
|---|----------|------|-------|------------------|
| 1 | GSM91440 | low  | 5 min | caffeine         |
| 2 | GSM91893 | low  | 5 min | caffeine         |
| 3 | GSM91428 | low  | 5 min | calcofluor white |
| 4 | GSM91881 | low  | 5 min | calcofluor white |
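Because a Markdown table is just text with `|` between cells, a program can also write one. The sketch below uses Python only for illustration; the function `markdown_table` is hypothetical, and the sample rows are taken from the table in this section.

```python
# Build a Markdown pipe table from a header row and data rows.
def markdown_table(header, rows):
    """Return the table as text, one line per table row."""
    lines = [
        "| " + " | ".join(header) + " |",
        "|" + "|".join("---" for _ in header) + "|",  # separator line
    ]
    for row in rows:
        lines.append("| " + " | ".join(str(cell) for cell in row) + " |")
    return "\n".join(lines)

header = ["sample", "dose", "time", "agent"]
rows = [
    ["GSM91440", "low", "5 min", "caffeine"],
    ["GSM91428", "low", "5 min", "calcofluor white"],
]
print(markdown_table(header, rows))
```

The columns in the output are not padded to equal width; that padding only helps human readers and is not needed for the table to be recognized.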

## Computer code

Programs are usually written in a monospaced font, that is, a font in which all letters have the same width.
In Markdown, a block of code is written between two lines of three backticks.

    this <- is.computer(code)

This will be very important in the rest of the course.

## At the beginning of the file

    ---
    title: "Title"
    author: "Author's name"
    date: "4 October 2016"
    output: html_document
    ---

Notice that the block is wrapped by --- (three hyphens)
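The two `---` lines make the header block easy for a program to find. The following is a minimal sketch in Python, used only for illustration; the function `read_header` is hypothetical and handles only simple `key: value` lines, so it is not a full parser for this kind of header.

```python
# Read the header block found between the first two '---' lines.
# Only simple `key: value` lines are handled in this sketch.
def read_header(text):
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}  # the text does not start with a header block
    header = {}
    for line in lines[1:]:
        if line.strip() == "---":
            return header  # closing line reached
        key, sep, value = line.partition(":")
        if sep:
            header[key.strip()] = value.strip().strip('"')
    return {}  # closing '---' never found: treat as no header

document = '''---
title: "Title"
author: "Author's name"
output: html_document
---

Normal text
'''
print(read_header(document))
```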

## Emphasis

    Inside the paragraph we can have *italics*
    and **bold** text

Inside the paragraph we can have *italics* and **bold** text

## Inline code

    We can speak about `x` and `data`

We can speak about `x` and `data`

## How to solve it

### by G. Polya

• You have to understand the problem.
• Find the connection between the data and the question. You should obtain a plan of the solution.